Pretraining runtimes from the paper #46
Hi! Great work, and also a great YouTube presentation; thanks for making it public.

I have a question about the runtimes. Table A.2 says that pretraining took 80 min for the model with 1.6M parameters. When I pretrain a model with 3.3M parameters (input size 16k, 3 Hyena layers, embedding dim 256) on my own dataset of only 21,000 samples, it takes around 16 hours. Is anything wrong with my setup? Could you specify more explicitly what data size went into Table A.2: how many samples, of what sequence length, with what batch size?

And, if it's possible to tell, what share of the nucleotides of the human genome did the pretrained model (e.g., the one with the 32k batch size) end up seeing?

Thank you for the nice work!

In Table A.2 we only report the pretraining time for the tiny 2-layer, d_model=256, seq_len=1k model on the Nucleotide Transformer datasets. In general, the bigger the model or the longer the sequence, the longer the training time. In Section A.1 of the appendix, you'll see that the 1M-context-length model (our biggest model) was trained for 4 weeks.
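The genome-coverage question above comes down to simple arithmetic: tokens seen during pretraining = steps × tokens per batch, divided by the roughly 3.1 Gbp human genome. A minimal sketch with hypothetical numbers (the step count and the "32k tokens per batch" interpretation are assumptions for illustration, not figures from the paper):

```python
# Back-of-the-envelope estimate of what fraction of the human genome
# a pretraining run covers. All run parameters below are hypothetical
# placeholders, not values reported in the paper.

HUMAN_GENOME_BP = 3.1e9  # approximate haploid human genome size in base pairs

def genome_passes(steps: int, tokens_per_batch: int,
                  genome_bp: float = HUMAN_GENOME_BP) -> float:
    """Number of full genome equivalents seen (can be < 1)."""
    return steps * tokens_per_batch / genome_bp

# Example: 100k optimizer steps at 32k nucleotide tokens per batch (hypothetical)
passes = genome_passes(steps=100_000, tokens_per_batch=32_000)
print(f"{passes:.2f} genome equivalents seen")  # ~1.03 passes
```

At these illustrative numbers the run would see the genome roughly once; halving the step count would leave about half the genome's worth of nucleotides seen (ignoring sampling overlap).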