Pretraining runtimes from the paper #46
Hi! Great work, and also a great YouTube presentation; thanks for making it public.

I have a question about the runtimes. Table A.2 says that pretraining took 80 min for the model with 1.6M parameters. When I pretrain a model with 3.3M parameters (input size 16k, 3 Hyena layers, embedding dim 256) on my own dataset of only 21,000 samples, it takes around 16 hours. Is anything wrong with my setup? Could you specify more explicitly what data size went into Table A.2: how many samples, of what sequence length, with what batch size?

And, if it's possible to tell, what share of the nucleotides of the human genome did the pretrained model (e.g., the one with the 32k batch size) end up seeing?

Thank you for the nice work!

In Table A.2 we only report the pretraining time for the tiny 2-layer, d_model=256, seq_len=1k model on the Nucleotide Transformer datasets. In general, the bigger the model or the longer the sequence, the longer the training time. In Section A.1 of the appendix, you'll see that the 1M-context-length model (our biggest model) was trained for 4 weeks.
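The genome-coverage question above comes down to simple arithmetic: tokens seen during pretraining = steps × tokens per batch, divided by the roughly 3.1 Gbp human genome. A minimal sketch with hypothetical numbers (the step count and the "32k tokens per batch" interpretation are assumptions for illustration, not figures from the paper):

```python
# Back-of-the-envelope estimate of what fraction of the human genome
# a pretraining run covers. All run parameters below are hypothetical
# placeholders, not values reported in the paper.

HUMAN_GENOME_BP = 3.1e9  # approximate haploid human genome size in base pairs

def genome_passes(steps: int, tokens_per_batch: int,
                  genome_bp: float = HUMAN_GENOME_BP) -> float:
    """Number of full genome equivalents seen (can be < 1)."""
    return steps * tokens_per_batch / genome_bp

# Example: 100k optimizer steps at 32k nucleotide tokens per batch (hypothetical)
passes = genome_passes(steps=100_000, tokens_per_batch=32_000)
print(f"{passes:.2f} genome equivalents seen")  # ~1.03 passes
```

At these illustrative numbers the run would see the genome roughly once; halving the step count would leave about half the genome's worth of nucleotides seen (ignoring sampling overlap).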