Pretraining runtimes from the paper #46

Open
sgalkina opened this issue Jan 18, 2024 · 1 comment

@sgalkina

Hi! Great work, and also a great YouTube presentation; thanks for making that public.

I have a question about the runtimes. Table A.2 says that pretraining took 80 minutes for the model with 1.6M parameters. When I pretrain a model with 3.3M parameters (input size 16k, 3 Hyena layers, embedding dim 256) on my own dataset of only 21,000 samples, it takes around 16 hours. Is something wrong with my setup? Could you specify more explicitly what data went into Table A.2, i.e. how many samples, at which sequence length, and with what batch size?
And, if it's possible to tell, what share of the nucleotides in the human genome did the pretrained model (e.g. the one with the batch size 32k) end up seeing?
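
For reference, a back-of-envelope way to estimate that coverage (a minimal sketch; the step count, batch size, and sequence length below are made-up placeholders, not the paper's settings):

```python
# Rough coverage estimate: nucleotides seen = steps * batch size * sequence length.
# All numbers below are placeholder assumptions, not values from the paper.

HUMAN_GENOME_BP = 3.1e9  # approximate haploid human genome size in base pairs


def tokens_seen(num_steps: int, batch_size: int, seq_len: int) -> int:
    """Total nucleotides (tokens) processed during pretraining."""
    return num_steps * batch_size * seq_len


# Hypothetical run: 20,000 optimizer steps, 256 sequences per batch, 1k tokens each.
total = tokens_seen(num_steps=20_000, batch_size=256, seq_len=1_024)
print(f"tokens seen: {total:.3e}")
print(f"multiple of human genome: {total / HUMAN_GENOME_BP:.1f}x")
```

Comparing the token total to the ~3.1 Gb genome size is how I'd express the "share seen"; I just don't know the actual steps, batch size, and sequence length to plug in.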

Thank you for the nice work!

@exnx (Collaborator) commented Feb 12, 2024

Table A.2 only reports the pretraining time for the tiny 2-layer, d_model=256, seq_len=1k model used for the Nucleotide Transformer datasets. In general, the bigger the model or the longer the sequence, the longer the training time. In Section A.1 of the appendix, you'll see that the 1M context length model (our biggest model) was trained for 4 weeks.
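
To make that scaling concrete, here is a toy wall-clock estimate from throughput (a sketch only; the epoch count and tokens-per-second figure are hypothetical placeholders, not measurements from the paper or this repo):

```python
# Toy estimate of pretraining wall-clock time: total tokens / throughput.
# A bigger model lowers the tokens-per-second throughput, and a longer sequence
# length raises the token count, so both stretch the training time.
# The epoch count and throughput below are hypothetical, not measured values.


def pretrain_hours(num_sequences: int, seq_len: int, epochs: int,
                   tokens_per_second: float) -> float:
    """Wall-clock hours = tokens processed / measured throughput."""
    total_tokens = num_sequences * seq_len * epochs
    return total_tokens / tokens_per_second / 3600


# Setup from the question above: 21,000 sequences of length 16k.
hours = pretrain_hours(num_sequences=21_000, seq_len=16_384, epochs=5,
                       tokens_per_second=100_000)
print(f"estimated pretraining time: {hours:.1f} h")
```

Plugging in a tokens-per-second number measured from a few profiled steps of your own run gives a quick sanity check on whether 16 hours is in the expected range for your setup.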
