
Force rampup_batch_size=None in config #83

Open
shengyangs opened this issue Jan 18, 2024 · 3 comments

@shengyangs
Collaborator

Is your feature request related to a problem? Please describe.

When the model config has rampup_batch_size set, we get model loading errors if global_batch_size is not set accordingly. Since rampup_batch_size is only used in pretraining, not in alignment, we should force it to None when loading the model.

Describe the solution you'd like

gpt_cfg.rampup_batch_size=None
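
A minimal sketch of what the code-level override could look like, assuming the pretrained model config is an OmegaConf object (as in NeMo) that gets modified before the aligner model is built; the helper name below is hypothetical, not an existing NeMo-Aligner function:

    from omegaconf import open_dict

    def prepare_gpt_cfg_for_alignment(gpt_cfg):
        # Illustrative helper: clear pretraining-only fields before loading.
        # Assumes gpt_cfg is the OmegaConf model config restored from the
        # pretrained checkpoint.
        with open_dict(gpt_cfg):
            # rampup_batch_size is only meaningful during pretraining; forcing
            # it to None avoids global_batch_size consistency errors at load time.
            gpt_cfg.rampup_batch_size = None
        return gpt_cfg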

@odelalleau
Collaborator

I would do it at the config level rather than in the code, so that it's still possible to set it manually through the config if someone ever needs it.

@shengyangs
Collaborator Author

Doing it at the config level would be a better option if possible.

The proposal is something like:

  1. When the user wants to reuse the rampup_batch_size from the pretrained model, set model.rampup_batch_size="model".
  2. When the user wants to turn off rampup_batch_size, set model.rampup_batch_size="off" or null.
  3. When the user wants to specify it explicitly, set model.rampup_batch_size=[xx, xx, xx].

This is not entirely satisfying because it might cause some confusion.

@odelalleau
Collaborator

odelalleau commented Jan 18, 2024

I would just set rampup_batch_size: null in our .yaml and not worry about use case 1 (I see no good reason why someone would want to reuse the rampup batch size from pre-training, and if they do, they can manually copy it with use case 3, or use a different config file that doesn't overwrite it).
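
A sketch of how that could look in the alignment config .yaml; the surrounding structure is illustrative, only the rampup_batch_size key is the point:

    model:
      # rampup_batch_size is a pretraining-only feature; default it off for alignment.
      rampup_batch_size: null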
