SFT does not work max_steps #159

Open
AtsunoriFujita opened this issue Apr 18, 2024 · 2 comments
Labels
bug Something isn't working

Comments

AtsunoriFujita commented Apr 18, 2024

The SFT API ignores trainer.sft.max_steps in gpt_sft.yaml.
It only ever takes trainer.sft.max_epochs into account.

Test cases:

  • trainer.sft.max_steps=200 and trainer.sft.max_epochs=-1
     - job finished after 0 steps
  • trainer.sft.max_steps=200 and trainer.sft.max_epochs=0
     - job finished after 0 steps
  • trainer.sft.max_steps=200 and trainer.sft.max_epochs=1 (one epoch is 187 steps)
     - job finished after 187 steps
  • ~trainer.sft.max_epochs
     - error
AtsunoriFujita added the bug label Apr 18, 2024
@odelalleau
Collaborator

Thanks for reporting this. I think the main issue is that we don't support an undefined max_epochs (-1 or None). By the way, I would suggest treating both values the same, contrary to PTL, because I don't think there's much value in defaulting to 1000 epochs when None is used, as PTL does. This definitely needs to be fixed.

In the meantime, you can probably work around this by setting max_epochs to a large value, so that only max_steps takes effect. Note that the only buggy run in your examples is the first one; the other three work as intended.
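
To make the intended behavior concrete, here is a minimal sketch of how the two limits could interact if -1 and None were both treated as "no limit". This is an illustration only, not the actual trainer code: the function names `normalize_limit` and `planned_steps` are hypothetical, and the 187 steps per epoch is taken from the third test case in the report.

```python
import math
from typing import Optional, Union

Number = Union[int, float]


def normalize_limit(value: Optional[int]) -> Number:
    """Map an undefined limit (None or -1) to infinity (hypothetical helper)."""
    if value is None or value == -1:
        return math.inf
    return value


def planned_steps(max_steps: Optional[int],
                  max_epochs: Optional[int],
                  steps_per_epoch: int) -> Number:
    """Number of steps run when training stops at whichever limit is hit first."""
    if steps_per_epoch <= 0:
        raise ValueError("steps_per_epoch must be positive")
    step_limit = normalize_limit(max_steps)
    epoch_limit = normalize_limit(max_epochs)
    return min(step_limit, epoch_limit * steps_per_epoch)


# Test cases from the report, assuming 187 steps per epoch.
print(planned_steps(200, -1, 187))    # proposed behavior: max_steps alone decides
print(planned_steps(200, 0, 187))     # zero epochs requested, so nothing runs
print(planned_steps(200, 1, 187))     # the epoch limit is reached before step 200
print(planned_steps(200, 1000, 187))  # workaround: large max_epochs, max_steps decides
```

Under this model the second and third cases match what was observed (0 and 187 steps), which is why only the first case counts as a bug, and the large-max_epochs workaround yields the requested 200 steps.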

@AtsunoriFujita
Author

Sorry, my testing was insufficient. I thought this was a similar situation to NeMo (where max_epochs does not work consistently).
The workaround did indeed work.


2 participants