loss.backward() producing nan values with 8-bit Llama-3-70B-Instruct #30526
Comments
Hey! I think this is related to the model itself, which did not properly train these tokens. I saw a few threads saying that the initialization of these tokens should be updated. @younesbelkada, could you take a look in case it's quantization-related instead?
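The untrained-token hypothesis can be checked directly on the embedding matrix. This is a minimal sketch that assumes PyTorch and assumes "untrained" tokens are embedding rows that stayed at (or near) zero during pretraining. The helper names `find_untrained_rows` and `reset_untrained_rows` and the `eps` threshold are hypothetical. Resetting such rows to the mean of the trained rows is one workaround discussed in the community. It is not a confirmed fix for this issue.

```python
import torch


def find_untrained_rows(weight: torch.Tensor, eps: float = 1e-16) -> torch.Tensor:
    """Boolean mask of rows whose largest absolute value is <= eps.

    Hypothetical heuristic: rows that are (near) zero are treated as
    tokens the model never really trained. `eps` is an assumed threshold.
    """
    return weight.detach().abs().amax(dim=1) <= eps


@torch.no_grad()
def reset_untrained_rows(weight: torch.Tensor, eps: float = 1e-16) -> torch.Tensor:
    """Overwrite untrained rows in place with the mean of the trained rows.

    Returns the mask of rows that were reset.
    """
    mask = find_untrained_rows(weight, eps)
    # Only act if there is something to fix and something to average over.
    if mask.any() and (~mask).any():
        # Average in float32 to avoid fp16/bf16 precision loss, then cast back.
        mean = weight[~mask].float().mean(dim=0)
        weight[mask] = mean.to(weight.dtype)
    return mask


# Toy usage: row 2 is zeroed to mimic an untrained token.
torch.manual_seed(0)
emb = torch.nn.Embedding(4, 3)
with torch.no_grad():
    emb.weight[2].zero_()
mask = reset_untrained_rows(emb.weight)
print(mask.tolist())
```

On a loaded model, the same helper could be pointed at `model.get_input_embeddings().weight` before fine-tuning. This sketch does not establish whether doing so removes the nan values from `loss.backward()` with the 8-bit checkpoint.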
Hi @haroldtimmers!
System Info
- transformers version: 4.40.1

Who can help?
@ArthurZucker @younesbelkada
Reproduction
This produces the following error message and stack trace:
Expected behavior
No error thrown.