Error while training RASR #968

Open
innarid opened this issue Apr 20, 2021 · 3 comments

innarid commented Apr 20, 2021

I got an error (Floating point exception) while training epoch 10.
error.txt
The log of epoch 9 looks good:
009_log.txt
I tried changing the learning rate (lr) and the lr decay, but it didn't help.
Could you please help me find the cause of this error? Thanks!

innarid added the bug label Apr 20, 2021
tlikhomanenko (Contributor) commented

Could you run without distributed training? Did all of epochs 1-9 pass fine?

innarid (Author) commented Apr 23, 2021

I got the same error with -enable_distributed=false. Yes, epochs 1-9 passed fine.

tlikhomanenko (Contributor) commented

Can you confirm that training still works if you rerun it from epoch 1?
