
Nan loss #9

Open
LUOBO123LUOBO123 opened this issue Nov 23, 2022 · 1 comment

Comments

@LUOBO123LUOBO123

I changed the input resolution to 416×416 when training on custom datasets. After the network has been trained for 49 epochs, the printed loss becomes NaN. What could be the reason for this?

@SelfSup-MIM
Collaborator

Hi, there are several tips that may help alleviate the issue (a rough sketch of some of them follows the list):

  1. Decrease the learning rate.
  2. Increase the drop path rate.
  3. Decrease the gradient clipping value.
  4. Increase the number of warm-up epochs.
  5. Use FP32 for the attention blocks and LayerNorm instead of FP16.
  6. Adjust the betas of the AdamW optimizer. The default is betas=(0.9, 0.999), while MAE uses betas=(0.9, 0.95); this may help improve training stability.
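Below is a minimal, self-contained PyTorch sketch of how tips 1, 3, 4, and 6 could look in a training loop. The model, dummy data, and exact hyper-parameter values are placeholders for illustration, not the settings used in this repository; tips 2 and 5 are model/AMP configuration changes and are not shown here.

```python
# Sketch of tips 1, 3, 4, and 6 in plain PyTorch. All names and values below
# are placeholders, not this repo's actual training configuration.
import torch

model = torch.nn.Linear(16, 1)  # stand-in for the real backbone
data = [(torch.randn(8, 16), torch.randn(8, 1)) for _ in range(4)]  # dummy batches

# Tips 1 and 6: lower learning rate and MAE-style AdamW betas.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5,
                              betas=(0.9, 0.95), weight_decay=0.05)

# Tip 4: longer linear warm-up before the main (cosine) schedule.
warmup_epochs, total_epochs = 5, 50
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer,
    schedulers=[
        torch.optim.lr_scheduler.LinearLR(
            optimizer, start_factor=0.01, total_iters=warmup_epochs),
        torch.optim.lr_scheduler.CosineAnnealingLR(
            optimizer, T_max=total_epochs - warmup_epochs),
    ],
    milestones=[warmup_epochs],
)

for epoch in range(total_epochs):
    for x, y in data:
        optimizer.zero_grad(set_to_none=True)
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()
        # Tip 3: clip gradients with a smaller max norm (e.g. 1.0).
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
    scheduler.step()
```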
