
Nan loss #9

Open
LUOBO123LUOBO123 opened this issue Nov 23, 2022 · 1 comment

Comments

@LUOBO123LUOBO123

I changed the input resolution to 416×416 when training on custom datasets. After the network has been trained for 49 epochs, the printed loss becomes NaN. What could be the reason for this?

@SelfSup-MIM
Collaborator

Hi, there are several tips that may help alleviate the issue (a rough sketch of some of them follows the list):

  1. Decrease the learning rate.
  2. Increase the drop path rate.
  3. Decrease the gradient clipping value.
  4. Increase the number of warm-up epochs.
  5. Use FP32 for the attention blocks and LayerNorm instead of FP16.
  6. Adjust the betas of the AdamW optimizer. The default is betas=(0.9, 0.999), while MAE uses betas=(0.9, 0.95); this may help improve training stability.
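Below is a minimal, self-contained PyTorch sketch of how tips 1, 3, 4, and 6 could look in a training loop. The model, dummy data, and exact hyper-parameter values are placeholders for illustration, not the settings used in this repository; tips 2 and 5 are model/AMP configuration changes and are not shown here.

```python
# Sketch of tips 1, 3, 4, and 6 in plain PyTorch. All names and values below
# are placeholders, not this repo's actual training configuration.
import torch

model = torch.nn.Linear(16, 1)  # stand-in for the real backbone
data = [(torch.randn(8, 16), torch.randn(8, 1)) for _ in range(4)]  # dummy batches

# Tips 1 and 6: lower learning rate and MAE-style AdamW betas.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5,
                              betas=(0.9, 0.95), weight_decay=0.05)

# Tip 4: longer linear warm-up before the main (cosine) schedule.
warmup_epochs, total_epochs = 5, 50
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer,
    schedulers=[
        torch.optim.lr_scheduler.LinearLR(
            optimizer, start_factor=0.01, total_iters=warmup_epochs),
        torch.optim.lr_scheduler.CosineAnnealingLR(
            optimizer, T_max=total_epochs - warmup_epochs),
    ],
    milestones=[warmup_epochs],
)

for epoch in range(total_epochs):
    for x, y in data:
        optimizer.zero_grad(set_to_none=True)
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()
        # Tip 3: clip gradients with a smaller max norm (e.g. 1.0).
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
    scheduler.step()
```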
