
Loss suddenly increases while training BERT and the model stops learning #335

Open
xlxwalex opened this issue Aug 5, 2022 · 3 comments

Comments


xlxwalex commented Aug 5, 2022

I was training BERT on BookCorpus + English Wikipedia with batch_size=5120, warmup=0.1, learning_rate=4e-4, deep_init enabled, no mixed precision, and steps=240k (computed for 40 epochs). At around step 127k the loss suddenly increased, performance dropped, and the model stopped learning from then on. What could be causing this? (Log shown below)

(screenshot: training log showing the loss spike around step 127k)

After that, the model was never able to learn again:

(screenshot: subsequent log with the loss stuck at the higher value)
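For context on where step 127k falls in the schedule, here is a minimal sketch of a linear warmup + linear decay learning-rate schedule using the hyperparameters reported above. This is a hypothetical illustration, not UER-py's actual scheduler; the function name `lr_at` and the decay-to-zero assumption are mine.

```python
def lr_at(step, total_steps=240_000, warmup=0.1, peak_lr=4e-4):
    """Hypothetical linear warmup then linear decay to zero.

    Uses the hyperparameters from the report: 240k total steps,
    warmup fraction 0.1 (24k steps), peak lr 4e-4.
    """
    warmup_steps = int(total_steps * warmup)
    if step < warmup_steps:
        # Ramp linearly from 0 up to peak_lr during warmup.
        return peak_lr * step / warmup_steps
    # Decay linearly from peak_lr back to 0 at total_steps.
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

print(lr_at(127_000))  # lr is still a sizable fraction of the 4e-4 peak here
```

Under this schedule the lr at step 127k is still roughly half the peak, so a spike at that point is consistent with the peak lr simply being too high for this batch size, as discussed below.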

hhou435 (Collaborator) commented Aug 5, 2022

Which model config are you using? The lr might be too large.

xlxwalex (Author) commented Aug 5, 2022

I'm using the Base config; since the batch size is quite large I scaled the lr up a bit.

xlxwalex (Author) commented Aug 5, 2022

> Which model config are you using? The lr might be too large.

I'll try again with a smaller lr. Thank you for the reply!
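Besides lowering the peak lr, clipping gradients by their global norm is a common guard against this kind of loss spike in BERT pretraining. The sketch below is a plain-Python illustration of the technique (the function name and list-of-lists gradient layout are mine, not UER-py's API; frameworks like PyTorch provide this as a built-in, e.g. `torch.nn.utils.clip_grad_norm_`).

```python
import math

def clip_by_global_norm(grads, max_norm=1.0):
    """Scale gradients so their global L2 norm does not exceed max_norm.

    grads is a list of parameter gradients, each a flat list of floats.
    Returns the (possibly rescaled) gradients and the pre-clip norm.
    """
    total_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [[g * scale for g in grad] for grad in grads]
    return grads, total_norm
```

When a batch produces an abnormally large gradient, clipping caps the effective step size, which often prevents a single bad update from blowing up the loss the way the log above shows.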
