
Loss suddenly increases while training BERT and the model stops learning #335

Open
xlxwalex opened this issue Aug 5, 2022 · 3 comments

Comments


xlxwalex commented Aug 5, 2022

I was training BERT on BookCorpus + English Wikipedia with batch_size=5120, warmup=0.1, learning_rate=4e-4, deep_init enabled, no mixed precision, and steps=240k (computed for 40 epochs). At around step 127k the loss suddenly increased, performance dropped, and the model stopped learning from then on. What could be causing this? (Log shown below)

(screenshot: training log showing the loss spike around step 127k)

After that, the model was never able to learn again:

(screenshot: subsequent log with the loss stuck at the higher value)
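For context on where step 127k falls in the schedule, here is a minimal sketch of a linear warmup + linear decay learning-rate schedule using the hyperparameters reported above. This is a hypothetical illustration, not UER-py's actual scheduler; the function name `lr_at` and the decay-to-zero assumption are mine.

```python
def lr_at(step, total_steps=240_000, warmup=0.1, peak_lr=4e-4):
    """Hypothetical linear warmup then linear decay to zero.

    Uses the hyperparameters from the report: 240k total steps,
    warmup fraction 0.1 (24k steps), peak lr 4e-4.
    """
    warmup_steps = int(total_steps * warmup)
    if step < warmup_steps:
        # Ramp linearly from 0 up to peak_lr during warmup.
        return peak_lr * step / warmup_steps
    # Decay linearly from peak_lr back to 0 at total_steps.
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

print(lr_at(127_000))  # lr is still a sizable fraction of the 4e-4 peak here
```

Under this schedule the lr at step 127k is still roughly half the peak, so a spike at that point is consistent with the peak lr simply being too high for this batch size, as discussed below.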

hhou435 (Collaborator) commented Aug 5, 2022

Which model config are you using? The lr might be too large.

xlxwalex (Author) commented Aug 5, 2022

I'm using the Base config; since the batch size is quite large I scaled the lr up a bit.

xlxwalex (Author) commented Aug 5, 2022

> Which model config are you using? The lr might be too large.

I'll try again with a smaller lr. Thank you for the reply!
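Besides lowering the peak lr, clipping gradients by their global norm is a common guard against this kind of loss spike in BERT pretraining. The sketch below is a plain-Python illustration of the technique (the function name and list-of-lists gradient layout are mine, not UER-py's API; frameworks like PyTorch provide this as a built-in, e.g. `torch.nn.utils.clip_grad_norm_`).

```python
import math

def clip_by_global_norm(grads, max_norm=1.0):
    """Scale gradients so their global L2 norm does not exceed max_norm.

    grads is a list of parameter gradients, each a flat list of floats.
    Returns the (possibly rescaled) gradients and the pre-clip norm.
    """
    total_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [[g * scale for g in grad] for grad in grads]
    return grads, total_norm
```

When a batch produces an abnormally large gradient, clipping caps the effective step size, which often prevents a single bad update from blowing up the loss the way the log above shows.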
