Why do I encounter a sudden MLM accuracy drop during training? #350

Open

dr-GitHub-account opened this issue Dec 10, 2022 · 1 comment

I am training a BERT-base model for Chinese with the default MLM and NSP tasks. I am training for 96k steps to see whether the model benefits from a longer training schedule. However, between step 65600 and step 65700 the MLM accuracy drops sharply from 0.827 to 0.774, while the NSP accuracy remains high and stable. I am wondering what causes this drop.

The original corpus contains around 226k sentences, and each sentence is split into two parts from the middle, in the same format as book_review_bert.txt. During data preprocessing I increased dup_factor from 5 to 50 to ensure diversity of the masked instances. The effective batch size is 32: 16 (args.batch_size) × 2 (args.world_size) × 1 (args.accumulation_steps).
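
For reference, a minimal sketch of the step/coverage arithmetic implied above. The instance count is an assumption (roughly one training instance per sentence pair per duplication); the real count produced by preprocessing depends on seq_length and document lengths.

```python
# Rough arithmetic: how much of the duplicated dataset do 96k steps cover?
# Assumption: preprocessing emits about one training instance per sentence
# pair per duplication; the true count depends on seq_length and doc lengths.
sentences = 226_000          # sentence pairs in the raw corpus (from the issue)
dup_factor = 50              # duplication factor used during preprocessing
batch_size = 16              # args.batch_size
world_size = 2               # args.world_size
accumulation_steps = 1       # args.accumulation_steps

effective_batch = batch_size * world_size * accumulation_steps   # 32
instances = sentences * dup_factor                                # ~11.3M
samples_seen = 96_000 * effective_batch                           # ~3.07M

print(f"effective batch size : {effective_batch}")
print(f"approx. instances    : {instances:,}")
print(f"samples in 96k steps : {samples_seen:,}")
print(f"fraction of one pass : {samples_seen / instances:.2f}")
```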

dr-GitHub-account commented Dec 15, 2022

Perhaps the NSP accuracy remains high and stable because NSP is a much easier task than MLM.

Still, there is a sudden rise in the NSP loss, from 0.010 (or less) to 0.017 at step 65700. The MLM loss also increases markedly, from 0.7 to 1.0. So it is safe to say that both MLM and NSP degrade at that point. The only explanation I can come up with is that a small fraction of the corpus is malformed and the model happens to hit it around that step.
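
To check that hypothesis, here is a hypothetical sketch for narrowing down which training instances were consumed around the step where the metrics degrade. It assumes instances are read sequentially in preprocessing order (no shuffling), which may not hold for the actual data loader, so the resulting range is only a rough guide.

```python
# Map a window of global training steps to the range of instance indices
# consumed in that window, assuming sequential reads and a fixed batch size.
effective_batch = 16 * 2 * 1   # batch_size * world_size * accumulation_steps

def instance_range(step_start: int, step_end: int, batch: int) -> tuple[int, int]:
    """Return the [first, last) global instance indices seen in the step window."""
    return step_start * batch, step_end * batch

lo, hi = instance_range(65_600, 65_700, effective_batch)
print(f"instances roughly seen between steps 65600 and 65700: [{lo:,}, {hi:,})")
# These indices could then be mapped back to corpus lines (instance index
# modulo the corpus size, given dup_factor duplication) and the corresponding
# sentences inspected for encoding errors or malformed lines.
```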
