Loss Curve #10
Sorry, missed the results folder.
Hi, yes, the training loss curve for SemEval training is in the results folder. Please note that the MTB model has been updated, so the old loss curve for MTB pre-training should be ignored.
Thanks @plkmo!
Yup, will do so once I have the GPU compute available to satisfactorily pre-train it on suitable data.
Looks good; I also got something like this with the CNN dataset. Note, though, that the loss consists of lm_loss + MTB_loss, and from what I can see, lm_loss decreases much more than MTB_loss. If you can, try a larger dataset for MTB pre-training, as the CNN dataset might be too small. E.g. the paper used Wikipedia dump data, which is huge.
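For anyone else reading the loss curves, the split described above can be made explicit by computing the two terms separately before summing. This is a minimal sketch, not the repo's actual code: the function name, embedding shapes, and the dot-product-plus-BCE form of the MTB term are assumptions based on the "Matching the Blanks" setup.

```python
import torch
import torch.nn.functional as F

def combined_loss(lm_logits, lm_labels, rel_emb_a, rel_emb_b, mtb_labels):
    """Sketch of the total pre-training loss: lm_loss + MTB_loss.

    lm_logits:   (batch, seq_len, vocab) masked-LM predictions
    lm_labels:   (batch, seq_len) token ids, with -100 at unmasked positions
    rel_emb_a/b: (batch, dim) relation embeddings of the two statements
    mtb_labels:  (batch,) 1.0 if the pair shares the same entity pair, else 0.0
    """
    # Masked-LM term: standard cross-entropy over the masked positions only
    lm_loss = F.cross_entropy(
        lm_logits.view(-1, lm_logits.size(-1)),
        lm_labels.view(-1),
        ignore_index=-100,
    )
    # MTB term (assumed form): binary "same entity pair?" objective on the
    # dot-product similarity of the two relation embeddings
    sim = (rel_emb_a * rel_emb_b).sum(dim=-1)
    mtb_loss = F.binary_cross_entropy_with_logits(sim, mtb_labels.float())
    # Logging the two components separately makes it visible when only
    # lm_loss is improving, as observed in this thread
    return lm_loss + mtb_loss, lm_loss.item(), mtb_loss.item()
```

Tracking `lm_loss` and `mtb_loss` separately per step is the easiest way to confirm whether the MTB objective is learning at all on a given dataset.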
@plkmo From what I could see, you weren't able to get good results from MTB using CNN either, right? I ran the pre-training and applied it to the downstream task afterwards, and the results were considerably worse than using BERT alone.
Yeah, no good results pre-training MTB on the CNN dataset so far. The best option is to fine-tune pre-trained BERT directly.
Sorry to bother you, but the program worked when I first ran it, and now it fails with: IndexError: ('list index out of range', 'occurred at index 47') prog-bar: 100%|██████████| 8000/8000 [00:01<00:00, 4026.81it/s]
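The "occurred at index 47" wrapper in that traceback is the format pandas uses when a function passed to `.apply()` raises, which suggests one row of the preprocessed data has a shorter-than-expected list. This is a hypothetical reproduction and fix, not the repo's code; the column name and the guarded lookup are illustrative assumptions:

```python
import pandas as pd

# Hypothetical data: 47 well-formed rows, then an empty token list at
# row index 47, mirroring the "occurred at index 47" in the traceback.
df = pd.DataFrame({"tokens": [["[CLS]", "hello", "[SEP]"]] * 47 + [[]]})

def first_token(tokens):
    # tokens[0] on the empty list would raise IndexError inside .apply();
    # guard the lookup instead of assuming the list is non-empty
    return tokens[0] if tokens else None

out = df["tokens"].apply(first_token)
assert out.iloc[47] is None  # the malformed row is skipped, not fatal
```

Printing the offending row (`df.iloc[47]`) before the failing `.apply()` is usually the quickest way to see which field is malformed.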
My results from MTB pre-training on the CNN dataset are bad too, and the pre-training takes a long time. How long did it take you to pre-train MTB to a good result?
Hi @plkmo!
Great work!
Do you have the loss curves available from your training runs (pre-training and SemEval training) so we can check whether our experiments match yours?
Thanks!