
bad results when using models downloaded from huggingface #5

Open
beanandrew opened this issue Nov 4, 2021 · 1 comment

Comments

@beanandrew

Hi,
I tried to reproduce your work with the PyTorch BERT model downloaded from HuggingFace, only to get a very bad result: the training loss stays around 1.0 for the first 10 epochs. But when I follow your instructions, download the Google BERT model, and convert it with the helper script, the training process seems to go well.
I wonder why this is happening? Is it because these two models are very different?
Huggingface download link here: https://huggingface.co/bert-base-uncased/tree/main

@frankaging
Owner

frankaging commented Nov 4, 2021

Hi,
Thanks for your comment. I believe the reason is the variable naming.

If you look at this line of code in the training set-up https://github.com/frankaging/Quasi-Attention-ABSA/blob/main/code/util/train_helper.py#L300,
model.bert.load_state_dict(torch.load(init_checkpoint, map_location='cpu'), strict=False)
this load_state_dict call loads parameters by name. With the current code, the HuggingFace model uses different names for all the variables in BERT; as a result, you are not loading any weights from the pre-trained BERT at all. You can verify this by simply printing the weights before and after this line, and you will see what I mean.
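To illustrate the failure mode being described, here is a minimal sketch (not the repo's actual model) showing how `strict=False` silently skips every mismatched key, so nothing is loaded. The `TinyModel` class and the `bert.`-prefixed key are illustrative assumptions; the return value of `load_state_dict` reports what was actually matched:

```python
import torch
import torch.nn as nn

# A toy model standing in for the repo's BERT module (illustrative only).
class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(4, 4)

model = TinyModel()
before = model.encoder.weight.clone()

# A checkpoint whose keys use a different naming scheme,
# e.g. "bert.encoder.weight" instead of "encoder.weight".
checkpoint = {"bert.encoder.weight": torch.randn(4, 4)}

# With strict=False, the name mismatch raises no error.
result = model.load_state_dict(checkpoint, strict=False)
print(result.missing_keys)     # keys the model expected but did not receive
print(result.unexpected_keys)  # checkpoint keys that matched nothing

# The model's weights are untouched: nothing was actually loaded.
assert torch.equal(before, model.encoder.weight)
```

Printing (or asserting on) `missing_keys` and `unexpected_keys` after the load is a quick way to confirm whether the HuggingFace checkpoint's names line up with the model's.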

There are two solutions: (1) use the Google checkpoint for importing the pre-trained weights, as you are doing now; or (2) change the code to work with both models. The second approach requires you to modify the variable names of the model.
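For option (2), one common approach is to rename the checkpoint's keys before calling `load_state_dict`. The sketch below is a generic key-remapping helper; the specific `old`/`new` pairs shown are hypothetical examples, and the real mapping would depend on how this repo's BERT implementation names its variables:

```python
import torch

def remap_keys(state_dict, rename_pairs):
    """Return a new state dict with each (old, new) substring renamed in the keys."""
    remapped = {}
    for key, value in state_dict.items():
        for old, new in rename_pairs:
            key = key.replace(old, new)
        remapped[key] = value
    return remapped

# Hypothetical renames: strip a leading "bert." prefix and use the
# gamma/beta LayerNorm names that Google-style checkpoints use.
pairs = [
    ("bert.", ""),
    ("LayerNorm.weight", "LayerNorm.gamma"),
    ("LayerNorm.bias", "LayerNorm.beta"),
]
sd = {"bert.embeddings.LayerNorm.weight": torch.zeros(2)}
print(list(remap_keys(sd, pairs)))  # ['embeddings.LayerNorm.gamma']
```

After remapping, you would pass the result to `model.bert.load_state_dict(...)` and check that `missing_keys` comes back (near-)empty before trusting the training run.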

Does this make sense?
