Best alignment mapping for English #14

Open
meryemmhamdi1 opened this issue Mar 13, 2020 · 3 comments

@meryemmhamdi1

Hi,

Thank you for this interesting work. I am currently working on extending this approach to mBERT and need to generate the English mapping from scratch. Did you learn the W matrix for English by aligning English to itself using MUSE? Wouldn't that be redundant?

Thanks,

@TalSchuster
Owner

Hi,

For English we used the identity matrix (divided by the empirical norm). You can check gen_anchors_bert.py to see how to compute it for BERT. Actually, I ran it for multilingual BERT for a few languages and can share the files with you if you're interested.
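
A minimal sketch of the English case described above, assuming "empirical norm" means the mean L2 norm over a sample of contextual embeddings (that reading is an assumption, not something confirmed in this thread). The function name `english_alignment_matrix` and the random stand-in embeddings are hypothetical; gen_anchors_bert.py is the place to look for the actual computation.

```python
import numpy as np


def english_alignment_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Return the identity matrix scaled by 1 / (mean L2 norm of the embeddings).

    embeddings: array of shape (n_tokens, dim) holding contextual embeddings.
    Assumption: "empirical norm" is the average per-token L2 norm.
    """
    mean_norm = np.linalg.norm(embeddings, axis=1).mean()
    return np.eye(embeddings.shape[1]) / mean_norm


# Stand-in data: random vectors in place of real BERT outputs (hypothetical).
rng = np.random.default_rng(0)
sample = rng.normal(size=(1000, 768))

W_en = english_alignment_matrix(sample)

# Applying the mapping only rescales the embeddings; directions are unchanged.
aligned = sample @ W_en.T
```

With this scaling, the mapped English embeddings have an average norm of 1, so no learned alignment of English to itself is required.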

@meryemmhamdi1
Author

meryemmhamdi1 commented Mar 14, 2020

Hi,

Thank you for your prompt response! I would be interested in comparing your procedure with mine, including for new languages such as Arabic. Did you use the same evaluation set of Wikipedia dumps to generate the anchors? I am also working on extending the biaffine parser with alignment matrices on top of BERT token embedders, and I will probably open a pull request for that on AllenNLP. If you can share a link to the alignment matrices for the languages you have now, that would be appreciated.

Thanks,

@TalSchuster
Owner

Sounds great.
This is a mapping from Spanish to English for the last layer.
Actually, for multilingual BERT the norm doesn't seem to vary across languages (probably because it was jointly trained), so you can just use the identity matrix for English.
