Use models as Seq2Seq model #30529
Comments
@mnaylor5 do you have any advice on how to use MEGA as seq2seq? Or should the original fairseq implementation be used for this case?
@Bachstelze - I'm not working on this at the moment (I just contributed MEGA to the transformers library). The MEGA implementation is set up very similarly to BERT/RoBERTa, so any resources that demonstrate how to use those models in a seq2seq setting should also apply to MEGA. I haven't personally done much in the seq2seq space, so I'll defer to the fine folks at Hugging Face to provide guidance on that 😄
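For anyone following along: the BERT/RoBERTa recipe referenced above is the EncoderDecoderModel warm-start pattern. A minimal sketch, where the roberta-base checkpoint is just an illustrative choice:

```python
from transformers import AutoTokenizer, EncoderDecoderModel

# Warm-start an encoder-decoder from two pretrained RoBERTa checkpoints
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = EncoderDecoderModel.from_encoder_decoder_pretrained("roberta-base", "roberta-base")

# Special tokens the seq2seq wrapper needs for training and generation
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id
model.config.eos_token_id = tokenizer.sep_token_id
```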
System Info
transformers version: 4.40.0

Who can help?
@ArthurZucker @muellerzr @stevhliu
Information

Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
There is this snippet in many model documentations:
To be used in a Seq2Seq model, the model needs to be initialized with both the is_decoder=True and bidirectional=False arguments, as well as add_cross_attention set to True; encoder_hidden_states is then expected as an input to the forward pass.
I tried it like this for the MEGA model:
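Roughly the following sketch, using mnaylor/mega-base-wikitext (the checkpoint from the MEGA docs) as an example; the exact checkpoint and special-token choices here are assumptions:

```python
from transformers import (
    AutoTokenizer,
    EncoderDecoderModel,
    MegaConfig,
    MegaForCausalLM,
    MegaModel,
)

checkpoint = "mnaylor/mega-base-wikitext"  # example checkpoint from the MEGA docs
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# Encoder: plain bidirectional MEGA
encoder = MegaModel.from_pretrained(checkpoint)

# Decoder: unidirectional MEGA with cross-attention, per the docs quote above
decoder_config = MegaConfig.from_pretrained(
    checkpoint,
    is_decoder=True,
    bidirectional=False,
    add_cross_attention=True,
)
decoder = MegaForCausalLM.from_pretrained(checkpoint, config=decoder_config)

# Combine into a single seq2seq model
model = EncoderDecoderModel(encoder=encoder, decoder=decoder)
model.config.decoder_start_token_id = tokenizer.cls_token_id  # or bos_token_id, depending on the tokenizer
model.config.pad_token_id = tokenizer.pad_token_id
```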
The following error occurs when training it with Seq2SeqTrainer, Seq2SeqTrainingArguments, and DataCollatorForSeq2Seq, set up roughly as sketched below:
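The training side looks roughly like this; train_dataset stands in for an already tokenized dataset, and the argument values are placeholders:

```python
from transformers import DataCollatorForSeq2Seq, Seq2SeqTrainer, Seq2SeqTrainingArguments

# Pads inputs and labels dynamically per batch for seq2seq training
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)

training_args = Seq2SeqTrainingArguments(
    output_dir="mega-seq2seq",       # placeholder output directory
    per_device_train_batch_size=8,
    num_train_epochs=3,
    predict_with_generate=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,     # placeholder: tokenized dataset with input_ids / labels
    data_collator=data_collator,
    tokenizer=tokenizer,
)
trainer.train()
```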
Expected behavior
Train MEGA as a seq2seq model, as in the original paper.