
Issues with seq2seq tutorial (batch training) #2840

Open
gavril0 opened this issue Apr 18, 2024 · 0 comments

gavril0 commented Apr 18, 2024

Add Link

Link to the tutorial:

https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html

Describe the bug

The tutorial was markedly changed in June 2023; see commit 6c03bb3, which, among other things, aimed at fixing the implementation of attention (#2468). In the process, several other things were changed:

  • a dataloader was added that returns batches of zero-padded sequences to train the network
  • the forward() function of the decoder processes the input one word at a time, in parallel for all
    sentences in the batch, until MAX_LENGTH is reached (see the sketch right after this list)
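
For concreteness, this is roughly what the new dataloader does (a sketch only; the helper name `make_dataloader` and the `word2index` lookup are illustrative, not the tutorial's exact code):

```python
import numpy as np
import torch
from torch.utils.data import TensorDataset, DataLoader

MAX_LENGTH = 10
EOS_token = 1

def make_dataloader(pairs, input_lang, output_lang, batch_size=32):
    # Every sentence is written into a row of a zero-filled array,
    # so index 0 implicitly acts as the padding token.
    n = len(pairs)
    input_ids = np.zeros((n, MAX_LENGTH), dtype=np.int64)
    target_ids = np.zeros((n, MAX_LENGTH), dtype=np.int64)
    for i, (src, tgt) in enumerate(pairs):
        src_idx = [input_lang.word2index[w] for w in src.split()] + [EOS_token]
        tgt_idx = [output_lang.word2index[w] for w in tgt.split()] + [EOS_token]
        input_ids[i, :len(src_idx)] = src_idx
        target_ids[i, :len(tgt_idx)] = tgt_idx
    dataset = TensorDataset(torch.from_numpy(input_ids), torch.from_numpy(target_ids))
    return DataLoader(dataset, batch_size=batch_size, shuffle=True)
```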

I am not a torch expert, but I think that the embedding layers in the encoder and decoder should have been modified to recognize the padding (padding_idx=0 is missing). Using zero-padded sequences as input might also have other implications during learning, but I am not sure. Can you confirm that the implementation is correct?
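
What I have in mind is sketched below (vocabulary and hidden sizes are placeholders, and the ignore_index suggestion at the end is an assumption on my part, not something the tutorial does):

```python
import torch.nn as nn

hidden_size = 128          # placeholder values, not the tutorial's exact ones
input_vocab_size = 5000
output_vocab_size = 5000

# padding_idx=0 keeps the embedding vector of the padding token at zero and
# excludes it from gradient updates.
encoder_embedding = nn.Embedding(input_vocab_size, hidden_size, padding_idx=0)
decoder_embedding = nn.Embedding(output_vocab_size, hidden_size, padding_idx=0)

# One of the "other implications" I am unsure about (an assumption on my part):
# the loss could also skip padded target positions, e.g.
criterion = nn.NLLLoss(ignore_index=0)
```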

As a result of these changes, the text no longer describes the code well. I think it would be nice to include a discussion of zero-padding and of the implications of using batches on the code in the tutorial. I am also curious whether there is really a gain from using batches, since most sentences are short.

Finally, I found a mention of teacher_forcing_ratio in the text, but it does not appear in the code. Either the tutorial text or the code needs to be adjusted.
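
For reference, this is how teacher_forcing_ratio is typically used in the decoding loop (a generic sketch with an illustrative decoder interface, not the tutorial's actual forward() signature):

```python
import random
import torch

teacher_forcing_ratio = 0.5

def decode_with_teacher_forcing(decoder, decoder_input, decoder_hidden,
                                encoder_outputs, target_tensor, max_length):
    # decoder_input: (batch, 1) tensor of token ids, initially SOS for every sentence
    outputs = []
    use_teacher_forcing = random.random() < teacher_forcing_ratio
    for t in range(max_length):
        decoder_output, decoder_hidden = decoder(decoder_input, decoder_hidden,
                                                 encoder_outputs)
        outputs.append(decoder_output)
        if use_teacher_forcing:
            # feed the ground-truth token as the next input
            decoder_input = target_tensor[:, t].unsqueeze(1)
        else:
            # feed the model's own (detached) prediction
            _, topi = decoder_output.topk(1)
            decoder_input = topi.squeeze(-1).detach()
    return torch.cat(outputs, dim=1), decoder_hidden
```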

If it is useful, I found another implementation of the same tutorial which seems to be a fork of a previous version (it was archived in 2021):

  • It does not use batches
  • It includes teacher_forcing_ratio to select the amount of teacher forcing
  • It implements both the Luong et al. and Bahdanau et al. attention models (sketched schematically after this list)
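
For context, these are the two attention score functions those models use, written schematically (simplified shapes; this is neither the tutorial's nor the fork's code):

```python
import torch
import torch.nn as nn

class BahdanauScore(nn.Module):
    """Additive attention: score(s, h) = v^T tanh(W_a s + U_a h)."""
    def __init__(self, hidden_size):
        super().__init__()
        self.Wa = nn.Linear(hidden_size, hidden_size)
        self.Ua = nn.Linear(hidden_size, hidden_size)
        self.Va = nn.Linear(hidden_size, 1)

    def forward(self, query, keys):
        # query: (batch, 1, hidden); keys: (batch, seq_len, hidden)
        return self.Va(torch.tanh(self.Wa(query) + self.Ua(keys)))  # (batch, seq_len, 1)

class LuongDotScore(nn.Module):
    """Multiplicative (dot-product) attention: score(s, h) = s^T h."""
    def forward(self, query, keys):
        return torch.bmm(keys, query.transpose(1, 2))  # (batch, seq_len, 1)
```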

Describe your environment

I appreciate this tutorial because it provides a simple introduction to seq2seq models with a small dataset. I am actually trying to port this tutorial to R with the torch package.

cc @albanD

@gavril0 added the bug label on Apr 18, 2024
@svekars added the core label on May 15, 2024