Hi there, thanks for this repo and the pretrained models.
I have a question about batching sequences of varying length. The padding token and tokenizer work as expected, but I don't see an attention mask being passed into the model's forward pass.
I've tried running the same sequence both padded (e.g. padded with 4s, as output by the tokenizer) and unpadded, and the resulting embeddings of at least the last few tokens differ substantially between the two.
The common pattern elsewhere is to also provide an attention mask. I tried passing one as model(input_ids, attn_mask=attn_mask), but that isn't how the model is set up, and looking through the source code I can't find any mechanism by which an attention mask would be applied.
Is there a supported way to batch sequences of varying length, and if so, how should I do it?
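For reference, here is roughly what I'm attempting. The `attn_mask` keyword is my assumption carried over from HuggingFace-style APIs, not something this model's forward actually accepts, and the token ids are just toy values:

```python
import torch

pad_id = 4  # padding token id emitted by the tokenizer in my case

# Two toy sequences of different length (ids are illustrative only)
sequences = [
    [0, 5, 6, 7, 8, 2],          # shorter sequence
    [0, 5, 6, 7, 8, 9, 10, 2],   # longer sequence
]

# Right-pad to the batch max length and build a 1/0 attention mask
max_len = max(len(s) for s in sequences)
input_ids = torch.full((len(sequences), max_len), pad_id, dtype=torch.long)
attn_mask = torch.zeros((len(sequences), max_len), dtype=torch.long)
for i, seq in enumerate(sequences):
    input_ids[i, : len(seq)] = torch.tensor(seq)
    attn_mask[i, : len(seq)] = 1

# What I would like to do (model = the pretrained model from this repo):
# out = model(input_ids, attn_mask=attn_mask)   # <- not supported
# out = model(input_ids)                        # padded positions seem to affect real tokens
```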