SpeechT5 integration #2441
base: develop
Hello,
Hello @helleuch, thanks for your PR! You'll need to upload them on a cloud storage so that we can download the ckpts and upload them on our official dropbox.
Thank you @Adel-Moumen. I will upload them soon and send you the link :)
Hello,
Thanks a lot for this PR. Generally speaking, this is a great addition. I left some comments that I think should be addressed. I also put the link to the dropbox in a comment. Could you please update the README so that it reflects the test results obtained?
Could you please confirm whether you ran the recipe test on this PR?
Thanks a lot :)
p.s. make sure to fix the failing pre-commit... :)
recipes/IWSLT22_lowresource/AST/transformer/hparams/train_speecht5_st.yaml
def forward_decoder(self, audio_features, decoder_input_ids):
    """Perform one step of the SpeechT5 decoder.

    Arguments
    ---------
    audio_features : torch.Tensor
        A batch of audio features (SpeechT5 encoding).
    decoder_input_ids : torch.Tensor
        A batch of decoder inputs tokens.

    For more details or go to the seq2seq2.py file in SpeechBrain to see how to generate
    the tokens with Greedy Search and/or Beam Search.
    """
I'm wondering: is there KV cache support for this model in HF?
I do not know, sorry
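For context on the question above, here is a minimal sketch of how a one-step decoder call such as `forward_decoder` is typically driven by greedy search. It is pure Python and does not use SpeechBrain or HF: `greedy_decode`, `step_fn`, and `toy_step` are hypothetical names invented for illustration, and the toy decoder just echoes its input. Note that the whole token prefix is fed back at every step; that repeated work is what a KV cache would avoid, if the underlying model supports one.

```python
from typing import Callable, List, Sequence


def greedy_decode(
    step_fn: Callable[[Sequence[int], Sequence[int]], List[float]],
    audio_features: Sequence[int],
    bos_id: int,
    eos_id: int,
    max_len: int = 20,
) -> List[int]:
    """Greedy search: repeatedly call a one-step decoder and keep the argmax."""
    tokens = [bos_id]
    for _ in range(max_len):
        # Without a KV cache, the full prefix is re-processed at each step.
        logits = step_fn(audio_features, tokens)
        next_id = max(range(len(logits)), key=logits.__getitem__)
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens


def toy_step(audio_features: Sequence[int], decoder_input_ids: Sequence[int]) -> List[float]:
    """Hypothetical stand-in for a decoder step (NOT the SpeechT5 model).

    It predicts the next element of `audio_features`, then EOS (id 1).
    """
    vocab_size = 5
    pos = len(decoder_input_ids) - 1
    target = audio_features[pos] if pos < len(audio_features) else 1
    return [1.0 if i == target else 0.0 for i in range(vocab_size)]


if __name__ == "__main__":
    print(greedy_decode(toy_step, [3, 4, 2], bos_id=0, eos_id=1))
```

In a real recipe the step function would return a tensor of logits for the last position, and the searchers in SpeechBrain handle batching and beam search.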
What does this PR do?
The goal of this PR is to integrate the SpeechT5 model for speech-to-text into SpeechBrain.
It also comes with a recipe for Tamasheq-to-French automatic speech translation under the IWSLT22 directory.