SpeechT5 integration #2441

helleuch · 2024-02-29T19:54:35Z

What does this PR do?

This PR integrates the SpeechT5 speech-to-text model into SpeechBrain.
It also adds a recipe for Tamasheq-to-French automatic speech translation under the IWSLT22 directory.

Before submitting

Did you read the contributor guideline?
Did you make sure your PR does only one thing, instead of bundling different changes together?
Did you make sure to update the documentation with your changes? (if necessary)
Did you write any new necessary tests? (not for typos and docs)
Did you verify new and existing tests pass locally with your changes?
Did you list all the breaking changes introduced by this pull request?
Does your code adhere to project-specific code style and conventions?

PR review

Reviewer checklist

Is this pull request ready for review? (if not, please submit in draft mode)
Check that all items from Before submitting are resolved
Make sure the title is self-explanatory and the description concisely explains the PR
Add labels and milestones (and optionally projects) to the PR so it can be classified
Confirm that the changes adhere to compatibility requirements (e.g., Python version, platform)
Review the self-review checklist to ensure the code is ready for review


helleuch · 2024-03-06T17:27:33Z

Hello,
I have a question about submitting the results obtained by running this recipe. Do I have to upload the results to Dropbox myself, or do the reviewers run the recipes and upload the results they obtain?

Adel-Moumen · 2024-03-08T09:20:55Z

> Hello, I have a question about submitting the results obtained by running this recipe. Do I have to upload the results to Dropbox myself, or do the reviewers run the recipes and upload the results they obtain?

Hello @helleuch, thanks for your PR! You'll need to upload them to a cloud storage service so that we can download the checkpoints and upload them to our official Dropbox.

helleuch · 2024-03-08T10:38:30Z

Thank you @Adel-Moumen. I will upload them soon and send you the link :)
I will update the results in the readme file.

recipes/IWSLT22_lowresource/AST/transformer/SpeechT5_README.md

Adel-Moumen

Hello,

Thanks a lot for this PR. Generally speaking, this is a great addition. I left some comments that I think should be addressed. I also put the link to the Dropbox in a comment. Could you please update the README so that it reflects the test results obtained?

Could you please confirm whether you ran the recipe tests on this PR?

Thanks a lot :)

P.S. Please make sure to fix the failing pre-commit checks. :)

recipes/IWSLT22_lowresource/AST/transformer/hparams/train_speecht5_st.yaml

recipes/IWSLT22_lowresource/AST/transformer/train_speecht5.py

speechbrain/lobes/models/huggingface_transformers/speecht5.py

Adel-Moumen · 2024-03-20T14:54:17Z

speechbrain/lobes/models/huggingface_transformers/speecht5.py

+    def forward_decoder(self, audio_features, decoder_input_ids):
+        """Perform one step of the SpeechT5 decoder.
+
+        Arguments
+        ---------
+        audio_features : torch.Tensor
+            A batch of audio features (SpeechT5 encoding).
+        decoder_input_ids : torch.Tensor
+            A batch of decoder inputs tokens.
+
+        For more details or go to the seq2seq2.py file in SpeechBrain to see how to generate
+        the tokens with Greedy Search and/or Beam Search.
+        """


I'm wondering: is there KV cache support for this model in HF?

I do not know, sorry
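For readers following the thread, here is a minimal sketch of the greedy search that the `forward_decoder` docstring alludes to. It is not SpeechBrain's searcher and not the code in this PR: `greedy_search`, `toy_step`, and the token ids are hypothetical names chosen for illustration, and `toy_step` merely stands in for a decoder step that returns one score per vocabulary entry.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for one decoder step: given the encoder output and the
# tokens decoded so far, return one score per vocabulary entry for the next token.
StepFn = Callable[[Sequence[int], List[int]], List[float]]


def greedy_search(
    step_fn: StepFn,
    audio_features: Sequence[int],
    bos_id: int,
    eos_id: int,
    max_len: int = 20,
) -> List[int]:
    """Decode one utterance by always picking the highest-scoring next token."""
    tokens = [bos_id]
    for _ in range(max_len):
        # The whole prefix is passed again at every step.
        scores = step_fn(audio_features, tokens)
        next_id = max(range(len(scores)), key=scores.__getitem__)
        if next_id == eos_id:
            break
        tokens.append(next_id)
    return tokens[1:]  # drop the BOS token


def toy_step(audio_features: Sequence[int], tokens: List[int]) -> List[float]:
    """Toy decoder (assumption): copies the 'features' as token ids, then emits EOS (id 0)."""
    vocab_size = 5
    scores = [0.0] * vocab_size
    position = len(tokens) - 1  # number of tokens emitted so far
    if position < len(audio_features):
        scores[audio_features[position]] = 1.0
    else:
        scores[0] = 1.0  # EOS
    return scores


hypothesis = greedy_search(toy_step, audio_features=[2, 3, 4], bos_id=1, eos_id=0)
```

Note that this loop feeds the full token prefix to the step function at every iteration. That repeated work is what a key/value cache is generally meant to avoid, which is the motivation behind the question above.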


helleuch added 4 commits February 29, 2024 20:47

SpeechT5 for speech to text integration

3aedd5c

SpeechT5 recipe for AST

3471d81

SpeechT5 recipe for AST

3547d02

Merge branch 'speechT5' of https://github.com/helleuch/speechbrain into speechT5

c0e71f4

helleuch changed the title ~~Speech t5~~ SpeechT5 integration Mar 1, 2024

helleuch force-pushed the speechT5 branch from 2500d6b to 0235ca8 March 4, 2024 13:59

git push -fCustom checkpoint loading hook

1fbd69d

helleuch force-pushed the speechT5 branch from 0235ca8 to 1fbd69d March 4, 2024 14:00

asumagic mentioned this pull request Mar 5, 2024

Adding spoken dialogue state tracking recipes for Spoken MultiWoz and SpokenWoz datasets #2424


helleuch and others added 7 commits March 5, 2024 15:16

New load + save hook to avoid model mismatch errors when loading a trained model

56e67f0

Added docstrings for the S2SSpeechT5BeamSearch class

a555233

Fix hparam file for speecht5 recipe

889b055

Added documentation for the SpeechT5 recipe

ff510e5

Added the new SpeechT5 recipe to the testing list

c3fc242

Merge branch 'speechbrain:develop' into speechT5

0f7d820

Merge branch 'speechT5' of https://github.com/helleuch/speechbrain into speechT5

b494373

Ignoring mismatched sizes in the transformers from_pretrained() method

1d59ee3

Adel-Moumen reviewed Mar 20, 2024


recipes/IWSLT22_lowresource/AST/transformer/SpeechT5_README.md (outdated)

Adel-Moumen requested changes Mar 20, 2024


helleuch and others added 8 commits April 1, 2024 23:58

ST5 recipe: added a header to the yaml file

7eab08a

ST5 recipe: Improved training script

5f7ee24

ST5 integration : Addressing beam search comments

3dc662c

ST5 recipe: Integrated skip_prep param into the data preparation script

3c5c781

ST5 integration : Completing docstrings in the SpeechT5 implementation module.

77a77a8

ST5 recipe: Added validation and test results in the readme file.

efa178e

ST5 recipe: Updated the CSV file for recipe tests

6cf918a

Merge branch 'speechbrain:develop' into speechT5

d7691b8

ST5 recipe: fixing YAML file

fa4d88a


helleuch Apr 2, 2024
