TTS Eval: Add TTS evaluation (MOS estimation) #2392

Open · wants to merge 38 commits into base: develop

Commits (38):
4e1ffd7
TTS Eval: Add TTS evaluation (MOS estimation)
flexthink Feb 6, 2024
83b7d4b
TTS Eval: Add missing docstrings to pass consistency tests
flexthink Feb 6, 2024
4e6693b
TTS Eval: Add a unit test
flexthink Feb 6, 2024
4a53ae0
TTS Eval: Add a WavLM model
flexthink Feb 13, 2024
ed3ff41
TTS Eval: Rename
flexthink Feb 13, 2024
4ee1569
TTS Eval: Fix WavLM, improve statistics
flexthink Feb 13, 2024
6d9cd09
TTS Eval: Fix WavLM
flexthink Feb 14, 2024
9fad173
TTS Eval: Fix statistics
flexthink Feb 14, 2024
565d41e
TTS Eval: Add classification pretraining with WavLM
flexthink Feb 20, 2024
e7e874f
TTS Eval: Fix contrastive sampling, add reproducibility
flexthink Feb 20, 2024
d8f779c
TTS Eval: Remove duplicated somos_prepare.py
flexthink Feb 20, 2024
a2bba8c
TTS Eval: Fixes, clean-up
flexthink Feb 20, 2024
0b1694e
TTS Evaluation: Add README
flexthink Feb 21, 2024
5e0c507
TTS Eval: Update to keep only the best model
flexthink Feb 25, 2024
ddc7886
TTS Eval: Add recipe tests
flexthink Feb 26, 2024
98563a3
TTS Eval: Miscellaneous fixes
flexthink Feb 26, 2024
e8e30f6
TTS Eval: SOMOS preparation speech
flexthink Feb 26, 2024
57d9df1
TTS Eval: Clean-up
flexthink Feb 26, 2024
184235a
TTS Eval: Update to pass consistency tests (TBD dropbox link)
flexthink Feb 26, 2024
aabe1ea
TTS Eval: Cosmetic changes (from hooks)
flexthink Feb 26, 2024
995152b
TTS Eval: Add inference
flexthink Feb 28, 2024
363f5b7
TTS Eval: Functionality improvements, add a recipe to evaluate a pret…
flexthink Mar 14, 2024
267620c
TTS Eval: Add support for frozen splits and skipping folder differences
flexthink Mar 14, 2024
9026f6f
TTS Eval: Add support for frozen splits and ignoring folders while sk…
flexthink Mar 14, 2024
12b1f60
TTS Eval: Add extra requirements
flexthink Mar 14, 2024
d486fd5
Merge branch 'develop' into ttseval
flexthink Mar 14, 2024
05adb13
TTS Eval: Add support for FastSpeech2
flexthink Mar 14, 2024
9f433bb
TTS Eval: Device fixes
flexthink Mar 15, 2024
0af174d
TTS Eval: Fixes
flexthink Mar 15, 2024
964c20a
TTS Eval: Fixes
flexthink Mar 15, 2024
12551f1
TTS Eval: Fixes
flexthink Mar 15, 2024
bf5cfcb
TTS Eval: Fixes
flexthink Mar 17, 2024
d73f2ca
TTS Eval: Cosmetic changes
flexthink Mar 18, 2024
26d4dcb
TTS Eval: Disable LM during evaluation
flexthink Mar 18, 2024
8171684
TTS Eval: Fixes for model paths
flexthink Mar 18, 2024
aadd8bb
TTS Eval: Cosmetic changes
flexthink Mar 25, 2024
c9cd33f
Merge branch 'develop' into ttseval
flexthink Mar 25, 2024
c78dd96
TTS Eval: Fix typos
flexthink Mar 25, 2024
46 changes: 46 additions & 0 deletions recipes/LJSpeech/evaluation/README.md
@@ -0,0 +1,46 @@
# Text-to-Speech (with LJSpeech)
This folder contains the recipes for evaluation of existing pretrained text-to-speech systems using ASR-based evaluators and MOS estimation.

By default, MOS evaluation is performed using a pretrained Transformer model, as defined in `recipes/SOMOS/ttseval/hparams/train.yaml` and available pretrained on HuggingFace at
https://huggingface.co/flexthink/ttseval-wavlm-transformer

ASR evaluation is performed using the bundled Transformer ASR: https://huggingface.co/speechbrain/asr-transformer-transformerlm-librispeech

# Tacotron 2
The recipe contains hyperparameters for the evaluation of Tacotron2 in `hparams/tacotron2.yaml`.

To perform evaluation, run the following script:
```
python evaluate.py --data_folder=/your_folder/LJSpeech-1.1 hparams/tacotron2.yaml
```


# FastSpeech2
The recipe contains hyperparameters for the evaluation of FastSpeech2 in `hparams/fastspeech2.yaml`.

To perform evaluation, run the following script:
```
python evaluate.py --data_folder=/your_folder/LJSpeech-1.1 hparams/fastspeech2.yaml
```


# **About SpeechBrain**
- Website: https://speechbrain.github.io/
- Code: https://github.com/speechbrain/speechbrain/
- HuggingFace: https://huggingface.co/speechbrain/


# **Citing SpeechBrain**
Please cite SpeechBrain if you use it for your research or business.

```bibtex
@misc{speechbrain,
title={{SpeechBrain}: A General-Purpose Speech Toolkit},
author={Mirco Ravanelli and Titouan Parcollet and Peter Plantinga and Aku Rouhe and Samuele Cornell and Loren Lugosch and Cem Subakan and Nauman Dawalatabad and Abdelwahab Heba and Jianyuan Zhong and Ju-Chieh Chou and Sung-Lin Yeh and Szu-Wei Fu and Chien-Feng Liao and Elena Rastorgueva and François Grondin and William Aris and Hwidong Na and Yan Gao and Renato De Mori and Yoshua Bengio},
year={2021},
eprint={2106.04624},
archivePrefix={arXiv},
primaryClass={eess.AS},
note={arXiv:2106.04624}
}
```

81 changes: 81 additions & 0 deletions recipes/LJSpeech/evaluation/adapters.py
@@ -0,0 +1,81 @@
"""Adapters for specific TTS system

Authors
* Artem Ploujnikov, 2024
"""

from torch import nn


class MelAdapter(nn.Module):
"""An adapter for TTSes that output a MEL spectrogram
and require a vocoder to synthesize an
audio wave

Arguments
---------
vocoder : torch.nn.Module | speechbrain.inference.Pretrained
the vocoder to be used
vocoder_run_opts : dict
run options for the vocoder
"""

def __init__(self, vocoder, vocoder_run_opts=None):
super().__init__()
self.vocoder_fn = vocoder
self.vocoder_run_opts = vocoder_run_opts or {}
self.vocoder = None
self.device = None

def _get_vocoder(self):
"""Instantiates the vocoder, if not already instantiated"""
if self.vocoder is None:
run_opts = dict(self.vocoder_run_opts)
if self.device is not None:
run_opts["device"] = self.device
self.vocoder = self.vocoder_fn(run_opts=run_opts)
return self.vocoder

def forward(self, tts_out):
"""Applies a vocoder to the waveform

Arguments
---------
tts_out : tuple
a (tensor, tensor) tuple with a MEL spectrogram
of shape (batch x mel x length)
and absolute lengths (as in the output of Tacotron2
or similar models)

Returns
-------
wav : torch.Tensor
The waveform
lengths : torch.Tensor
The lengths
"""
mel_outputs, mel_lengths = tts_out[:2]
vocoder = self._get_vocoder()
max_len = mel_lengths.max()
mel_outputs = mel_outputs[:, :, :max_len]
wav = vocoder(mel_outputs)
lengths = mel_lengths / max_len
return wav, lengths

def to(self, device):
"""Transfers the adapter (and the underlying model) to the
specified device

Arguments
---------
device : str | torch.Device
The device


Returns
-------
result : MelAdapter
the adapter (i.e. returns itself)
"""
self.device = device
return super().to(device)