
Better LJSpeech or LibriTTS for finetuning a single speaker voice? Or training from scratch with not so much data? #226

Sweetapocalyps3 opened this issue Apr 2, 2024 · 3 comments


Sweetapocalyps3 commented Apr 2, 2024

Hi everyone,

I'm wondering whether LJSpeech or LibriTTS is the proper candidate for fine-tuning a single-speaker voice.
I've seen that there is a multispeaker boolean field in the configuration, which in my case should presumably be set to false, but I don't know if this implies I have to use LJSpeech, since LibriTTS is a multi-speaker dataset.

Or would it be even better to train the model from scratch? I'm considering it, but I suspect I have too few samples (126 files of clean audio, almost 19 minutes in total).

Thank you in advance.

@meng2468

LibriTTS is by far the better choice: the model has seen multiple speakers and can adapt far better to a small single-speaker dataset.

You can leave all of the settings in config_ft.yml the same (changing only the dataset paths, then batch size and window size depending on your hardware). Multi-speaker should be kept set to true; just make sure that in your dataset metafiles the speaker_id is set to the same ID for every file, as sketched below.
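For illustration only, here is a hypothetical excerpt of what such a metafile could look like, assuming a `path|text|speaker_id` layout (the exact columns and file name depend on the repo's data loader); the only point is that every line carries the same speaker ID:

```
# train_list.txt (hypothetical excerpt) -- all entries use speaker ID 0
wavs/clip_0001.wav|transcription of the first clip|0
wavs/clip_0002.wav|transcription of the second clip|0
wavs/clip_0003.wav|transcription of the third clip|0
```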

Training the model from scratch with 19 minutes of data will most likely yield bad results, although I haven't tried it myself.

Helpful details on fine-tuning: #81

@GUUser91

You can use Vokan.
https://huggingface.co/ShoukanLabs/Vokan

@traderpedroso

You can use Vokan.

https://huggingface.co/ShoukanLabs/Vokan

The expressions and emphasis in the voices sound really natural, but there are always noises at the beginning and especially at the end. I believe a pad of silence at the start and end was missing during the training.
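As a possible workaround (not something from the thread, just a sketch), one could pad a short stretch of silence onto each clip before training or after synthesis. A minimal example using `soundfile` and `numpy`, assuming mono WAV input; the 200 ms padding length is an arbitrary choice:

```python
import numpy as np
import soundfile as sf

def pad_silence(in_path: str, out_path: str, pad_ms: int = 200) -> None:
    """Write a copy of a mono audio file with pad_ms of silence at the start and end."""
    audio, sr = sf.read(in_path)
    pad = np.zeros(int(sr * pad_ms / 1000), dtype=audio.dtype)
    sf.write(out_path, np.concatenate([pad, audio, pad]), sr)

# Example usage (hypothetical paths):
# pad_silence("wavs/clip_0001.wav", "wavs_padded/clip_0001.wav")
```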
