Better LJSpeech or LibriTTS for finetuning a single speaker voice? Or training from scratch with not so much data? #226
Comments
LibriTTS is by far the better choice: the model has seen multiple speakers and can adapt far better to a smaller dataset for a single speaker. You can leave all of the settings in config_ft.yml the same, changing only the dataset paths, then batch size and window size depending on your hardware. Multi-speaker should be kept set to true; just make sure that in your dataset metafiles the speaker_id is set to the same id for every file. Training the model from scratch with 19 minutes of data will most likely yield bad results, although I haven't tried it myself. Helpful details on fine-tuning: #81
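To make the "same speaker_id for each file" point concrete, here is a minimal sketch. It assumes the metafile uses the pipe-separated `path|text|speaker_id` line format (an assumption about the layout, not something stated above) and rewrites every entry to one shared id:

```python
def unify_speaker_id(lines, speaker_id="0"):
    """Force every metafile entry to share one speaker id.

    Assumes each line looks like "path|text|speaker_id"; returns the
    rewritten lines. Purely illustrative of the advice above.
    """
    fixed = []
    for line in lines:
        path, text, _ = line.rstrip("\n").split("|")
        fixed.append(f"{path}|{text}|{speaker_id}")
    return fixed


meta = ["wavs/001.wav|Hello there.|3", "wavs/002.wav|Good morning.|7"]
print(unify_speaker_id(meta))
```

Run this over both the train and validation metafiles before starting fine-tuning so the model treats the whole dataset as one speaker.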
You can use vokan.
The expressions and emphasis in the voices sound really natural, but there is always noise at the beginning and especially at the end. I believe a pad of silence at the start and end was missing during training.
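If you want to apply that workaround to your own clips before training, a minimal sketch of the padding step (assuming mono audio loaded as a 1-D NumPy float array; the function name and 100 ms default are my own choices, not from this thread):

```python
import numpy as np


def pad_silence(audio, sr, pad_ms=100):
    """Prepend and append pad_ms milliseconds of silence to a mono signal.

    audio: 1-D float array of samples; sr: sample rate in Hz.
    """
    pad = np.zeros(int(sr * pad_ms / 1000), dtype=audio.dtype)
    return np.concatenate([pad, audio, pad])


# Example: a 10-sample clip at 1 kHz gains 100 zero samples on each side.
clip = np.ones(10, dtype=np.float32)
padded = pad_silence(clip, sr=1000, pad_ms=100)
print(len(padded))
```

You would typically load each file (e.g. with soundfile or librosa), pad it like this, and write it back before building the dataset.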
Hi everyone,
I'm wondering whether LJSpeech or LibriTTS is the proper candidate for fine-tuning a single speaker's voice.
I've seen that there is a multispeaker boolean field in the configuration, which in my case should presumably be set to false, but I don't know whether this implies I have to use LJSpeech, since LibriTTS is a multi-speaker dataset.
Or would it be even better to train the model from scratch? I'm considering it, but I suppose I have too few samples (126 files of clean audio, almost 19 minutes in total).
Thank you in advance.