
Fine-tuning or training from scratch in a different language? #197

Open
paulovasconcellos-hotmart opened this issue Jan 30, 2024 · 21 comments
Labels
help wanted Extra attention is needed

Comments

@paulovasconcellos-hotmart

Hi everyone,
I'm considering putting some effort into training StyleTTS in Portuguese. I have a good-quality dataset for this task; however, I'm unsure whether it would be better just to fine-tune the model (which I know was trained on English) or, since Portuguese is an unseen language, to train the model from scratch.

Does anyone have some tips on what I should consider before making a decision?

@martinambrus

Definitely train a new PL-BERT for the new language. You can try the one trained in English, but even the author says it probably won't work.

@rlenain

rlenain commented Feb 28, 2024

Hi there -- I have trained a PL-BERT model on a 14-language dataset crowdsourced by the author of the paper. You can find this model open-sourced here: https://huggingface.co/papercup-ai/multilingual-pl-bert

Using this PL-BERT model, you can now train multilingual StyleTTS2 models. In my experiments, I have found that you don't need to train from scratch to get a multilingual StyleTTS2; you can just finetune. Follow the steps outlined in the link I shared above!
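The swap described above essentially amounts to pointing the training config's PL-BERT directory at the downloaded multilingual checkpoint. A hypothetical config excerpt (the `PLBERT_dir` field name is an assumption based on the repo's config files; verify against your own config before training):

```yaml
# Hypothetical StyleTTS2 fine-tuning config excerpt.
# Download https://huggingface.co/papercup-ai/multilingual-pl-bert
# into the directory below, replacing the English PL-BERT files.
PLBERT_dir: Utils/PLBERT/
```

The rest of the config (dataset paths, pretrained checkpoint) stays as in a normal fine-tuning run.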

Best of luck, and let me know what you make with this!

@paulovasconcellos-hotmart

Thank you very much for this @rlenain. I'll use this model to train StyleTTS on my data.

@Stardust-minus

> Hi there -- I have trained a PL-BERT model on a 14 language dataset which was crowdsourced by the author of the paper. You can find this model open-sourced here: https://huggingface.co/papercup-ai/multilingual-pl-bert
>
> Using this PL-BERT model, you can now train multilingual StyleTTS2 models. In my experiments, I have found that you don't need to train from scratch in order to train multilingual StyleTTS2, you can just finetune. Follow the steps outlined in the link I shared above!
>
> Best of luck, and let me know what you make with this!

Nice work! Did the Chinese data used for training include tones?

@rlenain

rlenain commented Feb 29, 2024

I'm not sure -- you can see a sample here (the data is from this dataset: https://huggingface.co/datasets/styletts2-community/multilingual-phonemes-10k-alpha/viewer/zh).
(screenshot of a data sample omitted)

@Frederieke93

Thank you very much @rlenain! This is a great addition! You mentioned that you can just finetune on a new language instead of training a new base model, and I'd like to try it. How large are the datasets you used for finetuning on a new language?

@rlenain

rlenain commented Mar 5, 2024

I tend to keep some English in the dataset (~5 hours) and have had success with as little as 20 hours of Spanish data split across 4 speakers.
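Mixing a small slice of English into the new-language training data can be sketched as a simple list-merging step. A minimal sketch, assuming the repo's `path|text|speaker_id` line format; hours are approximated here by line counts, and all names are illustrative:

```python
import random

def mix_train_lists(primary_lines, english_lines, english_keep=0.2, seed=0):
    """Combine a new-language train list with a slice of English data.

    `english_keep` is the fraction of English lines to retain, a stand-in
    for "~5 hours"; the real amount depends on your clip lengths.
    """
    rng = random.Random(seed)
    kept = rng.sample(english_lines, int(len(english_lines) * english_keep))
    mixed = primary_lines + kept
    rng.shuffle(mixed)  # interleave languages so batches stay mixed
    return mixed

# Hypothetical lists: 4 Spanish speakers plus 2 English speakers.
es = [f"es_{i}.wav|hola|{i % 4}" for i in range(100)]
en = [f"en_{i}.wav|hello|{4 + i % 2}" for i in range(50)]
print(len(mix_train_lists(es, en)))  # 110 lines (100 Spanish + 10 English)
```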

@casic

casic commented Mar 6, 2024 via email

@rlenain

rlenain commented Mar 6, 2024

@casic

casic commented Mar 6, 2024 via email

@yl4579 yl4579 added the help wanted Extra attention is needed label Mar 7, 2024
@ZYJGO

ZYJGO commented Mar 19, 2024

@rlenain
> i tend to keep some english in the dataset (~5 hours) and have had success with as little as 20 hours of Spanish data split across 4 speakers

Thanks for the great work! Do you have some samples to share? I'm very curious about the quality on a new language.

@rlenain

rlenain commented Mar 21, 2024

Unfortunately, because of the privacy policy covering the samples I trained on, I cannot share them here. What I can say is that the quality is very much on par with the English samples on the samples page.

@traderpedroso

traderpedroso commented Apr 3, 2024

> Unfortunately because of the privacy policy of the samples that I trained on, I cannot share these samples here. What I can say is that the quality is very much on-par with samples you can find on the samples page in English.

I would like to ask three questions. Do the speakers in the dataset need to be in a numeric format (e.g. speaker 0, 1, 2), and do they have to start from 0? Or can I give them all the same name, or even use a string name, to make the speakers easier to recognize? Also, after training, do I need to specify the speakers at inference time, and is the language selection automatic?
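On the speaker-format question: the repo's data loader appears to cast the speaker field of each `path|text|speaker` line to an integer, so string names would need remapping before training. A hedged sketch of such a remap (the line format and the integer-cast behavior are assumptions; check meldataset.py in your checkout):

```python
def remap_speakers(lines):
    """Map arbitrary speaker names in `path|text|speaker` lines
    to consecutive integer ids starting at 0."""
    name_to_id = {}
    out = []
    for line in lines:
        path, text, speaker = line.rstrip("\n").split("|")
        # Assign the next free integer id the first time a name is seen.
        sid = name_to_id.setdefault(speaker, len(name_to_id))
        out.append(f"{path}|{text}|{sid}")
    return out, name_to_id

lines, table = remap_speakers(["a.wav|ola|maria", "b.wav|oi|joao", "c.wav|sim|maria"])
print(lines)  # ['a.wav|ola|0', 'b.wav|oi|1', 'c.wav|sim|0']
```

Keeping the `name_to_id` table around also answers the inference side: it lets you look up which integer id corresponds to which speaker later.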

@sch0ngut

@rlenain

> i tend to keep some english in the dataset (~5 hours) and have had success with as little as 20 hours of Spanish data split across 4 speakers

@rlenain Do you mind sharing for how many epochs you fine-tuned?

@rlenain

rlenain commented Apr 30, 2024

@sch0ngut Generally for 50k-100k iterations, whatever that means in terms of epochs for the size of your dataset. But you should be following the validation curve.
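"Following the validation curve" can be made concrete as keeping the checkpoint with the lowest validation loss and stopping once it stops improving. A generic early-stopping sketch, not code from this repo:

```python
def best_checkpoint(val_losses, patience=3):
    """Return (index, loss) of the best validation evaluation, stopping
    once the loss has not improved for `patience` evaluations."""
    best_i, best = 0, float("inf")
    since_improve = 0
    for i, loss in enumerate(val_losses):
        if loss < best:
            best_i, best = i, loss
            since_improve = 0
        else:
            since_improve += 1
            if since_improve >= patience:
                break  # validation curve has flattened or turned up
    return best_i, best

# Hypothetical per-evaluation validation losses:
print(best_checkpoint([1.0, 0.8, 0.7, 0.72, 0.71, 0.73, 0.75]))  # (2, 0.7)
```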

@21sK1p

21sK1p commented May 1, 2024

> Hi there -- I have trained a PL-BERT model on a 14 language dataset which was crowdsourced by the author of the paper. You can find this model open-sourced here: https://huggingface.co/papercup-ai/multilingual-pl-bert
>
> Using this PL-BERT model, you can now train multilingual StyleTTS2 models. In my experiments, I have found that you don't need to train from scratch in order to train multilingual StyleTTS2, you can just finetune. Follow the steps outlined in the link I shared above!
>
> Best of luck, and let me know what you make with this!

@rlenain What would I need to do to train it in Hindi?

@rlenain

rlenain commented May 1, 2024

You can probably just finetune StyleTTS2 without changing the PL-BERT model, and with the right kind and amount of data it would work.
If you want to train PL-BERT on Hindi, I believe there's data here: https://huggingface.co/datasets/styletts2-community/multilingual-pl-bert

@JingchengYang4

@rlenain Regarding this multilingual PL-BERT: it appears the data used to train it was produced with a data-processing script that isn't available to the general public. How would we be able to tokenize the training data for StyleTTS in the same form as the BERT model?

@rlenain

rlenain commented May 2, 2024

The data here (https://huggingface.co/datasets/styletts2-community/multilingual-pl-bert) has been tokenized using the tokenizer of the bert-base-multilingual-cased model: https://huggingface.co/google-bert/bert-base-multilingual-cased
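So to inspect or reproduce that tokenization, loading the same checkpoint's tokenizer via `transformers` should suffice. A small sketch (needs network access the first time to download the tokenizer files):

```python
from transformers import AutoTokenizer

# Same checkpoint named in the comment above.
tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

# Tokenize a short Portuguese phrase the way the dataset was tokenized.
ids = tok("olá mundo")["input_ids"]
print(tok.convert_ids_to_tokens(ids))  # subword tokens wrapped in [CLS] ... [SEP]
```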

@chocolatedesue

chocolatedesue commented May 8, 2024

Hello @rlenain,

I've successfully trained StyleTTS2 with the multilingual PL-BERT from this source during the first stage using the LJSpeech dataset provided in this repository.

However, I encountered an issue at the start of the second stage where NaN values appeared. Could you help me identify any potential mistakes?

Here's what I've done so far:

  1. Converted the source WAV files to a 24k WAV format.
  2. Replaced the files in Utils/PLBERT/ with the multilingual PL-BERT.
  3. Conducted training on eight 3090 cards for 12 hours without any other modifications.
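Step 1 above (converting source audio to 24 kHz) can be done with any resampler; a minimal sketch using polyphase resampling via scipy (librosa, torchaudio, or sox would work equally well):

```python
import numpy as np
from scipy.signal import resample_poly

def to_24k(wav, sr):
    """Resample a mono waveform to 24 kHz."""
    if sr == 24000:
        return wav
    g = np.gcd(sr, 24000)
    # Polyphase resampling with the reduced up/down ratio.
    return resample_poly(wav, 24000 // g, sr // g)

# 1 second of a 440 Hz tone at 48 kHz -> 24000 samples after resampling.
sr = 48000
x = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
y = to_24k(x, sr)
print(len(y))  # 24000
```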

(first-stage loss graph screenshot omitted)

Update:

  1. While debugging, I found the first NaN comes from:
    F0_fake, N_fake = model.predictor.F0Ntrain(p_en, s_dur)
@chocolatedesue

chocolatedesue commented May 8, 2024

Solved it: it was just a bad config that caused the first-stage parameters to be loaded into the second-stage model.

I should set first_stage_path instead of pretrained_model.
