Fine-tuning or training from scratch in a different language? #197
Comments
Definitely train a new PL-BERT for a new language. You can try the one trained in English, but even the author says it probably won't work. |
Hi there -- I have trained a PL-BERT model on a 14-language dataset which was crowdsourced by the author of the paper. You can find this model open-sourced here: https://huggingface.co/papercup-ai/multilingual-pl-bert Using this PL-BERT model, you can now train multilingual StyleTTS2 models. In my experiments, I have found that you don't need to train from scratch in order to train multilingual StyleTTS2; you can just finetune. Follow the steps outlined in the link I shared above! Best of luck, and let me know what you make with this! |
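(For anyone wiring this up: a minimal sketch of fetching that checkpoint and pointing StyleTTS2 at it, assuming your config loads PL-BERT from the `PLBERT_dir` key as in this repo's `Configs/config.yml`.)

```python
# Sketch: download the multilingual PL-BERT and point StyleTTS2 at it.
# Assumes the StyleTTS2 config reads the PL-BERT checkpoint from PLBERT_dir.
from huggingface_hub import snapshot_download

plbert_dir = snapshot_download("papercup-ai/multilingual-pl-bert")
print(plbert_dir)  # set PLBERT_dir in Configs/config.yml to this path
```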
Thank you very much for this @rlenain . I'll use this model to train StyleTTS on my data |
Nice work! Did the Chinese data used to train the model include tones? |
I'm not sure -- you can see a sample here (the data is from this dataset: https://huggingface.co/datasets/styletts2-community/multilingual-phonemes-10k-alpha/viewer/zh). |
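(A quick way to check is to inspect the phonemes directly; a minimal sketch, assuming the dataset exposes a `zh` config with a `train` split and that tone marks, if present, appear in the phoneme field.)

```python
# Sketch: look at a Chinese example to see whether tone numbers or
# diacritics survive in the phoneme field (column names are assumptions).
from datasets import load_dataset

ds = load_dataset("styletts2-community/multilingual-phonemes-10k-alpha", "zh", split="train")
print(ds[0])  # inspect the phoneme field for tone markers
```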
Thank you very much @rlenain! This is a great addition! You mentioned you can just finetune on a new language instead of training a new base model; I'd like to try it. How large were the datasets you used for finetuning on a new language? |
I tend to keep some English in the dataset (~5 hours) and have had success with as little as 20 hours of Spanish data split across 4 speakers. |
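(A minimal sketch of building such a mixed list, assuming the repo's pipe-separated `wav_path|phonemized_text|speaker_id` training-list format; the per-language file names are hypothetical.)

```python
# Sketch: merge a small English list (~5 h) with a Spanish list (~20 h)
# into one shuffled train_list.txt. Input file names are hypothetical.
import random

def read_list(path):
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f if line.strip()]

mixed = read_list("Data/train_list_en.txt") + read_list("Data/train_list_es.txt")
random.shuffle(mixed)
with open("Data/train_list.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(mixed) + "\n")
```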
Where can I see these 14 languages?
|
> https://huggingface.co/papercup-ai/multilingual-pl-bert

Thanks |
@rlenain
> I tend to keep some English in the dataset (~5 hours) and have had success with as little as 20 hours of Spanish data split across 4 speakers

Thanks for the great work! Do you have some samples to share? I'm very curious about the quality in a new language. |
Unfortunately, because of the privacy policy covering the samples I trained on, I cannot share them here. What I can say is that the quality is very much on par with the English samples on the samples page. |
I would like to ask three questions:
1. Do the speaker labels in the dataset need to be numeric (e.g., speaker 0, 1, 2), and do they have to start from 0? Or can they all share one name, or even be strings (like a person's name) to make the speakers easier to recognize?
2. After training, do I need to specify the speakers at inference time to access them?
3. Is the language selection automatic? |
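(On the first question: if your lists use string speaker names, a minimal sketch of remapping them to consecutive integer IDs, assuming the pipe-separated `wav_path|text|speaker` line format.)

```python
# Sketch: remap arbitrary speaker names to consecutive integer IDs
# (0, 1, 2, ...), assuming "wav_path|text|speaker" lines.
def remap_speakers(in_path, out_path):
    ids, out = {}, []
    with open(in_path, encoding="utf-8") as f:
        for line in f:
            wav, text, spk = line.rstrip("\n").split("|")
            out.append(f"{wav}|{text}|{ids.setdefault(spk, len(ids))}")
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("\n".join(out) + "\n")
    return ids  # keep this name -> id mapping for inference
```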
@sch0ngut Generally 50k-100k iterations, whatever that works out to in epochs for the size of your dataset. But you should be following the validation curve. |
@rlenain What would I need to do to train it in Hindi? |
You can probably just finetune StyleTTS2 without changing the PL-BERT model, and it would work given the right data and amount of data. |
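(One thing worth verifying first is that your phonemizer covers Hindi; a minimal sketch using the `phonemizer` package's espeak backend, assuming espeak-ng with the `hi` voice is installed.)

```python
# Sketch: check espeak-based phonemization of Hindi text before training.
from phonemizer import phonemize

ipa = phonemize("नमस्ते दुनिया", language="hi", backend="espeak", strip=True)
print(ipa)
```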
@rlenain Regarding this multilingual PL-BERT, it appears the data used to train the model relies on a data-processing script that isn't available to the general public. How would we tokenize the training data for StyleTTS in the same form as the BERT model? |
The data here (https://huggingface.co/datasets/styletts2-community/multilingual-pl-bert) has already been tokenized using the tokenizer of the multilingual PL-BERT model linked above. |
Hello @rlenain, I've successfully completed the first training stage of StyleTTS2 with the multilingual PL-BERT from this source, using the LJSpeech dataset provided in this repository. However, NaN values appeared at the start of the second stage. Could you help me identify any potential mistakes? Here's what I've done so far:
Appended …
|
Solved it. It was just a bad config that caused the first-stage params to load into the second-stage model params; I should set first_stage_path instead of pretrained_model. |
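(For anyone hitting the same NaNs, the relevant second-stage config keys look roughly like this; a sketch based on the comment above, with placeholder paths.)

```yaml
# Second-stage config: point first_stage_path at the stage-1 checkpoint,
# not pretrained_model (loading stage-1 weights there caused the NaNs).
first_stage_path: "Models/LJSpeech/first_stage.pth"
pretrained_model: ""
```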
Hi everyone,
I'm considering putting some effort into training StyleTTS in Portuguese. I have a good-quality dataset for the task; however, I'm unsure whether it would be better to just fine-tune the model (which I know was trained on English) or, since Portuguese is an unseen language, to train the model from scratch in Portuguese.
Does anyone have some tips on what I should consider before making a decision?