
The performance of a new voice (finetune) is bad #11

Open
linlinsongyun opened this issue Mar 20, 2023 · 10 comments

@linlinsongyun

Thanks for your nice work.
The code works well in the pretrain stage. However, when I finetune toward an unseen voice with only 10 sentences, the results are bad: the speech quality is poor, and the voice differs significantly from the target speaker. What went wrong?

@tuanh123789
Owner

What dataset do you use in the pretrain stage?

@tuanh123789
Owner

tuanh123789 commented Mar 20, 2023

And is the language the same in the pretrain and finetune stages?

@linlinsongyun
Author

A Mandarin multi-speaker dataset was used for pretraining; a different Chinese speaker was used for finetuning.

@linlinsongyun
Author

I noticed that only the decoder and speaker embeddings have gradients during finetune. Should the decoder weights have no grad except for the conditional layer norm?

@tuanh123789
Owner

tuanh123789 commented Mar 20, 2023

Do you set num_speaker in the model config equal to the number of speakers in the Mandarin dataset during the pretrain stage?

@tuanh123789
Owner

> I noticed that only the decoder and speaker embeddings have gradients during finetune. Should the decoder weights have no grad except for the conditional layer norm?

Only the speaker embedding and the conditional layer norm. I follow the paper.
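For reference, this kind of selective finetuning is typically done by toggling `requires_grad` on the model's named parameters. A minimal sketch, assuming PyTorch-style modules; the substrings `speaker_emb` and `cond_layer_norm` are assumptions, not necessarily this repo's actual parameter names, so inspect `model.named_parameters()` to confirm them:

```python
def freeze_for_finetune(model):
    """Freeze every parameter except the speaker embedding and the
    conditional layer norm, following the AdaSpeech finetuning recipe.
    Works with any object exposing named_parameters() (e.g. nn.Module)."""
    trainable = ("speaker_emb", "cond_layer_norm")
    for name, param in model.named_parameters():
        # Leave requires_grad on only for the adaptation parameters.
        param.requires_grad = any(key in name for key in trainable)
```

After calling this, passing only `filter(lambda p: p.requires_grad, model.parameters())` to the optimizer keeps the optimizer state small as well.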

@linlinsongyun
Author

> Do you set num_speaker in the model config equal to the number of speakers in the Mandarin dataset during the pretrain stage?

Yes, I use the default config "num_speaker: 955". There are 30 speakers in the pretrain stage, with speaker ids ranging from 1 to 31, and I use speaker_id = 50 in the finetune stage.

@tuanh123789
Owner

tuanh123789 commented Mar 20, 2023

You have to change the default config "num_speaker" to 30 (in your case) in the pretrain stage. When finetuning, just set your speaker_id = 0.
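The underlying issue is that a speaker embedding table of size num_speaker is indexed 0..num_speaker-1, so raw dataset ids (here 1..31, or 50) should be remapped to contiguous indices rather than used directly. A minimal sketch, assuming a hypothetical helper not present in this repo:

```python
def build_speaker_map(raw_ids):
    """Map raw dataset speaker ids to contiguous indices 0..N-1 so
    every index is a valid row of an embedding table of size N."""
    return {sid: idx for idx, sid in enumerate(sorted(set(raw_ids)))}

# Hypothetical example: raw ids 1..31 from a multi-speaker pretrain set.
spk_map = build_speaker_map(range(1, 32))
num_speaker = len(spk_map)  # the value to put in the model config
```

During data loading, each utterance's raw speaker id would then be looked up through `spk_map` before being fed to the embedding layer.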

@linlinsongyun
Author

> You have to change the default config "num_speaker" to 30 (in your case) in the pretrain stage. When finetuning, just set your speaker_id = 0.

OK, I will give it a try. Thanks a lot.

@vedantk-b

@linlinsongyun did the finetuning improve after you changed the number of speakers?


3 participants