
Transfer learning and fine-tuning tts #191

Open
ToiYeuTien opened this issue Oct 18, 2023 · 3 comments

Comments

@ToiYeuTien

Hi everybody!
I have trained a Vietnamese female voice model on my computer for 500k steps, and the voice sounds quite clear. Now I want to train another Vietnamese voice, this time male.
I have learned that there is a training method that starts from a previously trained model, which shortens training time.
Can someone help me with that method?
Thank you!

@CavidanZ

Hello. Not sure if a reply this late will help, but that is simply known as transfer learning. You take your first model's checkpoint, pass it as the pre-trained model, and warm-start from that point. This ensures your model learns the new speaker's voice while still benefiting from the previous training.
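
Under the hood, warm starting is roughly this (a minimal PyTorch sketch, not your repo's exact code; the `state_dict` checkpoint key and the `ignore_layers` default are assumptions borrowed from NVIDIA-style Tacotron 2 scripts):

```python
import torch

def warm_start(model, checkpoint_path, ignore_layers=("embedding.weight",)):
    """Load weights from a previous run into a fresh model, skipping
    layers that should be re-learned for the new speaker."""
    checkpoint = torch.load(checkpoint_path, map_location="cpu")
    state_dict = checkpoint["state_dict"]  # assumed checkpoint layout
    # Drop the layers we want re-initialised for the new voice.
    state_dict = {k: v for k, v in state_dict.items() if k not in ignore_layers}
    model_dict = model.state_dict()
    model_dict.update(state_dict)
    model.load_state_dict(model_dict)
    return model
```

Note that only the model weights are carried over; the optimizer state and step counter start fresh, which is what distinguishes a warm start from simply resuming training.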

@ToiYeuTien
Author

> Hello. Not sure if a reply this late will help, but that is simply known as transfer learning. You take your first model's checkpoint, pass it as the pre-trained model, and warm-start from that point. This ensures your model learns the new speaker's voice while still benefiting from the previous training.

Hello, thank you for your response.
Do I understand correctly that to fine-tune the model this way, I just replace the old model's audio files and metadata with the new speaker's data and continue training? I would appreciate your feedback!
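
For context, this is roughly how I would rebuild the training filelist for the new voice (a sketch; the directory layout and the `path|transcription` line format are assumptions from my LJSpeech-style setup):

```python
from pathlib import Path

# Hypothetical layout: one .wav per utterance plus a matching .txt transcription.
wav_dir = Path("data/male_speaker/wavs")
txt_dir = Path("data/male_speaker/txt")

entries = []
for wav in sorted(wav_dir.glob("*.wav")):
    text = (txt_dir / (wav.stem + ".txt")).read_text(encoding="utf-8").strip()
    entries.append(f"{wav}|{text}")  # one "path|transcription" line per clip

Path("filelists").mkdir(exist_ok=True)
Path("filelists/male_train.txt").write_text("\n".join(entries), encoding="utf-8")
```

Is that the right idea?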

@CavidanZ
commented Mar 16, 2024

Yes. I have done it with a Tacotron 2 model, and it definitely works. Basically you do the training just as you did the first time: get your audio dataset ready, and give the model the new audio files and their transcriptions.

  1. The only difference is that you set the pre-trained model to be your previously trained model's checkpoint.
  2. Make use of warm starting. It should be a parameter in hparams that you set to TRUE (see the sketch after this list).
  3. One more thing: do not change the original batch size. At least in Tacotron 2, whenever I changed the batch size it printed errors.
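
Putting those three points together, the relevant settings would look roughly like this (a sketch only; hparams names vary between forks, and all paths and values here are placeholders for your own files):

```python
from types import SimpleNamespace

# Hypothetical fine-tuning configuration for the new (male) speaker.
hparams = SimpleNamespace(
    training_files="filelists/male_train.txt",   # new speaker's dataset
    validation_files="filelists/male_val.txt",
    checkpoint_path="outdir/checkpoint_500000",  # the 500k-step female model
    warm_start=True,   # reuse the old weights, reset the optimizer (point 2)
    batch_size=32,     # keep whatever value the original run used (point 3)
)
```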
