How to make coqui thorsten voice "more fluent" #60

Open
alexnanchen opened this issue Mar 19, 2024 · 2 comments

Comments

@alexnanchen

Hello,

Using coqui.ai and gruut, we trained an example of the "Thorsten voice" with the provided VITS recipe (~60K steps). The result is good, but the rhythm of the speech is not as good as in the original "Thorsten voice".

See here for a comparison: https://htmlpreview.github.io/?https://github.com/alexnanchen/tts/blob/main/examples.html

How can we improve it?

  • Do we need to train for more steps?
  • Are there some specific parameters to tune?
  • Do we need to fine-tune the model on "accelerated speech"?

Many thanks!

@domcross

Hi,
if I remember correctly, we trained our Coqui-VITS model up to nearly 1000k steps, but there were no improvements in quality, either audible or technical (MOSNET, DNSMOS, SRMR), beyond the 600k mark.
I suggest continuing training to at least 300k steps.
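
To check for a similar plateau in your own run, here is a minimal sketch, not what we actually used. It assumes you already have one quality score per saved checkpoint (for example a MOS-style estimate where higher is better). The function name `plateau_step`, the thresholds, and the numbers in the usage lines are all hypothetical placeholders, not measurements from our training.

```python
from typing import Optional, Sequence


def plateau_step(
    steps: Sequence[int],
    scores: Sequence[float],
    min_gain: float = 0.05,
    patience: int = 2,
) -> Optional[int]:
    """Return the step of the best checkpoint once `patience` consecutive
    later checkpoints fail to beat it by at least `min_gain`.

    Returns None if no such plateau is found in the given data.
    Assumes higher scores are better and that `steps` is sorted ascending.
    """
    if len(steps) != len(scores) or not steps:
        raise ValueError("steps and scores must be non-empty and equal length")

    best_score = scores[0]
    best_step = steps[0]
    stale = 0  # consecutive checkpoints without a meaningful gain

    for step, score in zip(steps[1:], scores[1:]):
        if score - best_score >= min_gain:
            best_score, best_step, stale = score, step, 0
        else:
            stale += 1
            if stale >= patience:
                return best_step
    return None


# Hypothetical placeholder scores, for illustration only.
example_steps = [100_000, 200_000, 300_000, 400_000, 500_000]
example_scores = [3.0, 3.5, 3.9, 3.92, 3.91]
print(plateau_step(example_steps, example_scores))
```

With these placeholder numbers the gain stalls after the third checkpoint, so the function reports that step. For a real run you would feed in your own per-checkpoint scores and pick `min_gain` to match the noise of the metric.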

@alexnanchen
Author

Thank you!
