How to make coqui thorsten voice "more fluent" #60

alexnanchen · 2024-03-19T09:11:37Z

Hello,

Using coqui.ai and gruut, we have trained an example of "thorsten voice" with the provided VITS recipe (~60K steps). The result is good, but the rhythm of the speech is not as good as in the original "Thorsten voice".

See here for a comparison: https://htmlpreview.github.io/?https://github.com/alexnanchen/tts/blob/main/examples.html

How can we improve it?

Do we need to train for more steps?
Are there some specific parameters to tune?
Do we need to fine-tune the model on "accelerated speech"?

Many thanks!

domcross · 2024-03-19T10:14:42Z

Hi,
if I remember correctly, we trained our Coqui-VITS model up to nearly 1000k steps, but past the 600k mark there were no further improvements in quality, either audible or technical (MOSNET, DNSMOS, SRMR).
I suggest continuing training to at least 300k steps.

alexnanchen · 2024-03-20T14:30:02Z

Thank you!
