General advice on training a custom dataset w/ GlowTTS #3416
Unanswered
T145
asked this question in
General Q&A
Replies: 1 comment 1 reply
Hello, |
So I have a custom dataset w/ audio clips that range from a few seconds to a couple of minutes. Its base sample rate is 48000 Hz, which I resample to 22050 Hz using the `resample` script. This is the only data preparation I perform, as I assume lower-quality audio is easier for the trainer to work with. To train a model I'm using the script given in the official tutorial, except that I raised the epoch count from 100 to 300.
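For anyone reproducing this setup, a quick sanity check after resampling can help rule out clips that were left at the original rate. The sketch below is not the `resample` script mentioned above; it is a standard-library-only illustration that assumes the clips are plain PCM WAV files. The helper names (`resample_factors`, `wav_info`, `clips_off_target`) are hypothetical.

```python
import wave
from fractions import Fraction
from pathlib import Path

SOURCE_RATE = 48000  # base sample rate stated in the post
TARGET_RATE = 22050  # rate the clips are resampled to


def resample_factors(source_rate, target_rate):
    """Return the (up, down) integer factors of the rate change in lowest terms."""
    ratio = Fraction(target_rate, source_rate)
    return ratio.numerator, ratio.denominator


def wav_info(path):
    """Return (sample_rate, duration_in_seconds) read from a PCM WAV header."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        return rate, wav.getnframes() / rate


def clips_off_target(wav_dir, target_rate=TARGET_RATE):
    """List (file name, sample rate) for WAV files not at target_rate."""
    off_target = []
    for path in sorted(Path(wav_dir).rglob("*.wav")):
        rate, _duration = wav_info(path)
        if rate != target_rate:
            off_target.append((path.name, rate))
    return off_target
```

Calling `clips_off_target("path/to/wavs")` (a placeholder path) should return an empty list once every clip is at 22050 Hz. `resample_factors(SOURCE_RATE, TARGET_RATE)` gives the integer up/down factors that a polyphase resampler would use for this rate change; `wav_info` also exposes each clip's duration, which is handy given how widely the clip lengths vary.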
Speech synthesized with the best model and a source WAV sounds fairly robotic, so what should be done to bring it closer to the original quality? Are there any recommended parameters to add or remove in the config to help, or is there more that should be done to the audio itself?
EDIT: The model is `en` and not multi-lingual.