Recommended audio length for fine-tuning data #1070
I try to provide a fair number of samples at both short and long lengths.
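Pick clips between 3 s and 10 s, and keep the speaking rate and emotion consistent within each clip. For example, a sad utterance should not contain a segment that switches to a cheerful tone. Ideally, no clip should cut a sentence off halfway. Going beyond 10 s is pointless: many such samples are simply discarded, no spectrogram file is generated for them, and so they never enter the training process.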
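A minimal sketch for checking a dataset against the 3-10 s suggestion above before training, assuming the clips are PCM WAV files. It only reads durations with Python's standard `wave` module; `wav_duration` and `filter_clips` are hypothetical helpers, not part of the project, and the project's own length thresholds may differ. `max_s` is left as a parameter because the reply below says training clips can be much longer.

```python
import wave
from pathlib import Path


def wav_duration(path):
    """Duration in seconds of a PCM WAV file, read via the standard `wave` module."""
    with wave.open(str(path), "rb") as w:
        return w.getnframes() / w.getframerate()


def filter_clips(paths, min_s=3.0, max_s=10.0):
    """Split paths into (kept, dropped) lists of (path, seconds) by duration.

    The 3-10 s defaults follow the suggestion in this thread (an assumption,
    not the project's documented limits); adjust min_s/max_s as needed.
    """
    kept, dropped = [], []
    for p in map(Path, paths):
        d = wav_duration(p)
        # Clips inside [min_s, max_s] are kept; everything else is reported as dropped.
        (kept if min_s <= d <= max_s else dropped).append((p, d))
    return kept, dropped
```

Usage: `kept, dropped = filter_clips(Path("my_dataset").glob("*.wav"))`, where `my_dataset` is a hypothetical folder. Printing `dropped` shows which clips would need re-cutting rather than silently losing them.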
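If that's the case, won't the trained model end up with a single speaking rate and a single emotion? If you want multiple emotions, do you have to train separate models?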
What you're describing applies to the reference audio; for training you can use clips of close to one minute.
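Has anyone tested this? For example, given a 30 s recording, does it work better as one 30 s clip or as three 10 s clips? In other words, is there any particular technique to splitting the audio?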
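The thread does not answer which option trains better. As an untested illustration of the "three 10 s clips" option, here is a minimal sketch that cuts a long recording into segments of at most `max_s` seconds, placing each cut at the quietest short window between `min_s` and `max_s` after the previous cut, so cuts tend to land in pauses rather than mid-sentence (the earlier advice about not truncating sentences). It assumes mono samples already loaded as a sequence of floats; `split_at_silence` and its parameters are hypothetical, and energy-based cutting is just one simple heuristic, not the project's method.

```python
def split_at_silence(samples, sr, max_s=10.0, min_s=3.0, win_s=0.05):
    """Cut a mono sample sequence into (start, end) index pairs.

    Every segment is at most max_s long. Each cut is placed in the
    lowest-energy window found between min_s and max_s after the previous
    cut. Only the final remainder may be shorter than min_s.
    """
    n = len(samples)
    max_len = max(1, int(max_s * sr))
    min_len = max(1, int(min_s * sr))  # at least 1 sample so each cut makes progress
    win = max(1, int(win_s * sr))
    segments, start = [], 0
    while n - start > max_len:
        lo, hi = start + min_len, start + max_len
        best_cut, best_energy = hi, float("inf")  # fall back to a hard cut at max_s
        # Scan non-overlapping windows inside [lo, hi] and keep the quietest one.
        for c in range(lo, hi - win + 1, win):
            energy = sum(x * x for x in samples[c:c + win])
            if energy < best_energy:
                best_energy, best_cut = energy, c + win // 2  # cut mid-window
        segments.append((start, best_cut))
        start = best_cut
    segments.append((start, n))
    return segments
```

Multiply the returned indices by `1 / sr` to get times in seconds. Whether shorter clips cut this way beat one long clip would still need to be checked by training both variants.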