VITS models for Persian throw an error or generate unintelligible output #3667

karim23657 · 2024-04-06T08:40:39Z


I use TTS v0.21.1.
I trained some Persian VITS models: https://github.com/karim23657/Persian-tts-coqui
And here's a Hugging Face demo for them:

The Hugging Face demo works very well without any errors. However, when I test the models through the TTS Python API with sentences containing punctuation, the output is unintelligible. Here is a Colab notebook with the output audio files:

In the notebook I also see a new error: AttributeError: 'TTS' object has no attribute 'is_multi_lingual'

I also tested it on Windows with the code below:

from TTS.config import load_config
from TTS.utils.manage import ModelManager
from TTS.utils.synthesizer import Synthesizer
from playsound import playsound

# Local paths to the trained VITS config and checkpoint
config_path="C:\\Users\\computer\\Desktop\\فایل های کاری\\پروژه های شخصی\\مباحث مشکلات\\Persian_TTS_models\\951 MB\\config.json"
model_path="C:\\Users\\computer\\Desktop\\فایل های کاری\\پروژه های شخصی\\مباحث مشکلات\\Persian_TTS_models\\951 MB\\checkpoint_88000.pth"

# Test sentence: "Hello, I hope you are well" (no punctuation)
persian_text = "سلام امیدوارم خوب باشید"
# Round-trip through UTF-8; this leaves the string unchanged
persian_text = persian_text.encode('utf-8').decode()
print(persian_text)

# Load the model, synthesize the sentence, then save and play the result
synthesizer = Synthesizer(model_path, config_path)
wavs = synthesizer.tts(persian_text)
output_path = 'C:\\Users\\computer\\Desktop\\sp.wav'
synthesizer.save_wav(wavs, output_path)
playsound(output_path)

 > Using model: vits
 > Setting up Audio Processor...
 | > sample_rate:22050
 | > resample:False
 | > num_mels:80
 | > log_func:np.log10
 | > min_level_db:0
 | > frame_shift_ms:None
 | > frame_length_ms:None
 | > ref_level_db:None
 | > fft_size:1024
 | > power:None
 | > preemphasis:0.0
 | > griffin_lim_iters:None
 | > signal_norm:None
 | > symmetric_norm:None
 | > mel_fmin:0
 | > mel_fmax:None
 | > pitch_fmin:None
 | > pitch_fmax:None
 | > spec_gain:20.0
 | > stft_pad_mode:reflect
 | > max_norm:1.0
 | > clip_norm:True
 | > do_trim_silence:False
 | > trim_db:60
 | > do_sound_norm:False
 | > do_amp_to_db_linear:True
 | > do_amp_to_db_mel:True
 | > do_rms_norm:False
 | > db_level:None
 | > stats_path:None
 | > base:10
 | > hop_length:256
 | > win_length:1024
 > Text splitted to sentences.
['سلام امیدوارم خوب باشید']
 > Processing time: 0.39935874938964844
 > Real-time factor: 0.7633374153989032

Output:

sp.mp4

Similar issue here: karim23657/Persian-tts-coqui#36
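
To narrow down the punctuation problem described above, one possible check is whether the input contains characters that the model's character set does not list. The following is only a rough sketch, not something taken from the TTS library: the helper names `load_vocab` and `unsupported_chars` are hypothetical, the sample vocabulary is made up for illustration, and it assumes that config.json has a `characters` section with `characters` and `punctuations` strings.

```python
import json


def load_vocab(config_path):
    """Collect the characters listed in config.json (layout is an assumption)."""
    with open(config_path, encoding="utf-8") as f:
        cfg = json.load(f)
    # Assumed layout: {"characters": {"characters": "...", "punctuations": "..."}}
    chars = cfg.get("characters") or {}
    letters = chars.get("characters") or ""
    punctuations = chars.get("punctuations") or ""
    return set(letters) | set(punctuations)


def unsupported_chars(text, vocab):
    """Return the characters of `text` missing from `vocab`, in order of first appearance."""
    missing = []
    for ch in text:
        if ch not in vocab and ch not in missing:
            missing.append(ch)
    return missing


# Illustrative vocabulary only; a real one would come from load_vocab(config_path).
sample_vocab = set("سلام امیدوارم خوب باشید") | set("!.?")

# The Persian comma in this sentence is not part of the sample vocabulary.
print(unsupported_chars("سلام، امیدوارم خوب باشید.", sample_vocab))
```

If this reports any characters for a sentence that sounds wrong, the mismatch between the text and the model's character set would be one thing to look at, though it may not be the actual cause here.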
