Tacotron2: Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions

Input

A sentence to be converted to speech.

Output

The synthesized speech is saved as a .wav file. The default output path is defined as SAVE_WAV_PATH in tacotron2.py.
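To sanity-check a generated file, something like the following sketch could be used. It relies only on Python's standard `wave` module and assumes the output is integer PCM WAV (the `wave` module cannot read floating-point WAV data). The helper name `describe_wav` is hypothetical and not part of tacotron2.py.

```python
import wave


def describe_wav(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        frames = wav.getnframes()
    # Duration is the number of frames divided by frames per second.
    return rate, channels, frames / rate
```

Calling `describe_wav` with the path of the saved file returns the sample rate, channel count, and length in seconds, which is a quick way to confirm that synthesis produced non-empty audio.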

Usage

The ONNX and prototxt files are downloaded automatically on the first run, so an Internet connection is required at that time.

To synthesize the built-in sample sentence:

python3 tacotron2.py

To specify the input sentence, pass the text after the --input option. Use the --savepath option to change the path of the output .wav file.

python3 tacotron2.py --input "Hello world." --savepath SAVE_WAV_PATH
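For several sentences, the command above can be driven from a small script. The following is a minimal sketch, not part of the repository: the helper names `build_command` and `synthesize_all` and the `out_<n>.wav` naming scheme are assumptions made for illustration, and only the `--input` and `--savepath` options shown above are used.

```python
import subprocess
import sys


def build_command(text, save_path):
    """Build the argument list for one tacotron2.py run."""
    return [sys.executable, "tacotron2.py", "--input", text, "--savepath", save_path]


def synthesize_all(sentences, prefix="out"):
    """Run tacotron2.py once per sentence, writing <prefix>_<n>.wav files."""
    for index, text in enumerate(sentences):
        # check=True raises CalledProcessError if the script exits with an error.
        subprocess.run(build_command(text, f"{prefix}_{index}.wav"), check=True)
```

Run from the directory containing tacotron2.py, `synthesize_all(["Hello world.", "Good morning."])` would be expected to write out_0.wav and out_1.wav. Each call starts a new process and reloads the models, so this favors simplicity over speed.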

For English

Two models can generate English speech from mel spectrograms. The default is the NVIDIA model, which uses WaveGlow for the conversion. Choosing the hifi option uses HiFi-GAN for speech generation instead.

python3 tacotron2.py -m hifi

For Japanese

Synthesizing Japanese speech requires converting the text into phonemes. This conversion requires OpenJTalk, installed through the pyopenjtalk package.

# for macOS, Linux
pip3 install pyopenjtalk
# for Windows
pip3 install pyopenjtalk-prebuilt

Then run:

python3 tacotron2.py -i "こんにちは。" -m tsukuyomi
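The phoneme conversion step can also be tried on its own. The sketch below wraps `pyopenjtalk.g2p`, which returns a space-separated phoneme string. Whether tacotron2.py calls it in exactly this way, or post-processes the result, is an assumption. The wrapper name `to_phonemes` and its injectable `g2p` parameter are hypothetical and exist only so the function can be exercised without pyopenjtalk installed.

```python
def to_phonemes(text, g2p=None):
    """Convert Japanese text to a phoneme string.

    g2p is any callable mapping text to phonemes; it defaults to pyopenjtalk.g2p.
    """
    if g2p is None:
        # Imported lazily so this module loads even without pyopenjtalk.
        import pyopenjtalk
        g2p = pyopenjtalk.g2p
    return g2p(text)
```

With pyopenjtalk installed, `to_phonemes("こんにちは。")` gives the phoneme sequence for the greeting used in the command above.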

Reference

Tacotron2
ONNX Export
HiFi-GAN (https://github.com/jik876/hifi-gan/tree/master)

Framework

PyTorch

Model Format

ONNX opset = 11, 12

Netron

NVIDIA Model

Tsukuyomi Chan Model

HiFi-GAN Model