megatts2

Unofficial implementation of Mega-TTS 2

TODO

Base test

Prepare dataset
VQ-GAN
ADM
PLM

Better version

Replace Hifigan with Bigvgan
Mixed training on Chinese and English
Train on about 1k hours of speech
Web UI

Install MFA (Montreal Forced Aligner)

conda create -n aligner && conda activate aligner
conda install -c conda-forge montreal-forced-aligner=2.2.17
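
After installing, it can help to confirm that the `mfa` executable is reachable from the active environment before running the alignment step. The following is a minimal sketch, not part of this repository: `find_tool` is a hypothetical helper name, and the script assumes the `aligner` conda environment is already activated in the current shell.

```python
import shutil
import subprocess


def find_tool(name):
    """Return the full path of an executable on PATH, or None if it is missing."""
    return shutil.which(name)


if __name__ == "__main__":
    mfa_path = find_tool("mfa")
    if mfa_path is None:
        print("mfa not found on PATH; is the aligner environment active?")
    else:
        # `mfa version` prints the installed Montreal Forced Aligner version.
        subprocess.run([mfa_path, "version"], check=True)
```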

Prepare dataset

Place the wav and txt files in ./data/wavs
Run python3 prepare_ds.py --stage 0 --num_workers 4 --wavtxt_path data/wavs --text_grid_path data/textgrids --ds_path data/ds
mfa model download acoustic mandarin_mfa
mfa align data/wavs utils/mandarin_pinyin_to_mfa_lty.dict mandarin_mfa data/textgrids --clean -j 12 -t /workspace/tmp
Run python3 prepare_ds.py --stage 1 --num_workers 4 --wavtxt_path data/wavs --text_grid_path data/textgrids --ds_path data/ds
After training the generator, run python3 prepare_ds.py --stage 2 --generator_config configs/config_gan.yaml --generator_ckpt generator.ckpt
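
The order of the steps above matters: stage 0 runs first, MFA alignment follows, and stage 1 consumes the resulting TextGrids. As a rough sketch, the first four commands can be assembled in that order from Python. `build_prepare_commands` is a hypothetical helper, not part of this repository; its defaults simply mirror the paths and flags listed above. Stage 2 is left out because it needs a trained generator checkpoint.

```python
import shlex


def build_prepare_commands(wav_dir="data/wavs", textgrid_dir="data/textgrids",
                           ds_dir="data/ds", num_workers=4, mfa_jobs=12,
                           mfa_tmp="/workspace/tmp"):
    """Return the stage 0, MFA, and stage 1 commands in the order they should run."""

    def prepare(stage):
        # Same flags as the prepare_ds.py invocations listed above.
        return ["python3", "prepare_ds.py", "--stage", str(stage),
                "--num_workers", str(num_workers),
                "--wavtxt_path", wav_dir,
                "--text_grid_path", textgrid_dir,
                "--ds_path", ds_dir]

    return [
        prepare(0),
        ["mfa", "model", "download", "acoustic", "mandarin_mfa"],
        ["mfa", "align", wav_dir, "utils/mandarin_pinyin_to_mfa_lty.dict",
         "mandarin_mfa", textgrid_dir, "--clean",
         "-j", str(mfa_jobs), "-t", mfa_tmp],
        prepare(1),
    ]


if __name__ == "__main__":
    # Dry run: print each command instead of executing it.
    for cmd in build_prepare_commands():
        print(shlex.join(cmd))
```

To execute the commands rather than print them, each list can be passed to `subprocess.run(cmd, check=True)`, which stops at the first failing step.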

Train

For the training procedure, refer to the PyTorch Lightning documentation.

Infer test

python infer.py

Citing

@misc{2307.07218,
Author = {Ziyue Jiang and Jinglin Liu and Yi Ren and Jinzheng He and Chen Zhang and Zhenhui Ye and Pengfei Wei and Chunfeng Wang and Xiang Yin and Zejun Ma and Zhou Zhao},
Title = {Mega-TTS 2: Zero-Shot Text-to-Speech with Arbitrary Length Speech Prompts},
Year = {2023},
Eprint = {arXiv:2307.07218},
}

License

MIT
Supported by Simon of ZideAI
