Data driven AI voice cloning

This repository implements the main part of my master's thesis in Data Science & Engineering. It is divided into two parts:

Speaker Encoder

models: ECAPA-TDNN, WavLM series

data: VoxCeleb1, a private dataset
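A speaker encoder maps an utterance of arbitrary length to a fixed-size embedding. On VoxCeleb1, such encoders are commonly evaluated as speaker verifiers by scoring pairs of embeddings with cosine similarity. The sketch below shows only that scoring step and assumes the embeddings have already been computed by a model such as ECAPA-TDNN. The function names, the toy vectors, and the 0.5 threshold are illustrative assumptions, not taken from this repository.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two speaker embeddings of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero-norm embedding")
    return dot / (norm_a * norm_b)


def same_speaker(a: Sequence[float], b: Sequence[float], threshold: float = 0.5) -> bool:
    """Hypothetical verification decision: accept if the score reaches the threshold."""
    return cosine_similarity(a, b) >= threshold


if __name__ == "__main__":
    enrol = [0.9, 0.1, 0.3]        # toy embedding of the enrolled speaker
    test_same = [0.8, 0.2, 0.35]   # toy embedding close to the enrolled one
    test_other = [-0.2, 0.9, -0.4] # toy embedding of a different speaker
    print(same_speaker(enrol, test_same))   # True
    print(same_speaker(enrol, test_other))  # False
```

In practice the threshold is tuned on held-out trials, for example at the equal error rate operating point.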

Text-to-speech

model: FastSpeech2 (Microsoft implementation)

data: LibriTTS
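FastSpeech2 is non-autoregressive: a duration predictor estimates how many mel-spectrogram frames each phoneme lasts, and a length regulator repeats each phoneme's hidden vector that many times so that the decoder receives a frame-level sequence. The sketch below illustrates the length-regulator idea on plain Python lists. Real implementations work on batched, padded tensors, and the names used here are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def length_regulate(hidden: Sequence[Sequence[float]], durations: Sequence[int]) -> List[Vector]:
    """Expand phoneme-level hidden states to frame level.

    hidden[i] is repeated durations[i] times (durations are in mel frames);
    a duration of 0 drops that phoneme from the output.
    """
    if len(hidden) != len(durations):
        raise ValueError("one duration per phoneme is required")
    frames: List[Vector] = []
    for vec, d in zip(hidden, durations):
        if d < 0:
            raise ValueError("durations must be non-negative")
        # Copy each vector so output frames do not alias the input lists.
        frames.extend(list(vec) for _ in range(d))
    return frames


if __name__ == "__main__":
    phonemes = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # toy phoneme hidden states
    durs = [2, 0, 3]                                  # the second phoneme is skipped
    out = length_regulate(phonemes, durs)
    print(len(out))  # 5
```

Because the output length is the sum of the durations, speaking rate can be changed at inference time by scaling the predicted durations before this step.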

These two parts are then integrated into ZeroShotFastSpeech2, a multi-speaker text-to-speech model that can clone unseen voices from about 5 seconds of audio.
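A common way to make an acoustic model zero-shot multi-speaker is to compute a speaker embedding from reference audio and inject it into the TTS encoder output. One option is to project the embedding to the encoder's hidden size and add it to every phoneme hidden state before the variance adaptor. The sketch below assumes that additive scheme and also assumes that embeddings from one or more short reference clips are averaged and L2-normalized. It illustrates the general idea and is not necessarily how ZeroShotFastSpeech2 combines the two parts. All names, including the projection matrix, are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_normalized(embeddings: Sequence[Sequence[float]]) -> Vector:
    """Average per-clip speaker embeddings, then L2-normalize the result."""
    if not embeddings:
        raise ValueError("need at least one reference embedding")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("all embeddings must have the same dimension")
    mean = [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    return [x / norm for x in mean] if norm > 0 else mean


def project(weight: Sequence[Sequence[float]], vec: Sequence[float]) -> Vector:
    """Linear projection; weight has shape [hidden_dim][emb_dim]."""
    if any(len(row) != len(vec) for row in weight):
        raise ValueError("weight rows must match the embedding dimension")
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]


def condition_on_speaker(hidden: Sequence[Sequence[float]],
                         speaker_emb: Sequence[float],
                         weight: Sequence[Sequence[float]]) -> List[Vector]:
    """Add the projected speaker vector to every phoneme hidden state."""
    spk = project(weight, speaker_emb)
    if any(len(h) != len(spk) for h in hidden):
        raise ValueError("hidden states must match the projected dimension")
    return [[h + s for h, s in zip(state, spk)] for state in hidden]


if __name__ == "__main__":
    refs = [[1.0, 0.0], [0.0, 1.0]]            # toy embeddings of two reference clips
    spk = mean_normalized(refs)                 # about [0.707, 0.707]
    W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]    # hypothetical 2 -> 3 projection
    enc = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]    # two phoneme hidden states of size 3
    print(condition_on_speaker(enc, spk, W))
```

Concatenating the speaker vector to each hidden state is an alternative to adding it, at the cost of a larger downstream input dimension.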

Repository: alessandropec/data_driven_ai_voice_cloning
