
Transfer learning exploration of dc_tts text-to-speech model


SeanPLeary/dc_tts-transfer-learning


dc_tts-transfer-learning

This repo contains attempts to apply transfer learning to the dc_tts text-to-speech model described in the paper Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention. The code is a modified version of Kyubyong's dc_tts code. The pretrained model, also provided in Kyubyong's repo, was trained on the LJ Speech Dataset. Transfer learning was then used to adapt the model to Scarlett Johansson's voice.


Transfer learning is accomplished by selecting which model layers to train in hyperparameters.py; the remaining layers keep their pretrained weights.
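
As a rough illustration of layer selection, the sketch below filters variable names against a per-network list of layers. It is not the repo's actual code: the `TRAINABLE_LAYERS` dictionary, the helper names, and the `scope/layer/...` variable naming scheme are all assumptions made for the example. The layer names are taken from the preliminary training run listed further down.

```python
# Hypothetical sketch: the names and structure here are assumptions,
# not the contents of the repo's hyperparameters.py.

# Layers selected for training, keyed by sub-network scope.
TRAINABLE_LAYERS = {
    "SSRN": ["C_13", "C_14", "C_15", "C_16"],
    "Text2Mel/TextEnc": ["HC_11", "HC_12", "HC_13", "HC_14", "HC_15"],
    "Text2Mel/AudioEnc": ["HC_9", "HC_10", "HC_11", "HC_12", "HC_13"],
    "Text2Mel/AudioDec": ["HC_7", "C_8", "C_9", "C_10", "C_11"],
}


def is_trainable(var_name, selection=TRAINABLE_LAYERS):
    """Return True if a variable name such as 'SSRN/C_13/kernel:0'
    belongs to one of the selected layers (assumed naming scheme)."""
    for scope, layers in selection.items():
        prefix = scope + "/"
        if var_name.startswith(prefix):
            # The path component right after the scope is the layer name.
            # Matching it exactly keeps 'C_1' from matching 'C_13'.
            layer = var_name[len(prefix):].split("/", 1)[0]
            if layer in layers:
                return True
    return False


def select_trainable(var_names, selection=TRAINABLE_LAYERS):
    """Keep only the variable names that fall inside selected layers."""
    return [name for name in var_names if is_trainable(name, selection)]
```

In a TensorFlow 1.x training script, a filtered list like this could be used to pick the variables passed as `var_list` to the optimizer, so that only the selected layers are updated while the rest stay at their pretrained values.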


Task List:

  • add selectable list of layers for transfer learning
  • prelim model training
  • add scoring history plots
  • detailed exploration of which layers to train
  • explore data augmentation methods
  • explore post-processing

Prelim Model Training

  • ~6 hrs of training on Tesla V100 GPU
  • Layers trained:
    • SSRN(C_13, C_14, C_15, C_16)
    • Text2Mel/TextEnc(HC_11, HC_12, HC_13, HC_14, HC_15)
    • Text2Mel/AudioEnc(HC_9, HC_10, HC_11, HC_12, HC_13)
    • Text2Mel/AudioDec(HC_7, C_8, C_9, C_10, C_11)

Transfer learning data source:

Scarlett Johansson's audio book

Model Generated Examples (parodies of famous quotes from A.I. in movies):

