🐸 TTS roadmap #378

Closed
39 of 58 tasks
erogol opened this issue Mar 13, 2021 · 60 comments
Labels
TODOs wontfix This will not be worked on but feel free to help.

Comments

@erogol
Member

erogol commented Mar 13, 2021

These are the main dev plans for 🐸 TTS.

If you want to contribute to 🐸 TTS and don't know where to start, you can pick an item here and start with our Contribution Guideline. We're also always here to help.

Feel free to pick one or suggest a new one.

Contributions are always welcome 💪 .

v0.1.0 Milestones

  • Better model config handling [Discussion] Ideas for better model config management #21
  • TTS recipes for public datasets.
  • TTS trainer API to unify all the model training scripts.
  • TTS, Vocoder and SpeakerEncoder model abstractions and APIs.
  • Documentation for
    • Implementing a new model using 🐸 TTS.
    • Training a model on a new dataset from gecko.
    • Using Synthesizer interface on CLI or Server.
    • Extracting Spectrograms for Vocoder training.
    • Contributing a new pre-trained 🐸 TTS model.
    • Explanation for Model config parameters.

v0.2.0 Milestones

  • Grapheme 2 Phoneme in-house conversion. (Thx to gruut 👍 )
  • Implement VITS model.

v0.3.0 Milestones

  • Implement generic ForwardTTS API.
  • Implement Fast Speech model.
  • Implement Fast Pitch model.

v0.4.0 Milestones

v0.5.0 Milestones

  • Support for multi-lingual models
  • YourTTS release 🚀

v0.6.0 Milestones

v0.7.0 Milestones

v0.8.0 Milestones

  • Separate numpy transforms
  • Better data sampling for VITS
  • New Thorsten DE models 👑 @thorstenMueller

🏃‍♀️ Milestones along the way

🤖 New TTS models

@erogol erogol added the TODOs label Mar 13, 2021
@erogol erogol changed the title Main Development plans for 🐸 TTS. Main Development plans for 🐸 TTS. Mar 13, 2021
@erogol erogol pinned this issue Mar 13, 2021
@lucascassiano

great project! Excited to see this growing!

@AndrewBarfield

I'm learning the code/API and performing experiments. I hope to contribute soon.

I'm also wondering if I can donate (money) to Coqui?

@kdavis-coqui
Member

> I'm learning the code/API and performing experiments. I hope to contribute soon.
>
> I'm also wondering if I can donate (money) to Coqui?

Wow! Thanks! Humbling.

We were setting up GitHub sponsors, but the tax implications were onerous.

We're currently exploring Patreon. So stay tuned!

@agrinh
Contributor

agrinh commented Apr 26, 2021

@erogol Thanks for sharing the plans!

Do you have any thoughts on (or need help with) simplifying the dependencies a bit? If TTS is used as a library installed via pip, it might be nice to reduce the footprint by removing visualisation dependencies only used in notebooks, removing test/dev dependencies, and moving e.g. tensorflow into extras. Personally, I would love to use this as a dependency rather than maintaining my own fork.
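
A minimal sketch of the split suggested above, assuming setuptools-style `install_requires` and `extras_require`. The package names and groupings are illustrative assumptions, not the project's actual requirement lists, and `requirements_for` is a hypothetical helper that only mimics what pip would resolve.

```python
# Hypothetical dependency groups; names are illustrative, not the project's real lists.
CORE_REQUIRES = ["numpy", "torch", "soundfile"]

EXTRAS_REQUIRE = {
    "notebooks": ["matplotlib", "jupyter"],  # visualisation, only used in notebooks
    "dev": ["pytest", "pylint"],             # test and lint tooling
    "tf": ["tensorflow"],                    # optional TensorFlow support
}
# "all" is the union of every optional group, sorted for a stable order.
EXTRAS_REQUIRE["all"] = sorted(
    {pkg for group in EXTRAS_REQUIRE.values() for pkg in group}
)


def requirements_for(*extras):
    """Return the core packages plus those of the chosen extras groups."""
    selected = list(CORE_REQUIRES)
    for name in extras:
        selected.extend(EXTRAS_REQUIRE[name])
    return selected


# In setup.py these would be passed as:
#   setup(..., install_requires=CORE_REQUIRES, extras_require=EXTRAS_REQUIRE)
```

With a layout like this, a plain `pip install TTS` would pull only the core group, while `pip install TTS[tf]` would add the TensorFlow group on top.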

@erogol
Member Author

erogol commented Apr 26, 2021

@agrinh Why do you need to keep your own fork exactly? It'd be better to expand the conversation on gitter if you like.

@agrinh
Contributor

agrinh commented Apr 26, 2021

> @agrinh Why do you need to keep your own fork exactly? It'd be better to expand the conversation on gitter if you like.

Wow, thanks for the super fast reply. Sure, we can move the discussion to gitter.

@Sadam1195
Contributor

Please add DC-TTS to the list of models.

A DC-TTS implementation with MIT-licensed code is available here
Paper: Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention
@erogol

@stale stale bot added the wontfix This will not be worked on but feel free to help. label Jul 4, 2021
@coqui-ai coqui-ai deleted a comment from stale bot Jul 5, 2021
@stale stale bot removed the wontfix This will not be worked on but feel free to help. label Jul 5, 2021
@will-rice

What were you thinking about the "TensorFlow run-time for training models"? Like giving the user the option of using TensorFlow or PyTorch? I wouldn't mind taking a stab at the TensorFlow part.

@erogol
Member Author

erogol commented Aug 23, 2021

@will-rice the plan is to mirror what we have in torch to TF as much as possible. It'd be great if you initiate the work

@lucashueda

Are you planning to develop any expressive TTS architectures? I'm currently studying this topic and planning to implement some of them based on Coqui. Some of them just control the latent space using GST (Kwon et al., 2020) or RE (Sorin et al., 2020), and others actually change the architecture by adding a VAE, normalizing flows, and gradient reversal.

@a-froghyar
Contributor

@lucashueda Capacitron VAE: #510

@lucashueda

> @lucashueda Capacitron VAE: #510

Oh nice, I hope to see Capacitron integrated soon. So maybe in the future I'll be able to contribute some other expressive architectures.

@BillyBobQuebec

BillyBobQuebec commented Sep 18, 2021

@erogol Looking forward to new end-to-end models being implemented, specifically Efficient-TTS! If the paper is accurate, it should blow most two-stage configurations out of the water: it seems to have a higher MOS than tacotron2+hifigan while also seeming significantly faster than glowtts with the fastest vocoder. I have not seen a single repo replicating the EFTS-Wav architecture described in the paper, which was released 10 months ago. It would be amazing to see it in Coqui first!

@erogol
Member Author

erogol commented Sep 18, 2021

@BillyBobQuebec I don't think I will implement these models anytime soon. But as they stand, contributions are welcome

@WeberJulian
Contributor

@BillyBobQuebec but you can try VITS which is close to what you're describing :)

@BillyBobQuebec

> @BillyBobQuebec but you can try VITS which is close to what you're describing :)

Agreed, I am actually trying VITS at the moment. Unfortunately, I have some issues training with the Coqui implementation. I posted an issue about the bug today and hope I can get it resolved.

@stale stale bot added the wontfix This will not be worked on but feel free to help. label Oct 30, 2021
@coqui-ai coqui-ai deleted a comment from stale bot Nov 1, 2021
@stale stale bot removed the wontfix This will not be worked on but feel free to help. label Nov 1, 2021
@stale stale bot added the wontfix This will not be worked on but feel free to help. label Dec 30, 2021
@coqui-ai coqui-ai deleted a comment from stale bot Jan 1, 2022
@stale stale bot removed the wontfix This will not be worked on but feel free to help. label Jan 1, 2022
@hemath1001

Hi there! Thanks for your great work! I'm looking forward to training YourTTS on other languages. Will the training and fine-tuning code for YourTTS be published soon? I would be very grateful if you could give me an approximate timeline. Have a nice day :-D

@erogol erogol removed the wontfix This will not be worked on but feel free to help. label Dec 5, 2022
@erogol erogol reopened this Dec 5, 2022
@nfaraji2002

Hi,
Thanks for the delightful code!
I want to use this version of TTS on a Raspberry Pi 4, but I think this version does not support real-time processing.
Are there TF utilities, as in Mozilla TTS, to convert trained models to TF Lite?
Could quantization work here for real-time processing?
I would appreciate a roadmap in this regard.

Thanks
Neda
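
Whether a setup counts as real-time is commonly judged by the real-time factor (RTF): synthesis time divided by the duration of the audio produced, where a value below 1.0 means audio is generated faster than it plays back. The sketch below measures it for any callable. `fake_synthesize` is a hypothetical stand-in for a real model call, and the 22050 Hz sample rate is an assumption, not a value taken from this thread.

```python
import time


def real_time_factor(synthesize, text, sample_rate):
    """Return (rtf, samples), with rtf = synthesis time / generated audio duration."""
    start = time.perf_counter()
    samples = synthesize(text)  # expected to return a sequence of audio samples
    elapsed = time.perf_counter() - start
    audio_seconds = len(samples) / sample_rate
    if audio_seconds == 0:
        raise ValueError("synthesizer returned no audio")
    return elapsed / audio_seconds, samples


def fake_synthesize(text):
    """Hypothetical stand-in for a model: 0.05 s of silence per character at 22050 Hz."""
    return [0.0] * int(len(text) * 0.05 * 22050)


rtf, audio = real_time_factor(fake_synthesize, "hello world", sample_rate=22050)
is_real_time = rtf < 1.0  # below 1.0: audio is produced faster than it plays back
```

Running the same measurement on the target device before and after quantization would show whether the change is enough to bring the RTF below 1.0.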

@jhj0517

jhj0517 commented Jan 16, 2023

Thank you for your great work for TTS.

Is there any progress on the "Let the user pass a custom text cleaner function" item?
If possible, I would like to pass my own Korean cleaners.

@erogol
Member Author

erogol commented Jan 16, 2023

You can currently do it by creating your own tokenizer or overloading the class.
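
The subclassing approach could look roughly like the sketch below. `BaseTokenizer` here is a hypothetical stand-in written for illustration, not the actual 🐸 TTS tokenizer class, and the method names `clean` and `text_to_ids` are assumptions. The point is only that overriding the cleaning step leaves the rest of the pipeline untouched.

```python
import re
import unicodedata


class BaseTokenizer:
    """Hypothetical stand-in for a library tokenizer; not the actual 🐸 TTS class."""

    def clean(self, text):
        # Default cleaning: lowercase and collapse whitespace.
        return re.sub(r"\s+", " ", text.lower()).strip()

    def text_to_ids(self, text):
        # Toy ID mapping: each cleaned character becomes its Unicode code point.
        return [ord(ch) for ch in self.clean(text)]


class KoreanTokenizer(BaseTokenizer):
    """Overrides only the cleaning step; text_to_ids is inherited unchanged."""

    def clean(self, text):
        # NFC composes decomposed jamo into precomposed Hangul syllables.
        text = unicodedata.normalize("NFC", text)
        # Keep Hangul syllables, digits, spaces and basic punctuation; drop the rest.
        text = re.sub(r"[^\uac00-\ud7a30-9 .,!?]", "", text)
        return re.sub(r"\s+", " ", text).strip()


ids = KoreanTokenizer().text_to_ids("안녕하세요,   world!")
```

In a real setup the subclass would extend whatever tokenizer class the installed version provides, so the method to override may have a different name.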

@stale

stale bot commented Feb 17, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look our discussion channels.

@stale stale bot added the wontfix This will not be worked on but feel free to help. label Feb 17, 2023
@MaxIakovliev

Marvelous project.
Any ways to donate to core contributors?
I would prefer to use paypal.

@stale stale bot removed the wontfix This will not be worked on but feel free to help. label Feb 20, 2023
@stale

stale bot commented Mar 22, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look our discussion channels.

@stale stale bot added the wontfix This will not be worked on but feel free to help. label Mar 22, 2023
@erogol
Member Author

erogol commented Mar 23, 2023

@MaxIakovliev you can use https://coqui.ai/ :)

@stale stale bot removed the wontfix This will not be worked on but feel free to help. label Mar 23, 2023
@erogol
Member Author

erogol commented Mar 23, 2023

This roadmap issue is quite outdated. I'll keep it open to keep the references to some of the issues and models we like to tackle but won't be updating until one day officially becomes 48 hours.

@stale stale bot added the wontfix This will not be worked on but feel free to help. label Apr 22, 2023
@coqui-ai coqui-ai deleted a comment from stale bot Apr 23, 2023
@stale stale bot removed the wontfix This will not be worked on but feel free to help. label Apr 23, 2023
@jmlcoliveira

Any update regarding SSML implementation?

@erogol
Member Author

erogol commented May 11, 2023

We are not working on SSML currently, it is back in the list without a precise timeline.
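
For readers unfamiliar with the request: SSML is a W3C XML markup for annotating text with pauses, prosody and similar hints. The sketch below is not part of 🐸 TTS. It only illustrates, under the assumption of a tiny subset (`<speak>` and `<break time="...ms"/>`), what a front end would have to extract before synthesis. `ssml_to_segments` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

SSML = '<speak>Hello<break time="500ms"/>world</speak>'


def ssml_to_segments(ssml):
    """Flatten a tiny SSML subset into ("text", str) and ("pause", seconds) tuples."""
    root = ET.fromstring(ssml)
    if root.tag != "speak":
        raise ValueError("SSML documents must have <speak> as the root element")
    segments = []
    if root.text and root.text.strip():
        segments.append(("text", root.text.strip()))
    for child in root:
        if child.tag == "break":
            # Only the "<number>ms" form of the time attribute is handled here.
            time_attr = child.get("time", "0ms")
            if not time_attr.endswith("ms"):
                raise ValueError("only millisecond break times are supported")
            segments.append(("pause", int(time_attr[:-2]) / 1000))
        # Text following a child element is stored in its tail.
        # The contents of non-break elements are ignored in this sketch.
        if child.tail and child.tail.strip():
            segments.append(("text", child.tail.strip()))
    return segments


segments = ssml_to_segments(SSML)
```

Each text segment could then be synthesized separately, with silence of the given length inserted for each pause.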

@offside609

Please do!!

@stale

stale bot commented Jun 17, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look our discussion channels.

@stale stale bot added the wontfix This will not be worked on but feel free to help. label Jun 17, 2023
@stale stale bot closed this as completed Jun 25, 2023
@violet17

Will you support bark-small? Thanks.

@csukuangfj

> Any plan to a port of coqui-ai engine for android? TTS on android is very robotic (espeak, rhvoice, festival lite).

@paolo-caroni

Please take a look at
#3194

You can use sherpa-onnx to run VITS models from Coqui on Android and also embedded devices, e.g., raspberry pi.

We have pre-built Android APKs for the VITS English models from Coqui.
https://k2-fsa.github.io/sherpa/onnx/tts/apk.html


@DmitryVN

DmitryVN commented Nov 21, 2023

Please fix #3039 and #3282.
The problem persists, and because of it normal use is not possible. Also, at the moment the speech is cut off at the end of each sentence, which results in jerky reading.

@MarkChrisE2091

Any new update?

@csukuangfj

> Any plan to a port of coqui-ai engine for android? TTS on android is very robotic (espeak, rhvoice, festival lite).

@paolo-caroni

We have supported it in k2-fsa/sherpa-onnx#508

The following is a YouTube video
https://www.youtube.com/watch?v=33QYuVzDORA

You can use all coqui-ai/TTS models and piper models listed in
https://github.com/k2-fsa/sherpa-onnx/releases/tag/tts-models
with k2-fsa/sherpa-onnx#508

@imevro

imevro commented Apr 29, 2024

hi guys, why?

upd: found https://twitter.com/_josh_meyer_/status/1742522906041635166

Screenshot 2024-04-29 at 19 19 25

@NicoleKai

NicoleKai commented Apr 29, 2024

Their ability to exist and be profitable was dependent on how much better their tech was compared to everyone else. It may not feel like it, but we are in the middle of an AI singularity. Coqui's business model might have stood a chance if they had started with this tech 5 years earlier, but it was probably too little, too late. ElevenLabs is probably eating their lunch :/
