Integrate text-to-speech and speech-to-text functionality #44

Open
amakropoulos opened this issue Jan 22, 2024 · 9 comments
Labels
enhancement New feature or request

Comments

@amakropoulos
Collaborator

No description provided.

@amakropoulos amakropoulos changed the title Integrate text-to-speech and speech-to-text functionalities Integrate text-to-speech and speech-to-text functionalities Jan 22, 2024
@amakropoulos amakropoulos changed the title Integrate text-to-speech and speech-to-text functionalities Integrate text-to-speech and speech-to-text functionality Jan 22, 2024
@amakropoulos amakropoulos added the enhancement New feature or request label Jan 22, 2024
@ArEnSc

ArEnSc commented Feb 10, 2024

Please make this a separate, optional package.

@amakropoulos
Collaborator Author

Yes, certainly. It will be possible to attach STT or TTS to the chat functionality, but it will not be enabled by default.

@amakropoulos amakropoulos added this to the v1.2.0 milestone Feb 15, 2024
@simoninithomas

Hey there 👋, will you use Sentis for STT and TTS? Or do you have another idea?

We have some Sentis models on the Hub that are super fast (Tiny Whisper and Jets).

Tiny Whisper: https://huggingface.co/unity/sentis-whisper-tiny
Jets: https://huggingface.co/unity/sentis-jets-text-to-speech

Demo with Whisper: https://singularite.itch.io/jammo-the-robot-with-unity-sentis-whisper-version

@amakropoulos amakropoulos modified the milestones: v1.2.0, v1.3.0 Mar 4, 2024
@amakropoulos
Collaborator Author

Hi, thank you for the suggestions!
I need to do a small exploration first, but yes, I was thinking of starting with your Whisper-Tiny model 🙂.
Ideally I would like to support a range of models, e.g. similarly to the whisper.cpp project, but I need to have it working cross-platform in Unity, which is a work in progress (link).

By the way, thanks a lot for your great work on sharp-transformers ⭐!
I'm using it in the other repo, RAGSearchUnity, to build a RAG similarity search system!

@siddhant-bharti

Hi @amakropoulos, I want this functionality for a project I am building! Are you planning to add this soon? I can also help by raising a PR for this functionality, if you are fine with that. Looking forward to hearing from you. Thanks!

@amakropoulos
Collaborator Author

amakropoulos commented Mar 22, 2024

@siddhant-bharti I'm replying here as well :).
This is the next big feature that I'll work on soon.

@simoninithomas I can't use Jets because it has a cc-by-4.0 license.
The Unity Asset Store does not allow packages with licenses that require attribution, and I'd like LLM for Unity to be there as well (P.S. we are live on the Asset Store as of last week 🎉!).

@amakropoulos amakropoulos removed this from the v1.3.0 milestone Apr 4, 2024
@amakropoulos
Collaborator Author

This feature is blocked at the moment.
I can't find an open-source TTS library to integrate that fulfills the following requirements:

  • C/C++/C# code without many dependencies
  • MIT/Apache 2.0 or any other equivalent license that is open-source and attribution-free
  • support for multiple voices

The best solution would be Piper, but at the moment it has a potential license issue due to its use of espeak (link).

@Pipsun

Pipsun commented Apr 10, 2024

Hello, I've integrated your project with OpenCV for face tracking, VRoid for the avatar, Vosk for STT, and Piper for TTS. I think the most interesting addition would be an RVC integration, but I have no time for it. Do you know of any ready-to-use RVC integrations for Unity?

@Swiftyos

Adding TTS and STT functionality would take llamafile to the next level!

Projects
Status: Blocked
Development

No branches or pull requests

6 participants