Feature request - add multiple speakers to repository #83

Open
BBC-Esq opened this issue Feb 5, 2024 · 2 comments

Comments

@BBC-Esq
Contributor

BBC-Esq commented Feb 5, 2024

The following website has a bunch of voices for Bark:

https://rsxdalv.github.io/bark-speaker-directory/

I was wondering if anyone is interested in doing something similar for WhisperSpeech? Currently, to use anything other than the default voice, one has to obtain an audio file and pass it as a parameter in custom code so that the speaker embedding gets extracted from it; only then is that voice used.

The pipeline.py script currently hardcodes the default voice here:

[image: screenshot of the hardcoded default voice in pipeline.py]

Perhaps we can obtain tensors for multiple high-quality voices (male, female, etc.) and offer them as options? I'm willing to contribute, but I still haven't been able to accurately extract speaker embeddings and get the tensors; I spent about 3 hours trying different ways.

Let's say we get a dozen high-quality voices (i.e. tensors). We could include them in a configuration file or constants.py and let people choose among them, without removing the ability to create your own, of course!

People could even post their voices in tensor format in the "examples" folder. Just brainstorming.
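
To make the constants.py idea a bit more concrete, here is a rough sketch of what choosing among bundled voices might look like. Everything in it is hypothetical: `BUILTIN_VOICES`, `DEFAULT_VOICE` and `resolve_voice` are made-up names that do not exist in WhisperSpeech, and the short lists of floats are placeholders standing in for real speaker-embedding tensors.

```python
# Hypothetical constants.py-style voice registry (illustration only).
from typing import Dict, List, Optional, Sequence

# Placeholder vectors; real entries would be speaker embeddings
# extracted from properly licensed recordings.
BUILTIN_VOICES: Dict[str, List[float]] = {
    "female_1": [0.0, 0.1, 0.2, 0.3],
    "male_1": [0.3, 0.2, 0.1, 0.0],
}
DEFAULT_VOICE = "female_1"


def resolve_voice(name: Optional[str] = None,
                  custom: Optional[Sequence[float]] = None) -> List[float]:
    """Pick the embedding to synthesize with.

    A user-supplied embedding always wins, so creating your own voice
    keeps working; otherwise look up a bundled voice by name.
    """
    if custom is not None:
        return list(custom)
    key = name if name is not None else DEFAULT_VOICE
    if key not in BUILTIN_VOICES:
        raise ValueError(
            f"Unknown voice {key!r}; available: {sorted(BUILTIN_VOICES)}"
        )
    return BUILTIN_VOICES[key]
```

With something like this, `resolve_voice("male_1")` would return a bundled voice, `resolve_voice()` would fall back to the default, and passing `custom=` would bypass the registry entirely.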

@jpc
Contributor

jpc commented Feb 13, 2024

This is actually quite easy to add: one needs to run a voice sample through the speechbrain model (example code is in pipeline.py) and save the resulting embedding vector to a file.

If we want to add some voices by default, we could probably save all the vectors to Hugging Face in a single pth file (instead of pasting them into the source code). The tricky part is finding reference voices that are properly licensed. Maybe use a few samples from LibriTTS-R?
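
A minimal sketch of those two steps, under a few assumptions: the speechbrain checkpoint name `speechbrain/spkrec-ecapa-voxceleb`, the 30-second cutoff, and the helper names `extract_speaker_embedding`, `save_voice_bank` and `load_voice` are my own choices for illustration and are not taken from pipeline.py. Uploading the resulting file to Hugging Face is left out.

```python
def extract_speaker_embedding(wav_path, max_seconds=30, device="cpu"):
    """Return one speaker-embedding vector for the voice sample at wav_path."""
    # Heavy dependencies are imported lazily so the helpers below
    # can be used without speechbrain or torchaudio installed.
    import torchaudio
    try:
        from speechbrain.inference.speaker import EncoderClassifier  # speechbrain >= 1.0
    except ImportError:
        from speechbrain.pretrained import EncoderClassifier  # older releases

    encoder = EncoderClassifier.from_hparams(
        source="speechbrain/spkrec-ecapa-voxceleb",  # assumed checkpoint
        run_opts={"device": device},
    )
    samples, sample_rate = torchaudio.load(str(wav_path))
    # Keep the first channel, truncate to max_seconds, and let the
    # encoder's normalizer resample to the rate the model expects.
    clip = encoder.audio_normalizer(samples[0, : max_seconds * sample_rate], sample_rate)
    # encode_batch returns [batch, 1, dim]; take the single vector.
    return encoder.encode_batch(clip)[0, 0].detach().cpu()


def save_voice_bank(voices, out_path):
    """Save a {name: embedding tensor} dict into a single .pth file."""
    import torch
    torch.save({name: emb.detach().cpu() for name, emb in voices.items()}, str(out_path))


def load_voice(bank_path, name):
    """Load one named embedding back from the .pth file."""
    import torch
    bank = torch.load(str(bank_path), map_location="cpu")
    if name not in bank:
        raise KeyError(f"Voice {name!r} not found; available: {sorted(bank)}")
    return bank[name]
```

Usage would be along the lines of `save_voice_bank({"reader_a": extract_speaker_embedding("reader_a.wav")}, "voices.pth")`, where `reader_a.wav` is a placeholder for a properly licensed sample.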

@BBC-Esq
Contributor Author

BBC-Esq commented Feb 13, 2024

Yep, that was my only concern: the licensing issue. One idea would be to use a file named constants.py and just keep adding voices that we've verified are high quality and have no licensing issues?
