Problems adding a new speaker #200

Open
JoanisTriandafilidi opened this issue Dec 18, 2023 · 0 comments

Hello! Thanks for the great work!
I actively use VITS in various personal mini-projects, and I have an idea for adding new speakers to a multi-speaker model.

The essence of my idea is this:

  1. I trained a good multi-speaker model on 200 speakers.
  2. I obtained an embedding of the right shape for a new speaker using Speakernet.
  3. I want to add the new speaker to the existing multi-speaker model by appending its embedding to the speaker table. That is, emb_g.shape was [200, 192] and becomes [201, 192]. I append the new embedding inside the utils.load_checkpoint function.
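To make step 3 concrete, here is a minimal sketch of appending one row to the speaker table in a checkpoint's state dict. It assumes the table is stored under the key `emb_g.weight` and that the model is then built with 201 speakers so the shapes match. The helper `append_speaker` is hypothetical and not part of the repository, and the random tensors only stand in for the real checkpoint and the Speakernet embedding.

```python
import torch


def append_speaker(state_dict, new_embedding, key="emb_g.weight"):
    """Return a shallow copy of state_dict with one extra row in the speaker table.

    `key` is an assumption about how the embedding table is named in the checkpoint.
    """
    table = state_dict[key]  # e.g. shape [200, 192]
    new_row = new_embedding.detach().reshape(1, -1).to(
        dtype=table.dtype, device=table.device
    )
    if new_row.shape[1] != table.shape[1]:
        raise ValueError(
            f"embedding has {new_row.shape[1]} dims, table expects {table.shape[1]}"
        )
    extended = dict(state_dict)  # leave the caller's dict untouched
    extended[key] = torch.cat([table, new_row], dim=0)  # e.g. shape [201, 192]
    return extended


# Toy stand-ins for the trained table and the external embedding.
state = {"emb_g.weight": torch.randn(200, 192)}
new_speaker = torch.randn(192)

state = append_speaker(state, new_speaker)
new_speaker_id = state["emb_g.weight"].shape[0] - 1  # index of the appended row
print(tuple(state["emb_g.weight"].shape), new_speaker_id)
```

The new speaker is then addressed at inference by the index of the appended row, which is 200 here.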

The model loads without problems. At inference, however, instead of the expected new (!) voice, I get one of the 200 voices the model was trained on. Moreover, if I feed in a different embedding, I get a different one of those 200 voices. So I conclude that the model reacts to artificially added speaker embeddings, but I can't get the generated voice to match the target speaker.
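One way to probe this observation is to check which trained rows of the speaker table the added embedding lies closest to, and whether its norm is in the same range as the learned rows. The sketch below is only a diagnostic under assumptions: `nearest_speakers` is a hypothetical helper, the random table stands in for the real `emb_g` weights, and the query is built to sit near speaker 17 purely for illustration. It does not by itself explain the behaviour.

```python
import torch
import torch.nn.functional as F


def nearest_speakers(table, query, k=5):
    """Return (speaker_id, cosine_similarity) pairs for the k closest rows."""
    sims = F.cosine_similarity(table, query.reshape(1, -1), dim=1)  # shape [n_speakers]
    top = torch.topk(sims, k=min(k, table.shape[0]))
    return list(zip(top.indices.tolist(), top.values.tolist()))


torch.manual_seed(0)
table = torch.randn(200, 192)                  # stand-in for the trained speaker table
query = table[17] + 0.01 * torch.randn(192)    # stand-in for the external embedding

matches = nearest_speakers(table, query, k=3)
print("closest trained speakers:", matches)

# Compare scale as well: an embedding from another model may have a very different norm.
row_norms = table.norm(dim=1)
print("table norm range:", row_norms.min().item(), row_norms.max().item())
print("query norm:", query.norm().item())
```

If the real embedding has a noticeably different norm or low similarity to every trained row, that suggests it may not live in the same space as the learned table, which is worth checking before drawing conclusions.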

Could you please tell me how I can solve this problem? Why does the model generate one of the trained voices, rather than the target voice, when it is given a new embedding?
