Changing the seed does nothing. #86
Comments
Sounds like a bug! Will try to trace it down in a bit, but most likely the seed isn't being propagated properly? (Temperature ranges from 0 to 1.)
Thanks a lot for the answer. I'm off to bed now, but if you don't fix this by Friday, I'll try to follow the variable through the code and see where it's broken. Additional info: it generates the same files, bit for bit.
I don't think I'll have the time before the weekend either.
@vatsalaggarwal this is the culprit: https://github.com/metavoiceio/metavoice-src/blob/main/fam/llm/fast_inference_utils.py#L334

By the way, I was a bit confused by the docs. They say to use fast_inference.py, but that file doesn't take command-line parameters. I see inference.py does take command-line parameters, but that file isn't mentioned in the docs. What is the difference between the two, and which one is supposed to be used?
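The failure mode being pointed at can be illustrated in miniature with Python's stdlib `random` module (a sketch of the pattern only; the linked torch code may differ): if a hard-coded seed is applied right before sampling, the user-supplied seed has no effect.

```python
import random

def generate(seed: int) -> list[float]:
    """Buggy pattern: the caller's seed is ignored because a
    hard-coded seed is applied right before sampling."""
    random.seed(1337)  # hard-coded: every call produces identical output
    return [random.random() for _ in range(3)]

def generate_fixed(seed: int) -> list[float]:
    """Fixed pattern: propagate the user's seed to the RNG."""
    random.seed(seed)
    return [random.random() for _ in range(3)]
```

In the real code the analogous fix would be to thread the CLI seed through to wherever the torch RNG is seeded, instead of a constant.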
we haven't had the chance to improve this experience yet
thanks for the info!
would you be interested in a PR with a few changes to the code and docs? a few more questions:
Edit: I think
yes, that would be great!
that's weird, first time we've heard that... can you send me the link to the online demo you're talking about? it's likely that you ended up with a bad seed, or that the online demo is using a different reference speaker (garbling/artefacts/etc are super sensitive to the reference you use), but let me double-check once i have the link!
I don't know how well torch supports compilation on CPU, so I probably wouldn't debug this on CPU. The fast version should use slightly more memory than the non-fast one (they are the exact same model, just with different overheads). For fitting it on 12 GB, I would recommend swapping out the diffusion model for the Vocos model. You could additionally try quantisation for the first-stage model.
Think either the
https://ttsdemo.themetavoice.xyz/

Two issues:
1. I get better output from that demo than from running it locally, so I'd really like some way to reproduce what that page does.
2. Consistently bad across 16 tries. I'll put together a demo/reproduction later on to show you.
The sample is pretty clean, and works better in the online demo / with xtts (https://huggingface.co/spaces/coqui/xtts).
Is that trivial, or more involved? I'm a user of AI stuff, not really a designer of it. Too few PhDs (i.e. zero).
Same question: is there something, anything, I can do via the command-line parameters, or some simple modification of the code, that would result in lower VRAM usage?

Edit: I'm playing around with the value of --guidance_scale in fast_inference.py to understand the min/max values; at 4.0 it works, but at 8.0 I run out of VRAM. Does that mean this is something I could tune to get inference.py to fit in VRAM?
In my experience with other projects it's always been fine. I was able to fully run inference.py on CPU both with compilation enabled and with it disabled, so your code/project is better than you presumed :) It took 25 minutes though. Note, even with
On my machine, the fast version runs fine on the GPU, but the slow version doesn't. Is there some way to know how much memory each one requires/needs? I'd like to know what (minimum) size of GPU I need to buy for this to work.
I was able to get it to work, on CPU, by setting

PS: Question: the docs say a 45-90 second sample. Mine is 50 seconds. Should I expect better results with a 90-second one? Would a 5-minute sample do even better (even if only by a little) than a 90-second one?

PS: Some data on trying to find the bounds of the parameters, using Whisper to get text from the generated audio and scoring based on the Levenshtein distance between the intended and understood text.
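That scoring loop can be sketched as follows (a plain-Python Levenshtein implementation, with the Whisper transcription step left as an input; the function names are mine, not the project's):

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, O(len(a)*len(b))."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                   # deletion
                           cur[j - 1] + 1,                # insertion
                           prev[j - 1] + (ca != cb)))     # substitution
        prev = cur
    return prev[-1]

def intelligibility(intended: str, transcribed: str) -> float:
    """1.0 = perfect round-trip; lower means more garbled audio."""
    d = levenshtein(intended.lower(), transcribed.lower())
    return 1.0 - d / max(len(intended), len(transcribed), 1)
```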
Yep, we've scaled the top_p and guidance_scale parameters to make them more convenient. See https://github.com/metavoiceio/metavoice-src/blob/main/app.py#L29-L36 for reference.
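The exact mapping lives in the linked app.py lines; as an illustration of the idea only (the ranges below are hypothetical, not the project's actual ones), a linear rescale from a UI slider range onto a model-native range looks like:

```python
def rescale(value: float, ui_range: tuple, model_range: tuple) -> float:
    """Map a UI slider value linearly onto the model's native range."""
    (u0, u1), (m0, m1) = ui_range, model_range
    return m0 + (value - u0) * (m1 - m0) / (u1 - u0)

# e.g. a 0-10 "quality" slider driving a 1.0-3.0 guidance scale:
# guidance = rescale(5.0, (0.0, 10.0), (1.0, 3.0))
```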
Just to double check, did a new seed get used on each try? It's possible each try ended up using the same seed...
this would be super helpful! re the rest:
(Amazing work!)
It generates the exact same wav file no matter what seed I input, no variation (that I can notice).
Any idea why?
I'm using fast_inference.py, just modified enough to take command line parameters (including the seed):
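A minimal wrapper in that spirit — flag names mirror the parameters discussed in this thread, but the defaults and the parser itself are my assumptions, not the project's actual CLI — might look like:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Argument parser for a hypothetical fast_inference.py wrapper.
    Defaults are illustrative, not the project's."""
    p = argparse.ArgumentParser(description="TTS inference wrapper")
    p.add_argument("--text", type=str, required=True)
    p.add_argument("--seed", type=int, default=1337)
    p.add_argument("--temperature", type=float, default=1.0)
    p.add_argument("--top_p", type=float, default=0.95)
    p.add_argument("--guidance_scale", type=float, default=3.0)
    return p
```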
thanks for any help/ideas.
Also, what are the valid ranges for parameters like temperature, top-p, and guidance scale?