Possible Bug in Style Diffusion Inference Code #230

brthor · 2024-04-16T19:36:01Z

Here we see that the ground truth for the denoiser is ordered as (acoustic_styles, prosodic_styles):

But here we see that the output from the sampler is parsed as (prosodic_styles, acoustic_styles):

Also during inference the output from the sampler is parsed as (prosodic_styles, acoustic_styles):

I tried swapping them and got bad inference results, so I'm not sure what is going on with this.
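
To make the concern concrete, here is a minimal sketch of the ordering I would expect to hold. It is not the repo's code: `pack_styles`, `unpack_styles`, and `STYLE_DIM` are hypothetical names, and plain lists stand in for tensors. The only point is that whatever order is used to build the denoiser target should be the order used to split the sampler output.

```python
# Hypothetical names and sizes, for illustration only (not from the repo).
STYLE_DIM = 4  # assumed size of each style vector


def pack_styles(acoustic, prosodic):
    """Build the target in (acoustic, prosodic) order."""
    return list(acoustic) + list(prosodic)


def unpack_styles(packed, style_dim=STYLE_DIM):
    """Split a packed vector in the same order it was packed."""
    return packed[:style_dim], packed[style_dim:]


acoustic = [1.0] * STYLE_DIM
prosodic = [2.0] * STYLE_DIM
packed = pack_styles(acoustic, prosodic)

# Parsing in the packing order recovers both vectors.
first, second = unpack_styles(packed)
assert first == acoustic and second == prosodic

# Parsing as (prosodic, acoustic) instead silently swaps the two.
swapped_prosodic, swapped_acoustic = unpack_styles(packed)
assert swapped_prosodic == acoustic and swapped_acoustic == prosodic
```

If both halves have the same size, a swapped parse raises no shape error, so a mismatch like this would only show up in output quality.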
