
[enhancement]: Hidiffusion for SD1.5 and SDXL #6309

Open
1 task done
frankyifei opened this issue May 4, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

@frankyifei

Is there an existing issue for this?

  • I have searched the existing issues

Contact Details

No response

What should this feature add?

HiDiffusion is a new training-free method that increases the resolution and speed of pretrained diffusion models. Its open-source code is diffusers-based, so it should be fairly easy to add this function. It works well at large resolutions such as 2048x2048 or higher and speeds up generation considerably.
Here is an example from Juggernaut Reborn 1.5 without any upscaling; it also takes less time to generate.
(attached image: 657267986660962_00001_)
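For reference, the usage advertised in the HiDiffusion repo looks roughly like this (a sketch, untested here; the model ID and prompt are placeholders, and the `hidiffusion` package API is taken from their README):

```python
# Sketch of the diffusers integration advertised by the HiDiffusion repo.
# Requires `pip install hidiffusion` and a CUDA GPU; not verified here.
import torch
from diffusers import StableDiffusionXLPipeline
from hidiffusion import apply_hidiffusion

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

apply_hidiffusion(pipe)  # patches the UNet in place; no new weights are loaded

image = pipe("a photo of a cat", height=2048, width=2048).images[0]
```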

Alternatives

No response

Additional Content

No response

@frankyifei frankyifei added the enhancement New feature or request label May 4, 2024
@psychedelicious
Collaborator

We have a lot of custom logic around diffusers, and the "just add a single line!" claim doesn't necessarily apply to our implementation.

@RyanJDick @lstein Can you advise on effort to implement this? It would replace the HRO feature (automatic 2nd pass img2img).

@RyanJDick
Collaborator

RyanJDick commented May 6, 2024

TLDR: I think HiDiffusion could be supported in a way that is compatible with all of our other features. But, it would definitely be more effort than the one-liner that they advertise. We should do more testing to make sure that this feature is worth the implementation / maintenance effort (the examples in the paper look great).


I spent some time reading the HiDiffusion paper today. Here are my notes on what it would take to implement this:

HiDiffusion modifies the UNet in two ways: RAU-Net (Resolution-Aware U-Net) and MSW-MSA (Modified Shifted Window Multi-head Self-Attention). These are both tuning-free modifications to the UNet i.e. no new weights are needed.

The RAU-Net is intended to avoid subject duplication at high resolutions. It achieves this by changing the downsampling/upsampling pattern of the UNet layers so that the deep layers operate at resolutions closer to what they were trained on.
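A toy illustration of the idea (my own sketch, not the paper's or Invoke's code; the number of UNet stages and the placement of the extra downsample are assumptions):

```python
# Illustrative only: how an extra resolution-aware downsample near the top
# of the UNet brings the deepest feature map at a high inference resolution
# back toward the size the deep layers saw during training.

VAE_FACTOR = 8   # SD latent space is 8x smaller than the image
UNET_DOWNS = 3   # assume 3 downsampling stages to the deepest block

def deepest_feature_size(image_px: int, extra_down: int = 1) -> int:
    """Spatial size of the deepest UNet feature map (extra_down=1 means off)."""
    size = image_px // VAE_FACTOR
    size //= extra_down              # RAU-Net-style extra downsample
    for _ in range(UNET_DOWNS):
        size //= 2
    return size

print(deepest_feature_size(512))                 # trained at 512 -> 8x8 deep
print(deepest_feature_size(2048))                # 2048 without RAU-Net -> 32x32
print(deepest_feature_size(2048, extra_down=4))  # extra 4x downsample -> 8x8 again
```

The last line shows why subject duplication is mitigated: with the extra downsample, the deep layers again operate on an 8x8 map, matching the training regime.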

The MSW-MSA modification improves generation time at high resolution by applying windowing to the self-attention layers of the top UNet blocks.
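The windowing itself can be sketched as follows (toy code, not the HiDiffusion implementation; window size and shift are arbitrary). Since self-attention cost is quadratic in token count, attending within w*w windows instead of globally cuts per-layer attention cost by roughly a factor of (H*W)/(w*w):

```python
import numpy as np

def window_partition(x: np.ndarray, w: int) -> np.ndarray:
    """Split an (H, W, C) feature map into (num_windows, w*w, C) token groups,
    so self-attention runs within each window instead of globally."""
    H, W, C = x.shape
    x = x.reshape(H // w, w, W // w, w, C)
    x = x.transpose(0, 2, 1, 3, 4).reshape(-1, w * w, C)
    return x

feat = np.arange(8 * 8 * 2).reshape(8, 8, 2).astype(float)
windows = window_partition(feat, 4)
print(windows.shape)  # (4, 16, 2): 4 windows of 16 tokens each

# Shifted variant: roll the map before partitioning so window borders move
# between attention layers (the "shifted window" part of MSW-MSA).
shifted = np.roll(feat, shift=(2, 2), axis=(0, 1))
print(window_partition(shifted, 4).shape)  # (4, 16, 2)
```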

I think we should be able to make these changes in a way that is compatible with most other features; the main question is how much effort it will take.

Compatibility:

  • Regional prompting: I think there are some places where we make assumptions about the UNet downsampling scheme, but those shouldn't be too hard to modify.
  • TI: No changes required.
  • LoRA: No changes required, but HiDiffusion might interfere with the effectiveness of some LoRAs.
  • ControlNet: The HiDiffusion repo includes support for ControlNet in diffusers. We don't use the diffusers ControlNet implementation as-is, so there would probably be a bit of effort to get this working.
  • Custom attention processors (regional prompting and IP-Adapter): Should just work, but some risk of conflict with MSW-MSA that I haven't anticipated.
  • Sequential vs. batched conditioning: No changes required.

@psychedelicious
Collaborator

psychedelicious commented May 6, 2024

Is this limited to image sizes greater than the model's trained dimensions, or is the improvement greater at those dimensions (but still present at trained dimensions)?

@RyanJDick
Collaborator

Is this limited to image sizes greater than the model's trained dimensions, or is the improvement greater at those dimensions (but still present at trained dimensions)?

MSW-MSA can be applied at native model resolutions to get some speedup. But, the amount of speedup would be much greater at higher resolutions. Based on some of the numbers reported in the paper, I'd guess that we could get a ~20% speedup from SDXL at 1024x1024. I'm not sure if there would be perceptible quality degradation. We'd have to test that.

3 participants