Replies: 1 comment 1 reply
-
Hi, what's not clear in the code comment? This was added with the … It seems your question is more about the difference in architecture between the two models, so it would probably be better to read each model's paper to understand them. cc @sayakpaul if there's something else to add.
-
Both Stable Diffusion 2.1 and Stable Diffusion XL use the second-to-last hidden state of the text encoder to compute cross-attention in the UNet.
But why does the Stable Diffusion pipeline use the last hidden state of the text encoder, and apply an additional layer norm operation?
diffusers/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py
Lines 392 to 407 in 70f8d4b
which the Stable Diffusion XL pipeline does not have:
diffusers/src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py
Lines 403 to 407 in 70f8d4b
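To make the difference between the two conditioning paths concrete, here is a minimal sketch using a tiny randomly initialized CLIP text encoder (the config values are placeholders for illustration, not the real SD/SDXL encoder sizes). Inside `CLIPTextModel`, `last_hidden_state` has already passed through `final_layer_norm`, while `hidden_states[-2]` is taken before it; that is why the SD pipeline's `clip_skip` branch re-applies `final_layer_norm` by hand, and why SDXL, which consumes the penultimate state un-normalized, has no such call.

```python
import torch
from transformers import CLIPTextConfig, CLIPTextModel

# Tiny randomly initialized CLIP text encoder, purely for illustration;
# real pipelines load pretrained weights with from_pretrained(...).
config = CLIPTextConfig(
    vocab_size=1000, hidden_size=32, intermediate_size=64,
    num_hidden_layers=4, num_attention_heads=2, max_position_embeddings=77,
)
text_encoder = CLIPTextModel(config).eval()

input_ids = torch.randint(0, config.vocab_size, (1, 77))
with torch.no_grad():
    out = text_encoder(input_ids, output_hidden_states=True)

# SD 1.x/2.x default path: the last hidden state, which the model has
# already passed through final_layer_norm internally.
prompt_embeds_sd = out.last_hidden_state

# SDXL-style path: the penultimate hidden state, taken *before*
# final_layer_norm and used as-is (no extra norm).
prompt_embeds_sdxl = out.hidden_states[-2]

# The SD pipeline's clip_skip branch also starts from an earlier hidden
# state, but then re-applies final_layer_norm manually, which is the
# extra call visible in pipeline_stable_diffusion.py above:
prompt_embeds_clip_skip = text_encoder.text_model.final_layer_norm(
    out.hidden_states[-2]
)
```

Sanity check: `last_hidden_state` equals `final_layer_norm(hidden_states[-1])`, confirming that the norm is baked into the default output but not into the intermediate hidden states.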