
Improve Stable Diffusion Prompt Following & Image Quality Significantly With Incantations Extension

Furkan Gözükara edited this page Apr 23, 2024 · 2 revisions


The Stable Diffusion Incantations extension is a recent addition to the text-to-image generation pipeline. It bundles three methods: Perturbed Attention Guidance, Multi-Concept T2I-Zero, and Seek for Incantations. Used together or separately, they can significantly improve image quality and help Stable Diffusion 1.5 and SDXL models follow your prompts more closely. In this tutorial, I will show you how to install the Incantations extension on the Stable Diffusion Automatic1111 Web UI and how to use each of these three methods.

Latest Stable Diffusion Automatic1111 Web UI Installer ⤵️ https://www.patreon.com/posts/86307255

Automatic1111 SD Web UI Incantations Extension ⤵️ https://github.com/v0xie/sd-webui-incantations

Forge Extension & ComfyUI Basic Node ⤵️ https://github.com/pamparamm/sd-perturbed-attention

How To Install Python, GIT & Automatic1111 Manually ⤵️ https://youtu.be/-NjNy7afOQ0

OneTrainer Full Fine Tuning / DreamBooth Tutorial ⤵️ https://youtu.be/0t5l6CP9eBg

After Detailer Extension ⤵️ https://github.com/Bing-su/adetailer

CivitAI My Profile To Find Prompts (Images) ⤵️ https://civitai.com/user/SECourses/images

CivitAI My Profile To Find Prompts (Posts) ⤵️ https://civitai.com/user/SECourses/posts

CivitAI Excellent Tutorials ⤵️ https://civitai.com/user/SECourses/articles

Perturbed Attention Guidance:

An alternative/complementary method to CFG (Classifier-Free Guidance) that increases sampling quality, from https://arxiv.org/abs/2403.17377

Controls: PAG Scale: Controls the intensity of the effect of PAG on the generated image.
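To make the role of PAG Scale more concrete, here is a minimal sketch of how a PAG term is commonly combined with CFG at a single sampling step. It is an illustration only, not the extension's code: the function name `combine_guidance`, the flat lists of floats standing in for noise predictions, and the exact weighting convention are all assumptions. The "perturbed" prediction stands for the model output when its self-attention has been perturbed, as described in the paper.

```python
def combine_guidance(uncond, cond, perturbed, cfg_scale, pag_scale):
    """Combine CFG and PAG terms element-wise (hypothetical helper).

    uncond    -- prediction without the prompt
    cond      -- prediction with the prompt
    perturbed -- prediction with the prompt and perturbed self-attention
    """
    if not (len(uncond) == len(cond) == len(perturbed)):
        raise ValueError("all predictions must have the same length")
    return [
        # CFG pushes away from the unconditional prediction,
        # PAG pushes away from the perturbed-attention prediction.
        u + cfg_scale * (c - u) + pag_scale * (c - p)
        for u, c, p in zip(uncond, cond, perturbed)
    ]


guided = combine_guidance([0.0], [1.0], [0.5], cfg_scale=7.0, pag_scale=3.0)
print(guided)  # [8.5]
```

With `pag_scale=0.0` the PAG term vanishes and the result is plain CFG, which is why a larger PAG Scale produces a stronger effect on the image.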

Also check out the paper authors' official project page: https://ku-cvlab.github.io/Perturbed-Attention-Guidance/

Multi-Concept T2I-Zero / Attention Regulation:

Implements Corrections by Similarities and Cross-Token Non-Maximum Suppression from https://arxiv.org/abs/2310.07419

Also implements some methods from "Enhancing Semantic Fidelity in Text-to-Image Synthesis: Attention Regulation in Diffusion Models" https://arxiv.org/abs/2403.06381

Corrections by Similarities: Reduces the contribution of tokens on far away or conceptually unrelated tokens.

Cross-Token Non-Maximum Suppression: Attempts to reduce the mixing of features of unrelated concepts.
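As a rough illustration of the idea behind Cross-Token Non-Maximum Suppression, the sketch below keeps, at each spatial position, only the token with the strongest attention and blends that suppressed map with the original. This is a toy reading of the technique, not the extension's implementation: the function name `cross_token_nms`, the nested-list attention layout, and the linear blend controlled by `alpha` are assumptions.

```python
def cross_token_nms(attn, alpha):
    """Toy cross-token non-maximum suppression (hypothetical helper).

    attn[t][p] is the attention of token t at spatial position p.
    alpha=0.0 returns the maps unchanged; alpha=1.0 keeps only the
    strongest token at every position.
    """
    if not attn:
        return []
    n_tokens, n_pos = len(attn), len(attn[0])
    out = [row[:] for row in attn]
    for p in range(n_pos):
        # Token with the highest attention at this position.
        winner = max(range(n_tokens), key=lambda t: attn[t][p])
        for t in range(n_tokens):
            suppressed = attn[t][p] if t == winner else 0.0
            # Blend the original and suppressed values (assumed convention).
            out[t][p] = (1.0 - alpha) * attn[t][p] + alpha * suppressed
    return out


maps = [[0.8, 0.2], [0.4, 0.6]]
print(cross_token_nms(maps, alpha=1.0))  # [[0.8, 0.0], [0.0, 0.6]]
```

In this toy version each position ends up dominated by a single token, which is the intuition behind reducing the mixing of features between unrelated concepts.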

Controls:

Step End: After this step, the effect of both CbS and CTNMS ends.

Correction by Similarities Window Size: The number of adjacent tokens on both sides that can influence each token.

CbS Score Threshold: Tokens with similarity below this threshold have their effect reduced.

CbS Correction Strength: How much the Correction by Similarities affects the image.

Alpha for Cross-Token Non-Maximum Suppression: Controls how much the attention maps of CTNMS affect the image.

EMA Smoothing Factor: Smooths the results based on the average of the results of the previous steps. 0 is disabled.

Known Issues: Can error out with image dimensions that are not a multiple of 64.
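One simple way to avoid the known issue with image dimensions is to snap the width and height to a multiple of 64 before generating. The helper below is a small sketch of that arithmetic; the name `snap_to_multiple` is hypothetical and it is not part of the extension or the Web UI.

```python
def snap_to_multiple(value, multiple=64):
    """Round a positive dimension up to the nearest multiple (hypothetical helper)."""
    if value <= 0 or multiple <= 0:
        raise ValueError("value and multiple must be positive")
    return ((value + multiple - 1) // multiple) * multiple


print(snap_to_multiple(1000))  # 1024
print(snap_to_multiple(768))   # 768
```

Rounding up rather than down keeps the image at least as large as requested; if VRAM is tight, rounding down would be the safer choice.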

Also check out the paper authors' official project pages: https://multi-concept-t2i-zero.github.io/ https://github.com/YaNgZhAnG-V5/attention_regulation

Seek for Incantations:

An incomplete implementation of a "prompt-upsampling" method from https://arxiv.org/abs/2401.06345. It generates an image following the prompt, then uses CLIP text/image similarity to add on to the prompt and generate a new image.

Controls:

Append Generated Caption: If true, appends an additional interrogated caption to the prompt. Disabling this is recommended when using Deepbooru Interrogate.

Deepbooru Interrogate: Uses Deepbooru to interrogate instead of CLIP.

Delimiter: The word that separates the original prompt from the generated prompt. Try BREAK, AND, NOT, etc.

Word Replacement: The word/token to replace dissimilar words with.

Gamma: Replaces words below this level of similarity with the Word Replacement.

For example, if your prompt is "a blue dog", the delimiter is "BREAK", the word replacement is "-", and the level of similarity of the word "blue" in the generated image is below gamma, then the new prompt will be "a blue dog BREAK a - dog".
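The Gamma example above can be reproduced with a few lines of Python. This is only a sketch of the prompt-rewriting step: the function name `build_incantation_prompt` and the per-word similarity scores are made up for illustration, whereas in the extension the similarities come from comparing the generated image with the prompt.

```python
def build_incantation_prompt(prompt, similarities, gamma,
                             delimiter="BREAK", replacement="-"):
    """Append a copy of the prompt with low-similarity words replaced.

    similarities maps each word to a score (hypothetical values here);
    words scoring below gamma are swapped for the replacement token.
    Words without a score are kept as they are.
    """
    words = prompt.split()
    masked = [
        word if similarities.get(word, 1.0) >= gamma else replacement
        for word in words
    ]
    return f"{prompt} {delimiter} {' '.join(masked)}"


scores = {"a": 0.9, "blue": 0.1, "dog": 0.8}  # illustrative scores only
print(build_incantation_prompt("a blue dog", scores, gamma=0.2))
# a blue dog BREAK a - dog
```

Raising gamma replaces more words in the appended copy, and lowering it leaves the appended copy closer to the original prompt.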

A WIP implementation of the "prompt optimization" methods is available in the branch "s4a-dev2".

Credits:

The authors of the papers for their methods:

Seek for Incantations: Towards Accurate Text-to-Image Diffusion Synthesis through Prompt Engineering
Chang Yu, Junran Peng, Xiangyu Zhu, Zhaoxiang Zhang, Qi Tian, and Zhen Lei

Multi-Concept T2I-Zero: Tweaking Only The Text Embeddings and Nothing Else
Hazarapet Tunanyan, Dejia Xu, Shant Navasardyan, Zhangyang Wang, and Humphrey Shi

Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance
Donghoon Ahn, Hyoungwon Cho, Jaewon Min, Wooseok Jang, Jungwoo Kim, SeonHwa Kim, Hyun Hee Park, Kyong Hwan Jin, and Seungryong Kim