
Stable Diffusion for studies

This is yet another Stable Diffusion compilation, aimed to be functional, clean and compact enough for various experiments. There's no GUI here, as the target audience is creative coders rather than post-Photoshop users. The latter may check out InvokeAI or AUTOMATIC1111 as convenient production tools, or Deforum for precisely controlled animations.

The code is based on the CompVis and Stability AI libraries and heavily borrows from this repo, with occasional additions from InvokeAI and Deforum, as well as others mentioned below. The following codebases are partially included here (to ensure compatibility and ease of setup): k-diffusion, Taming Transformers, OpenCLIP, CLIPseg. There is also a similar repo, based on the diffusers library, which is more logical and up-to-date.

Current functions:

  • Text to image
  • Image re- and in-painting
  • Latent interpolations (with text prompts and images)

Fine-tuning with your images:

  • Add a new subject (token) with textual inversion
  • Add a new subject with custom diffusion

Other features:

  • Memory-efficient with xformers (high-res output on a 6 GB VRAM GPU)
  • Use of special depth/inpainting and v2 models
  • Masking with text via CLIPseg
  • Weighted multi-prompts
  • to be continued..

More details and a Colab version will follow.

Setup

Install CUDA 11.6. Set up the Conda environment:

conda create -n SD python=3.10 numpy pillow 
conda activate SD
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116
pip install -r requirements.txt

Install the xformers library to increase performance. It makes it possible to run SD at any resolution on lower-grade hardware (e.g. video cards with 6 GB VRAM). If you're on Windows, first ensure that you have Visual Studio 2019 installed.

pip install git+https://github.com/facebookresearch/xformers.git

Download the models (Stable Diffusion 1.5, 1.5-inpaint, 2-inpaint, 2-depth, 2.1, 2.1-v, OpenCLIP, a custom VAE, CLIPseg, MiDaS; mostly converted to float16 for faster loading) with the command below. Licensing info is available on their webpages.

python download.py

Operations

Examples of usage:

  • Generate an image from the text prompt:
python src/_sdrun.py -t "hello world" --size 1024-576
  • Redraw an image with an existing style embedding:
python src/_sdrun.py -im _in/something.jpg -t "<line-art>"
  • Redraw a directory of images, keeping the basic forms intact:
python src/_sdrun.py -im _in/pix -t "neon light glow" --model v2d
  • Inpaint a directory of images with the RunwayML model, turning humans into robots (the text-to-mask step is sketched at the end of this section):
python src/_sdrun.py -im _in/pix --mask "human, person" -t "steampunk robot" --model 15i
  • Make a video, interpolating between the lines of a text file (an interpolation sketch follows these examples):
python src/latwalk.py -t yourfile.txt --size 1024-576
  • Same, with drawing over a masked image:
python src/latwalk.py -t yourfile.txt -im _in/pix/bench2.jpg --mask _in/pix/mask/bench2_mask.jpg 
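
Under the hood, such walks are typically made by interpolating the initial noise latents (and/or prompt embeddings) between keyframes, most often with spherical linear interpolation (slerp), which keeps the blended noise at a sensible magnitude. Below is a minimal sketch of the general technique, not the exact code of latwalk.py:

import torch

def slerp(t, v0, v1, eps=1e-7):
    # interpolate along the great circle between tensors v0 and v1, t in [0, 1]
    v0_n = v0 / (v0.norm() + eps)
    v1_n = v1 / (v1.norm() + eps)
    dot = (v0_n * v1_n).sum().clamp(-1., 1.)
    theta = torch.acos(dot)            # angle between the two flattened vectors
    if theta.abs() < 1e-4:             # nearly parallel => plain lerp is fine
        return (1. - t) * v0 + t * v1
    return (torch.sin((1. - t) * theta) * v0 + torch.sin(t * theta) * v1) / torch.sin(theta)

# e.g. 25 in-between noise latents for a 512x512 image (4x64x64 in latent space)
a, b = torch.randn(1, 4, 64, 64), torch.randn(1, 4, 64, 64)
frames = [slerp(i / 24, a, b) for i in range(25)]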

Check other options by running these scripts with the --help option; try various models, samplers, noisers, etc.
Text prompts may include special tokens (e.g. <depthmap>) and weights (like good prompt :1 | also good prompt :1 | bad prompt :-0.5). The latter may degrade overall accuracy, though.
Interpolated videos may be further smoothed out with FILM.
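
For illustration, a weighted multi-prompt of that form could be parsed and blended roughly as follows. This is only a sketch: encode_text() is a hypothetical stand-in for the actual text-encoder call, and the real scripts may normalize weights differently.

import torch

def parse_weighted_prompt(prompt):
    # split "good :1 | bad :-0.5" into [('good', 1.0), ('bad', -0.5)]; default weight is 1
    pairs = []
    for chunk in prompt.split('|'):
        text, _, w = chunk.rpartition(':')
        try:
            pairs.append((text.strip(), float(w)))
        except ValueError:              # no ':<number>' suffix on this sub-prompt
            pairs.append((chunk.strip(), 1.))
    return pairs

def weighted_conditioning(prompt, encode_text):
    pairs = parse_weighted_prompt(prompt)
    conds = torch.stack([encode_text(text) for text, _ in pairs])    # e.g. [n, 77, 768]
    weights = torch.tensor([w for _, w in pairs]).view(-1, 1, 1)
    return (conds * weights).sum(0) / weights.sum().clamp(min=1e-6)  # normalized weighted sum

print(parse_weighted_prompt("good prompt :1 | also good prompt :1 | bad prompt :-0.5"))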

There are also Windows bat files that slightly simplify and automate the commands.
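
The text-driven masks used above (--mask "human, person") come from CLIPseg, which predicts a segmentation heatmap from an image and a text query. A rough sketch of the idea, using the Hugging Face transformers port of CLIPseg rather than the code bundled in this repo (the threshold value is illustrative):

import torch
from PIL import Image
from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation

processor = CLIPSegProcessor.from_pretrained("CIDAS/clipseg-rd64-refined")
model = CLIPSegForImageSegmentation.from_pretrained("CIDAS/clipseg-rd64-refined")

def text_mask(image, prompt, threshold=0.35):
    # run CLIPseg on one image with one text query and return a binary PIL mask
    inputs = processor(text=[prompt], images=[image], padding="max_length", return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits          # low-res (352x352) relevance heatmap
    probs = torch.sigmoid(logits).squeeze()
    mask = (probs > threshold).float().numpy() * 255
    return Image.fromarray(mask.astype("uint8")).resize(image.size)

# mask = text_mask(Image.open("_in/pix/bench2.jpg"), "human, person")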

Fine-tuning

  • Add a new subject (token) with textual inversion:
python src/train.py --token mycat1 --term cat --data data/mycat1
  • Add a new subject with custom diffusion (needs extra regularization images, see below):
python src/train.py --token mycat1 --term cat --data data/mycat1 --reg_data data/cat

Results of the training runs above will be saved under the train directory.

Custom diffusion trains faster and can achieve impressive reproduction quality with simple prompts close to the training subject, but it can lose the subject entirely if the prompt is too complex or strays from the original category. The result file is 73 MB (it can be compressed to ~16 MB). Note that custom diffusion needs both the target reference images (data/mycat1) and more random images of similar subjects (data/cat); apparently, the latter can be generated with SD itself.
Textual inversion is more generic but stable. Its embeddings can also be easily combined without additional retraining. The result file is ~5 KB.

  • Generate an image with an embedding from textual inversion (the underlying mechanism is sketched after these examples). You'll need to rename the embedding file to your trained token (e.g. mycat1.pt) and point the path to its directory. Note that the token is hardcoded in the file, so you can't change it afterwards.
python src/_sdrun.py -t "cosmic <mycat1> beast" --embeds train
  • Generate an image with an embedding from custom diffusion. You'll need to explicitly mention your new token (so you may name it differently here) and the path to the trained delta file:
python src/_sdrun.py -t "cosmic <mycat1> beast" --token_mod mycat1 --delta_ckpt train/delta-xxx.ckpt

You can also run python src/latwalk.py ... with these arguments to make animations.

Credits

It's quite hard to mention all those who made the current revolution in visual creativity possible. Check the inline links above for some of the sources. Huge respect to the people behind Stable Diffusion, InvokeAI, Deforum and the whole open-source movement.