Fine-Tuning a Stable Diffusion Model with HuggingFace's Diffusers Library

In this tutorial, we'll walk through the process of fine-tuning a Stable Diffusion model using Hugging Face's diffusers library. Stable Diffusion is a latent diffusion model for text-to-image generation: given a text prompt, it synthesizes a matching image, and fine-tuning it on your own text-image pairs adapts it to a specific domain.

Hardware

Running Stable Diffusion itself is not too demanding by today's standards, and fine-tuning the model doesn't require anything like the hardware on which it was originally trained. Make sure your GPU has more than 16GB of VRAM, or training will fail with CUDA out-of-memory errors, and allow around 512GB of disk space for the dataset and checkpoints. You can use services such as Lambda Labs, AWS, or Azure to get notebook instances on GPU-powered servers.
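
If you want to confirm the GPU you've been allocated before training, here is a quick check from Python (a minimal sketch using PyTorch, assuming a CUDA device is present):

import torch

# Print the name and total memory of the first CUDA device, so you can
# confirm it clears the ~16GB needed for fine-tuning.
assert torch.cuda.is_available(), "No CUDA-capable GPU detected"
props = torch.cuda.get_device_properties(0)
print(f"{props.name}: {props.total_memory / 1024**3:.1f} GB VRAM")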

Data

BLIP Flowers Dataset on Hugging Face

If you want to create your own dataset of text-image pairs, this GitHub repository of mine will help you out.
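
Before training, it's worth loading the dataset once to confirm its structure. A minimal sketch using the datasets library; HUGGINGFACE_DATASET_NAME is the same placeholder used in the training command below, and the column names assume the training script's defaults ("image" and "text"):

from datasets import load_dataset

# Load the text-image dataset from the Hugging Face Hub.
dataset = load_dataset("HUGGINGFACE_DATASET_NAME", split="train")
print(dataset)             # column names and row count
print(dataset[0]["text"])  # one caption; the image is in dataset[0]["image"]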

Fine-Tuning

Use the Python Notebook in the repository.

The exclamation mark (!) runs shell or terminal commands directly from within a notebook code cell: when a line starts with !, Jupyter passes it to the shell rather than to the Python interpreter.
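
For example, before the training cell the notebook needs the diffusers repository cloned into the working directory (that is where the train_text_to_image.py path in the command below comes from) and the training dependencies installed. A sketch of what those cells look like; the exact install list in your notebook may differ:

# These lines run in the shell, not in Python:
!git clone https://github.com/huggingface/diffusers
!pip install ./diffusers accelerate datasets transformers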

The code below is present in the Notebook. You can change the hyperparameters in the cell below to suit your dataset; this configuration works well for most Stable Diffusion fine-tuning problems. Note that the effective batch size is train_batch_size × gradient_accumulation_steps = 16, and that --resolution=128 trains well below the model's native 512x512, trading some output fidelity for lower memory use.

!accelerate launch diffusers/examples/text_to_image/train_text_to_image.py \
  --pretrained_model_name_or_path="CompVis/stable-diffusion-v1-4" \
  --dataset_name="HUGGINGFACE_DATASET_NAME" \
  --use_ema \
  --resolution=128 --center_crop --random_flip \
  --train_batch_size=8 \
  --gradient_accumulation_steps=2 \
  --gradient_checkpointing \
  --mixed_precision="fp16" \
  --max_train_steps=1000 \
  --learning_rate=1e-05 \
  --max_grad_norm=1 \
  --lr_scheduler="constant" --lr_warmup_steps=0 \
  --output_dir="PATH_TO_SAVE_MODEL" 

Change --dataset_name="HUGGINGFACE_DATASET_NAME" to the ID of a Hugging Face dataset containing text-image pairs (for example, the BLIP Flowers Dataset above), and --output_dir="PATH_TO_SAVE_MODEL" to the directory where the fine-tuned model should be saved.
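
After training completes, the weights saved to --output_dir load like any other diffusers pipeline. A minimal inference sketch (the path and prompt are placeholders):

import torch
from diffusers import StableDiffusionPipeline

# Load the fine-tuned pipeline from the directory passed as --output_dir.
pipe = StableDiffusionPipeline.from_pretrained(
    "PATH_TO_SAVE_MODEL", torch_dtype=torch.float16
).to("cuda")

image = pipe("a photo of a pink flower").images[0]
image.save("flower.png")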

References

Check out the Hugging Face Stable Diffusion text-to-image fine-tuning documentation.
