- This is a naive adaption of DreamBooth_LoRA by Hugging Face🤗 with the following modifications:
- Structured code: We re-structured the original code into different modules, including
models
,datasets
,engines
,tools
,utils
, to make it more readable and maintainable, and can be easily extended to other tasks. - Detailed comments: We added detailed comments to the code to make it easier to understand
- Structured code: We re-structured the original code into different modules, including
- Install dependencies
conda create -n dreambooth python=3.8 conda activate dreambooth # install pytorch pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113 # install diffusers from source pip install git+https://github.com/huggingface/diffusers pip install -r requirements.txt
- This will train the model with default settings, including 512x512 resolution, 8GB GPU memory occupied, 1 image per batch, 1 gradient accumulation step, 2e-4 learning rate, 150 training steps, 4 validation epochs. We find this setting is enough to generate high-quality images.
Step1
: Prepare your custom images and put them in a folder. Normally, 5 to 10 images are enough. We recommend you to mannuallly crop the images to the same size, e.g., 512x512, to avoid unwanted artifacts.Step2
: Initialize a Accelerate environment. Accelerate is a PyTorch library that simplifies the process of launching multi-GPU training and evaluation jobs. It is developed by Hugging Face.accelerate config
Step3
: Run the training script. Both checkpoints and samples will be saved in thework_dirs
folder. Normally, it only takes 1-2 minutes to fine-tune the model with only 8GB GPU memoroccupied. 150 epochs are enough to train a object, however, when training on human face, we recommend you to train for 800 epochs. The hyper-parameters of Dreambooth is quite sensitive, you can refer to the original blog for some insights.accelerate launch main.py \ --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \ --instance_data_dir="imgs/dogs" \ --instance_prompt="a photo of sks dog" \ --validation_prompt="a photo of sks dog is swimming" \ --with_prior_preservation \ --class_prompt=='a photo of dog' \ --resolution=512 \ --train_batch_size=1 \ --gradient_accumulation_steps=1 \ --learning_rate=2e-4 \ --max_train_steps=150 \ --validation_epochs 4
Prior preservation is used to avoid overfitting and language-drift (check out the paper to learn more if you’re interested). For prior preservation, you use other images of the same class as part of the training process. The nice thing is that you can generate those images using the Stable Diffusion model itself! The training script will save the generated images to a local path you specify.
accelerate launch main.py \
--pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
--instance_data_dir="imgs/dogs" \
--instance_prompt="a photo of sks dog" \
--validation_prompt="a photo of sks dog is swimming" \
--resolution=512 \
--train_batch_size=1 \
--gradient_accumulation_steps=1 \
--learning_rate=2e-4 \
--max_train_steps=150 \
--validation_epochs 10
You can aslo fine-tune the text encoder (CLIP) with LoRA. However we find this leads to unconverged results. This phenomenon is opposite to the results reported in the Original Implementation
accelerate launch main.py \
--pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
--instance_data_dir="imgs/dogs" \
--instance_prompt="a photo of sks dog" \
--validation_prompt="a photo of sks dog is swimming" \
--with_prior_preservation \
--train_text_encoder \
--class_prompt=='a photo of dog' \
--resolution=512 \
--train_batch_size=1 \
--gradient_accumulation_steps=1 \
--learning_rate=2e-4 \
--max_train_steps=150 \
--validation_epochs 4
After training, you can use the following command to generate images from a prompt. We also provide a pretrained checkpoint for dog (in the example)
wget https://github.com/Mountchicken/Structured_Dreambooth_LoRA/releases/download/checkpoint_dog/checkpoint-200.zip
unzip -q checkpoint-200.zip
accelerate launch main.py \
--pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
--checkpoint_dir="checkpoint-200" \
--prompt="A photo of sks dog is swimming \
--output_dir=$OUTPUT_DIR