Late-Constraint Diffusion Guidance for Controllable Image Synthesis

The official code implementation of "Late-Constraint Diffusion Guidance for Controllable Image Synthesis".

[Paper] / [Project] / [Model Weights] / [Demo]

News

  • We have uploaded the training and testing code of LCDG. The pre-trained model weights and an interactive demo will be released afterwards. Star the project to get notified!

To-Do Lists

🔥 We are working on extending LCDG to broader application scenarios, e.g., other diffusion checkpoints and architectures. This GitHub repo will be heavily updated as soon as that work is ready. Star the project to get notified! 🌟 Remaining TODOs:

  • Upload a newer version of the paper to arXiv
  • Update the codebase
  • Update the repo document
  • Release a HuggingFace demo

Overview

Diffusion models, with or without text conditioning, have demonstrated impressive capability in synthesizing photorealistic images given a few or even no words. However, these models may not fully satisfy user needs, as normal users and artists often want to control the synthesized images with specific guidance such as the overall layout, color, structure, or object shape. To adapt diffusion models for controllable image synthesis, several methods have been proposed that incorporate the required conditions as regularization upon the intermediate features of the diffusion denoising network. These methods, referred to as early-constraint methods in this paper, have difficulty handling multiple conditions with a single solution: they typically train a separate model for each specific condition, which incurs considerable training cost and results in non-generalizable solutions. To address these difficulties, we propose a new approach, namely late-constraint: we leave the diffusion network unchanged and instead constrain its output to be aligned with the required conditions. Specifically, we train a lightweight condition adapter to establish the correlation between external conditions and the internal representations of the diffusion model. During the iterative denoising process, the conditional guidance is sent into the corresponding condition adapter to manipulate the sampling process with the established correlation. We further equip the introduced late-constraint strategy with a timestep resampling method and an early stopping technique, which boost the quality of the synthesized images while complying with the guidance. Our method outperforms existing early-constraint methods and generalizes better to unseen conditions.
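For intuition, late-constraint guidance can be pictured as a classifier-guidance-style correction applied during sampling: a lightweight adapter maps the denoising network's intermediate features to the condition domain, and the gradient of the mismatch with the user-provided condition steers each step. The following is only a minimal sketch of that idea, not the code in this repository; the model and adapter interfaces and the guidance scale are assumptions, and the timestep resampling and early stopping techniques from the paper are omitted.

import torch

@torch.enable_grad()
def guided_denoise_step(model, adapter, x_t, t, condition, guidance_scale=1.0):
    """One denoising step with late-constraint guidance (simplified sketch).

    `model(x_t, t)` is assumed to return the predicted noise and a set of
    intermediate U-Net features; `adapter(features, t)` is assumed to map those
    features into the condition domain (e.g. an edge map). Both interfaces are
    hypothetical stand-ins for the repository's actual modules.
    """
    x_t = x_t.detach().requires_grad_(True)

    # Predict noise and expose intermediate features of the denoising network.
    eps, features = model(x_t, t)

    # The condition adapter translates internal features into the condition
    # domain; the mismatch with the user-provided condition defines a loss.
    predicted_condition = adapter(features, t)
    loss = torch.nn.functional.mse_loss(predicted_condition, condition)

    # Steer the sampling trajectory with the gradient of the mismatch,
    # leaving the diffusion network itself unchanged (the late constraint).
    grad = torch.autograd.grad(loss, x_t)[0]
    return eps + guidance_scale * grad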

Prerequisites

We collect the packages required to run both the training and testing code in environment.sh, using pip as the package manager. Simply run bash environment.sh to install them.

Before Training or Testing

1. Prepare Pre-trained Model Weights of Stable Diffusion

Before running the code, the pre-trained weights of the diffusion models and the corresponding VQ models should be prepared locally. For the v1.4 or CelebA model weights, refer to the GitHub repository of Stable Diffusion and execute their provided download scripts:

bash scripts/download_first_stage.sh
bash scripts/download_models.sh

2. Modify the Configuration Files

We provide example configuration files for the edge, color, and mask conditions in configs/lcdg. Modify lines 5 and 43 of these configuration files with the corresponding paths of the pre-trained model weights.
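To sanity-check the edited paths before launching a run, a small helper like the one below can load a configuration file and report whether every referenced checkpoint exists. It is only a convenience sketch: it assumes OmegaConf is available (as is typical for Stable Diffusion-based codebases) and makes no assumption about the exact key names, since it simply walks the whole config.

import os
import sys

from omegaconf import OmegaConf


def check_config_paths(config_path):
    """Report every string in the config that looks like a checkpoint path
    and whether it exists on disk."""
    config = OmegaConf.load(config_path)
    flat = OmegaConf.to_container(config, resolve=True)

    def walk(node, prefix=""):
        if isinstance(node, dict):
            for key, value in node.items():
                walk(value, f"{prefix}{key}.")
        elif isinstance(node, list):
            for i, value in enumerate(node):
                walk(value, f"{prefix}{i}.")
        elif isinstance(node, str) and node.endswith((".ckpt", ".pt", ".pth")):
            status = "OK" if os.path.isfile(node) else "MISSING"
            print(f"[{status}] {prefix[:-1]} -> {node}")

    walk(flat)


if __name__ == "__main__":
    check_config_paths(sys.argv[1])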

Training the Condition Adapter

1. Prepare Your Training Data

As reported in our paper, the condition adapter can be trained well with 10,000 randomly collected images. Organize your data directory in the following structure:

collected_dataset/
├── bdcn_edges
├── captions
└── images

For the edge condition, we use bdcn to generate supervision signals from the source images. For the mask condition, we use u2net to detect saliency masks as supervision. For the color condition, we use simulated color strokes as supervision and have incorporated the corresponding code in condition_adaptor_src/condition_adaptor_dataset.py.
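For reference, simulated color strokes are typically produced by heavily smoothing the source image and revealing it only inside a few stroke-shaped regions. The sketch below illustrates that general idea with OpenCV; it is not the implementation in condition_adaptor_src/condition_adaptor_dataset.py, and the blur size and stroke geometry are arbitrary choices.

import cv2
import numpy as np


def simulate_color_stroke(image, num_strokes=8, blur_ksize=51, seed=None):
    """Rough illustration of a simulated color-stroke condition: heavily blur
    the image to flatten its colors, then reveal it only inside a few random
    thick line segments (the "strokes")."""
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]

    # Flatten local color detail so each stroke carries one dominant color.
    smoothed = cv2.GaussianBlur(image, (blur_ksize, blur_ksize), 0)

    # Draw random thick strokes into a binary mask.
    mask = np.zeros((h, w), dtype=np.uint8)
    for _ in range(num_strokes):
        x0, y0 = int(rng.integers(0, w)), int(rng.integers(0, h))
        x1, y1 = int(rng.integers(0, w)), int(rng.integers(0, h))
        thickness = int(rng.integers(10, 30))
        cv2.line(mask, (x0, y0), (x1, y1), 255, thickness)

    # Keep the smoothed colors only where strokes exist; elsewhere stay black.
    return np.where(mask[..., None] > 0, smoothed, 0)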

2. Modify the Configuration Files

Once the training data is ready, modify lines 74 to 76 with the corresponding paths of the training data. Additionally, if evaluation is required, modify lines 77 to 79 with the corresponding paths of the split validation data.
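If your collected data has not been split yet, one simple way to carve out a validation set is to move a small random fraction of the samples into a parallel directory and point lines 77 to 79 at it. The helper below is only a sketch: it assumes that caption and edge files share the image's file stem, so adapt it to how your data is actually named.

import random
import shutil
from pathlib import Path


def split_validation(root="collected_dataset", val_root="collected_dataset_val",
                     val_ratio=0.05, seed=0):
    """Move a random subset of samples into a separate validation directory,
    keeping images, bdcn_edges, and captions aligned by file stem."""
    root, val_root = Path(root), Path(val_root)
    images = sorted((root / "images").iterdir())

    random.seed(seed)
    val_images = random.sample(images, max(1, int(len(images) * val_ratio)))

    for subdir in ("images", "bdcn_edges", "captions"):
        (val_root / subdir).mkdir(parents=True, exist_ok=True)

    for image_path in val_images:
        stem = image_path.stem
        # Edge maps and captions are assumed to share the image's file stem.
        companions = {
            "images": image_path,
            "bdcn_edges": next((root / "bdcn_edges").glob(stem + ".*"), None),
            "captions": next((root / "captions").glob(stem + ".*"), None),
        }
        for subdir, src in companions.items():
            if src is not None and src.exists():
                shutil.move(str(src), str(val_root / subdir / src.name))


if __name__ == "__main__":
    split_validation()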

3. Start Training

Now you are ready to start training by executing condition_adaptor_train.py, for example:

python condition_adaptor_train.py \
    -b /path/to/config/file \
    -l /path/to/output/path

Inference with the Trained Condition Adapter

After a few hours of training, you can sample images with the trained condition adapter. An example command is as follows:

python sample_single_image.py \
    --base /path/to/config/path \
    --indir /path/to/target/condition \
    --caption "text prompt" \
    --outdir /path/to/output/path \
    --resume /path/to/condition/adapter/weights
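To sample over a whole folder of conditions, you can wrap the command above in a small driver script. The sketch below simply re-invokes sample_single_image.py once per condition image with the flags shown above; it assumes --indir accepts a single condition file and uses one shared caption, so adjust both to your setup.

import subprocess
import sys
from pathlib import Path


def batch_sample(config, adapter_ckpt, condition_dir, out_dir, caption):
    """Call sample_single_image.py once for every condition image found."""
    condition_dir, out_dir = Path(condition_dir), Path(out_dir)
    for condition_path in sorted(condition_dir.glob("*.png")):
        # One output subdirectory per condition image (placeholder layout).
        target_dir = out_dir / condition_path.stem
        target_dir.mkdir(parents=True, exist_ok=True)
        subprocess.run([
            sys.executable, "sample_single_image.py",
            "--base", config,
            "--indir", str(condition_path),
            "--caption", caption,
            "--outdir", str(target_dir),
            "--resume", adapter_ckpt,
        ], check=True)

Call it with the same configuration file, condition adapter checkpoint, and caption you would otherwise pass on the command line.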

Qualitative Comparison

We demonstrate qualitative comparisons under different conditions in the following figures.

Conditions covered: Canny edge, HED edge, user sketch, color stroke, image palette, and mask.

License

This work is licensed under the MIT License. See LICENSE for details.

Citation

If you find our work enlightening or the codebase helpful to your work, please cite our paper:

@misc{liu2023lateconstraint,
    title={Late-Constraint Diffusion Guidance for Controllable Image Synthesis}, 
    author={Chang Liu and Dong Liu},
    year={2023},
    eprint={2305.11520},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

Acknowledgements

This codebase is heavily built upon the source code of Stable Diffusion. Thanks to their great implementations!

