
Diffusion deployment

This repository contains scripts for deploying diffusion models (based on diffusers) on both GPU (NVIDIA) and CPU (Intel). The aim is to significantly speed up diffusion-model inference: it provides a ~12x speedup on CPU and a ~4x speedup on GPU. Combined with small-stable-diffusion-v0, it can generate an image in just 5 s on a CPU.

CPU speedup

We develop the CPU deployment based on Intel OpenVINO. The pipeline OpenVINOStableDiffusionPipeline is adapted from diffusers' OnnxStableDiffusionPipeline; the code lives in pipeline_openvino_stable_diffusion.py.
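
As a rough sketch of the underlying idea (not the repository's exact code, and assuming the ONNX Runtime OpenVINO execution provider from the onnxruntime-openvino requirement below), an ONNX Runtime session can be pointed at OpenVINO instead of the default CPU provider; the model path here is a placeholder:

import onnxruntime as ort

# Placeholder path to an exported ONNX model; providers are tried left to right.
session = ort.InferenceSession(
    "unet/model.onnx",
    providers=["OpenVINOExecutionProvider", "CPUExecutionProvider"],
)
print(session.get_providers())  # shows which provider was actually selected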

Results

Here are some experimental results on Stable Diffusion v1.4, comparing against the default PyTorch CPU and ONNX pipelines.

| Pipeline  | PyTorch CPU | ONNX          | OpenVINO        |
|-----------|-------------|---------------|-----------------|
| Time cost | 397 s       | 77 s ± 2.56 s | 33.9 s ± 247 ms |
| Speedup   | 1x          | 5.2x          | 11.7x           |

Test setting: Intel(R) Xeon(R) Platinum 8369B CPU @ 2.90GHz, PNDM scheduler, 50 steps.

Prerequisites

OpenVINO currently has several platform limitations, so we only support the following platforms for the CPU speedup:

  • Ubuntu 18.04, 20.04, RHEL (CPU only) or Windows 10, 64-bit
  • Python 3.7, 3.8 or 3.9 for Linux, and only Python 3.9 for Windows

Requirements

  • diffusers
  • transformers
  • openvino runtime

To install the OpenVINO runtime, simply run pip install onnxruntime-openvino==1.13.0.
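
To check that the OpenVINO-enabled build is installed correctly, you can list the registered execution providers (a quick sanity check, not part of the repository's scripts):

import onnxruntime as ort

# The OpenVINO-enabled wheel should list "OpenVINOExecutionProvider".
print(ort.get_available_providers())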

Usage

To use this deployment, follow the code below:

# First, load an ONNX pipeline.
from diffusers import OnnxStableDiffusionPipeline

onnx_pipe = OnnxStableDiffusionPipeline.from_pretrained(
    "OFA-Sys/small-stable-diffusion-v0",
    revision="onnx",
    provider="CPUExecutionProvider",
)

# Convert it to an OpenVINO pipeline (pipeline_openvino_stable_diffusion.py must be on the Python path).
import pipeline_openvino_stable_diffusion
openvino_pipe = pipeline_openvino_stable_diffusion.OpenVINOStableDiffusionPipeline.from_onnx_pipeline(onnx_pipe)

# Generate images.
images = openvino_pipe("an apple, 4k")
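
Assuming the OpenVINO pipeline returns the same kind of output object as the diffusers pipelines it is adapted from, the first generated image can then be saved like this:

# Assumption: the call returns a diffusers-style output object holding a list of PIL images.
images.images[0].save("apple.png")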

GPU speedup

We develop the GPU deployment based on TensorRT and its plugins.
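
As a rough sketch of the approach (the repository's demo-diffusion.py handles this end to end), an exported ONNX model such as the UNet can be compiled into a TensorRT engine; the file names below are placeholders:

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
# Register plugins; the custom plugin library itself is preloaded via LD_PRELOAD (see Usage below).
trt.init_libnvinfer_plugins(TRT_LOGGER, "")

builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, TRT_LOGGER)

# Parse the exported ONNX model (placeholder path).
with open("onnx/unet.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError("failed to parse the ONNX model")

# Build an FP16 engine and serialize it to disk.
config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)
engine_bytes = builder.build_serialized_network(network, config)
with open("engine/unet.plan", "wb") as f:
    f.write(engine_bytes)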

Comparison

Here are some experimental results for stable-diffusion-v1-4 and small-stable-diffusion-v0.

| Model \ Pipeline       | PyTorch GPU | TensorRT | TensorRT Plugin |
|------------------------|-------------|----------|-----------------|
| Stable diffusion       | 3.94 s      | 1.44 s   | 1.07 s          |
| Small stable diffusion | 2.7 s       | 1.01 s   | 0.65 s          |

Prerequisites

See gpu_requirements.txt for the requirements. To use the plugins, we need tensorrt>=8.5. If you use tensorrt==8.4, you can still run the demo by deleting the call trt.init_libnvinfer_plugins(TRT_LOGGER, '') in gpu-trt-infer-demo.py and by not adding PLUGIN_LIBS to LD_PRELOAD.
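
If you prefer not to edit the script by hand, a version guard along these lines would have the same effect (a sketch, not the repository's code):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Call init_libnvinfer_plugins only when TensorRT 8.5+ is installed; on 8.4 the call is skipped.
major, minor = (int(x) for x in trt.__version__.split(".")[:2])
if (major, minor) >= (8, 5):
    trt.init_libnvinfer_plugins(TRT_LOGGER, "")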

Usage

export PLUGIN_LIBS="/path/to/libnvinfer_plugin.so.8.5.1"
export HF_TOKEN="Your_HF_TOKEN"

mkdir -p onnx engine output
LD_PRELOAD=${PLUGIN_LIBS} python3 demo-diffusion.py "a beautiful photograph of Mt. Fuji during cherry blossom" --enable-preview-features --hf-token=$HF_TOKEN -v

Contributions

Contributions to this repository are welcome. If you would like to contribute, please open a pull request and make sure to follow the existing code style.
