Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models

baaivision/vid2vid-zero

We propose vid2vid-zero, a simple yet effective method for zero-shot video editing. vid2vid-zero leverages off-the-shelf image diffusion models and requires no training on any video. At the core of our method are a null-text inversion module for text-to-video alignment, a cross-frame modeling module for temporal consistency, and a spatial regularization module for fidelity to the original video. Without any training, we leverage the dynamic nature of the attention mechanism to enable bi-directional temporal modeling at test time. Experiments and analyses show promising results in editing attributes, subjects, places, etc., in real-world videos.
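
To make the cross-frame modeling concrete, below is a minimal PyTorch sketch of bi-directional cross-frame attention under our assumptions: every frame's queries attend to keys and values gathered from all frames, reusing the pretrained image model's projections with no new weights. The function name and tensor shapes are illustrative, not the repository's actual interface.

import torch

def cross_frame_attention(q, k, v):
    # q, k, v: (frames, tokens, dim) projections from a pretrained
    # image-diffusion self-attention layer; no new parameters are added.
    f, t, d = k.shape
    # Share keys/values across all frames -> (frames, frames*tokens, dim),
    # so each frame attends both forward and backward in time.
    k_all = k.reshape(1, f * t, d).expand(f, -1, -1)
    v_all = v.reshape(1, f * t, d).expand(f, -1, -1)
    attn = torch.softmax(q @ k_all.transpose(-2, -1) / d**0.5, dim=-1)
    return attn @ v_all  # (frames, tokens, dim)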

Highlights

  • Video editing with off-the-shelf image diffusion models.

  • No training on any video.

  • Promising results in editing attributes, subjects, places, etc., in real-world videos.

News

  • [2023.4.12] Online Gradio demo is available here.
  • [2023.4.11] Added a Gradio demo (runs locally).
  • [2023.4.9] Code released!

Installation

Requirements

pip install -r requirements.txt

Installing xformers is highly recommended for improved efficiency and speed on GPUs.
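
For a typical recent PyTorch/CUDA environment, it can usually be installed with pip (an assumption about your setup; see the xformers documentation if a matching prebuilt wheel is unavailable for your platform):

pip install xformers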

Weights

Stable Diffusion is a latent text-to-image diffusion model capable of generating photo-realistic images given any text input. The pre-trained Stable Diffusion models can be downloaded from 🤗 Hugging Face (e.g., Stable Diffusion v1-4, v2-1). We use Stable Diffusion v1-4 by default.
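
As a minimal sketch of fetching the default checkpoint with 🤗 diffusers (the model id "CompVis/stable-diffusion-v1-4" is the public v1-4 repository; swap in "stabilityai/stable-diffusion-2-1" for v2-1):

from diffusers import StableDiffusionPipeline

# Downloads (or reuses a cached copy of) the Stable Diffusion v1-4 weights.
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")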

Zero-shot testing

Simply run:

accelerate launch test_vid2vid_zero.py --config path/to/config

For example:

accelerate launch test_vid2vid_zero.py --config configs/car-moving.yaml
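
The config bundles the input video, its source prompt, and the target edit prompts. As a purely hypothetical sketch of the kind of fields such a file might contain (the names below are assumptions, not copied from configs/car-moving.yaml; consult the shipped configs for the real schema):

# hypothetical-config.yaml -- illustrative field names only
pretrained_model_path: "CompVis/stable-diffusion-v1-4"
input_data:
  video_path: "data/car-moving.mp4"
  prompt: "A car is moving on the road"
validation_data:
  prompts:
    - "A Porsche car is moving on the desert"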

Gradio Demo

Launch the local demo built with Gradio:

python app.py

Alternatively, you can use our online Gradio demo here.

Note that the demo disables Null-text Inversion and enables fp16 for faster response.
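
The fp16 trade-off is the standard half-precision loading path in diffusers; an illustrative sketch (not the demo's exact code):

import torch
from diffusers import StableDiffusionPipeline

# Half precision roughly halves GPU memory use and speeds up sampling,
# at a small cost in numerical accuracy.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")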

Examples

Each example pairs the source prompt of the input video with the target prompt of the edited output (the accompanying GIFs are shown in the repository):

  • "A car is moving on the road" → "A Porsche car is moving on the desert"
  • "A car is moving on the road" → "A jeep car is moving on the snow"
  • "A man is running" → "Stephen Curry is running in Time Square"
  • "A man is running" → "A man is running in New York City"
  • "A child is riding a bike on the road" → "A child is riding a bike on the flooded road"
  • "A child is riding a bike on the road" → "A lego child is riding a bike on the road"
  • "A car is moving on the road" → "A car is moving on the snow"
  • "A car is moving on the road" → "A jeep car is moving on the desert"

Citation

@article{vid2vid-zero,
  title={Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models},
  author={Wang, Wen and Xie, Kangyang and Liu, Zide and Chen, Hao and Cao, Yue and Wang, Xinlong and Shen, Chunhua},
  journal={arXiv preprint arXiv:2303.17599},
  year={2023}
}

Acknowledgement

This project builds upon Tune-A-Video, diffusers, and prompt-to-prompt.

Contact

We are hiring at all levels at the BAAI Vision Team, including full-time researchers, engineers, and interns. If you are interested in working with us on foundation models, visual perception, and multimodal learning, please contact Xinlong Wang ([email protected]) and Yue Cao ([email protected]).