Anything To Image

Generate images from anything with ImageBind's unified latent space and stable-diffusion-2-1-unclip.

TODO: Currently, only ImageBind-Huge with its 1024-dimensional latent space is supported. However, it might be possible to use StableDiffusionImageVariation for a 768-dimensional latent space.

No training is needed.
Integration with 🤗 Diffusers.
Online demo with Hugging Face Gradio and Google Colab.

The demo needs at least 22 GB of GPU memory. The Gradio and Colab online demos might therefore need a pro account to obtain enough GPU memory to run.
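
A quick way to check whether a local GPU meets this requirement before launching the demo is sketched below. It assumes PyTorch is installed and reads the 22 GB figure as 22 GiB. The helper names `meets_memory_requirement` and `gpu_total_bytes` are illustrative and not part of anything2image.

```python
REQUIRED_GIB = 22  # figure quoted above; treating it as GiB is an assumption


def meets_memory_requirement(total_bytes, required_gib=REQUIRED_GIB):
    """Return True if a device with `total_bytes` of memory satisfies the requirement."""
    return total_bytes >= required_gib * 1024 ** 3


def gpu_total_bytes(device_index=0):
    """Total memory of a CUDA device in bytes, or 0 if CUDA is unavailable."""
    import torch  # imported lazily so the helper above works without PyTorch

    if not torch.cuda.is_available():
        return 0
    return torch.cuda.get_device_properties(device_index).total_memory
```

Calling `meets_memory_requirement(gpu_total_bytes())` returns `False` on a machine without CUDA or with a smaller GPU. The check looks at total device memory, not at what is currently free.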

Supported Tasks

Audio to Image
Audio+Text to Image
Audio+Image to Image
Image to Image
Text to Image
Thermal to Image
Depth to Image: Coming soon.

Update

[2023/5/19]

Anything2Image has been integrated into InternGPT.
[v1.1.4]: Support fusing audio and text in ImageBind latent space and UI improvements.

[2023/5/18]

[v1.1.3]: Support thermal to image.
[v1.1.0]: Gradio GUI - add options for controlling image size and noise scheduler.
[v1.0.8]: Gradio GUI - add options for controlling noise level, audio-image embedding arithmetic strength, and number of inference steps.

anything2image.mp4

Getting Started

Requirements

Ensure you have PyTorch installed.

Python >= 3.8
PyTorch >= 1.13
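
The version requirements above can be checked programmatically. The following is a minimal sketch using only the standard library; the function names are illustrative and not part of anything2image, and only the leading `major.minor` part of a version string is compared.

```python
import re
import sys

MIN_PYTHON = (3, 8)   # from the requirements above
MIN_TORCH = (1, 13)   # from the requirements above


def version_tuple(version):
    """Parse the leading 'major.minor' of a version string such as '1.13.1+cu117'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def torch_is_supported(torch_version):
    """Check a PyTorch version string against the minimum listed above."""
    return version_tuple(torch_version) >= MIN_TORCH


def python_is_supported(info=sys.version_info):
    """Check an interpreter version (default: the running one) against the minimum."""
    return tuple(info[:2]) >= MIN_PYTHON
```

With PyTorch installed, `torch_is_supported(torch.__version__)` performs the check against the installed build.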

Then install anything2image.

# from pypi
pip install anything2image
# or install locally from a git clone
git clone git@github.com:Zeqiang-Lai/Anything2Image.git
cd Anything2Image
pip install .

Usage

# launch the Gradio demo
python -m anything2image.app
# command-line demo; see also the task examples below.
python -m anything2image.cli --audio assets/wav/cat.wav
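
The command-line demo can also be driven from a Python script. The sketch below builds the argument list for `python -m anything2image.cli` using only the flags that appear in this README (`--audio`, `--image`, `--text`, `--thermal`, `--prompt`). The wrapper functions `build_command` and `run` are illustrative and not part of the package, and which flag combinations the CLI accepts is not verified here.

```python
import subprocess
import sys


def build_command(audio=None, image=None, text=None, thermal=None, prompt=None):
    """Build the argument list for `python -m anything2image.cli`.

    Only inputs that are not None are turned into flags.
    """
    cmd = [sys.executable, "-m", "anything2image.cli"]
    options = [
        ("--audio", audio),
        ("--image", image),
        ("--text", text),
        ("--thermal", thermal),
        ("--prompt", prompt),
    ]
    for flag, value in options:
        if value is not None:
            cmd += [flag, value]
    return cmd


def run(**inputs):
    """Run the CLI demo; raises CalledProcessError if it exits with an error."""
    return subprocess.run(build_command(**inputs), check=True)
```

For example, `run(audio="assets/wav/cat.wav")` is equivalent to the command shown above. Passing the arguments as a list avoids shell quoting issues with prompts that contain spaces.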

Audio to Image

bird_audio.wav	dog_audio.wav	cattle.wav	cat.wav

fire_engine.wav	train.wav	motorcycle.wav	plane.wav

python -m anything2image.cli --audio assets/wav/cat.wav

Audio+Text to Image

cat.wav	cat.wav	bird_audio.wav	bird_audio.wav
A painting	A photo	A painting	A photo

python -m anything2image.cli --audio assets/wav/cat.wav --prompt "a painting"

Audio+Image to Image

Audio & Image	Output	Audio & Image	Output

wave.wav		wave.wav

python -m anything2image.cli --audio assets/wav/wave.wav --image "assets/image/bird.png"

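import torch
from diffusers import StableUnCLIPImg2ImgPipeline
import anything2image.imagebind as ib  # assumed import path for the bundled ImageBind code
# Setup assumed by this snippet: the model and pipeline named at the top of this README.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
pipe = StableUnCLIPImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-unclip", torch_dtype=torch.float16
).to(device)
model = ib.imagebind_huge(pretrained=True).eval().to(device)

with torch.no_grad():
    embeddings = model.forward({
        ib.ModalityType.VISION: ib.load_and_transform_vision_data(["assets/image/bird.png"], device),
    })
    img_embeddings = embeddings[ib.ModalityType.VISION]
    # The audio embedding is requested without normalization.
    embeddings = model.forward({
        ib.ModalityType.AUDIO: ib.load_and_transform_audio_data(["assets/wav/wave.wav"], device),
    }, normalize=False)
    audio_embeddings = embeddings[ib.ModalityType.AUDIO]
    # Average the two embeddings, then decode the result with the unCLIP pipeline.
    embeddings = (img_embeddings + audio_embeddings) / 2
    images = pipe(image_embeds=embeddings.half()).images
    images[0].save("audioimg2img.png")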
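
The averaging step above is the equal-weight case of a linear blend between the two embeddings. A minimal sketch of a blend with an adjustable weight follows, written on plain Python lists so it runs without a GPU. The function name `blend_embeddings` and the `strength` parameter are illustrative; they may not match how the Gradio option for embedding arithmetic strength is implemented.

```python
def blend_embeddings(img_embedding, audio_embedding, strength=0.5):
    """Linearly interpolate between two embeddings of equal length.

    strength=0.0 returns the image embedding, strength=1.0 returns the
    audio embedding, and strength=0.5 reproduces the plain average.
    """
    if len(img_embedding) != len(audio_embedding):
        raise ValueError("embeddings must have the same dimension")
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    return [
        (1.0 - strength) * i + strength * a
        for i, a in zip(img_embedding, audio_embedding)
    ]
```

With tensors the same formula applies elementwise: `(1 - strength) * img_embeddings + strength * audio_embeddings`.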

Image to Image

Top: Input Images. Bottom: Generated Images.

python -m anything2image.cli --image "assets/image/bird.png"

Text to Image

A photo of a car.	A sunset over the ocean.	A bird's-eye view of a cityscape.	A close-up of a flower.

It is not necessary to use ImageBind for text to image. Nevertheless, we show the alignment between ImageBind's text latent space and its image latent space.

python -m anything2image.cli --text "A sunset over the ocean."

Thermal to Image

Input	Output	Input	Output

Top: Input Images. Bottom: Generated Images.

python -m anything2image.cli --thermal "assets/thermal/030419.jpg"

Citation

Latent Diffusion

@InProceedings{Rombach_2022_CVPR,
    author    = {Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj\"orn},
    title     = {High-Resolution Image Synthesis With Latent Diffusion Models},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {10684-10695}
}

ImageBind

@inproceedings{girdhar2023imagebind,
  title={ImageBind: One Embedding Space To Bind Them All},
  author={Girdhar, Rohit and El-Nouby, Alaaeldin and Liu, Zhuang
and Singh, Mannat and Alwala, Kalyan Vasudev and Joulin, Armand and Misra, Ishan},
  booktitle={CVPR},
  year={2023}
}
