Pixel Giffusion (text-to-pixel-GIF)

Generating Pixel-Art-Style GIFs from Text Prompts
Examples »

Table of Contents
  1. About The Project
  2. Getting Started
  3. License
  4. Contact
  5. Acknowledgments

About The Project

Generative AI is a fast-growing area of machine learning, with applications across many industries, including AI-generated art. Building on this, we fine-tuned an image diffusion model to generate pixel-art-style GIFs that morph between two images produced from a pair of text prompts. The pipeline fine-tunes an existing diffusion model on pixel-art dataset(s), generates one image for each of the two prompts, and interpolates both the prompt embeddings and the latent noise tensors to morph between the generated images. The interpolation outputs are then extracted and stitched together into a pixel-art GIF. This work aims to explore concrete use cases for diffusion models and contribute to the field of generative modeling.
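The latent interpolation step described above can be sketched as follows. This is a minimal illustration rather than the project's exact code; it uses spherical linear interpolation (slerp), a common choice for interpolating Gaussian noise latents so the intermediate tensors keep a sensible norm:

```python
import torch

def slerp(t, v0, v1, eps=1e-7):
    """Spherical linear interpolation between two latent tensors."""
    v0_flat, v1_flat = v0.flatten(), v1.flatten()
    dot = torch.dot(v0_flat / v0_flat.norm(), v1_flat / v1_flat.norm())
    dot = dot.clamp(-1 + eps, 1 - eps)
    theta = torch.acos(dot)
    s0 = torch.sin((1 - t) * theta) / torch.sin(theta)
    s1 = torch.sin(t * theta) / torch.sin(theta)
    return s0 * v0 + s1 * v1

# Interpolate between the two prompts' starting noise tensors to get
# one latent per intermediate GIF frame (shapes are illustrative).
g = torch.Generator().manual_seed(0)
z0 = torch.randn(1, 4, 64, 64, generator=g)
z1 = torch.randn(1, 4, 64, 64, generator=g)
frames_noise = [slerp(t, z0, z1) for t in torch.linspace(0, 1, 8)]
```

Each interpolated latent is then denoised by the fine-tuned model to produce one frame of the morph.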

[Back to Top]


Datasets & Models

The datasets that we experimented with for the fine-tuning process are listed below. They are all on HuggingFace.

  1. jainr3/diffusiondb-pixelart: This is a subset of the DiffusionDB dataset containing image samples that have been passed through the pixelatorapp.com tool to make "pixel-art" style images.

  2. sunilSabnis/pixelart: This is a dataset of pixel-style art generated from the stable-diffusion2-1 model itself. The prompts were selected from andyyang/stable_diffusion_prompts_2m.

  3. jiovine/pixel-art-nouns-2k: This is a class-specific dataset of pixel-style art; more specifically, the images are of cartoon characters.

The models that were obtained as a result of fine-tuning with these datasets are listed below. These are all on HuggingFace.

  1. jainr3/sd-diffusiondb-pixelart-model-lora: These are LoRA adaptation weights for stabilityai/stable-diffusion-2-1. The weights were fine-tuned on the jainr3/diffusiondb-pixelart dataset.

  2. jainr3/sd-pixelart-model-lora: These are LoRA adaptation weights for stabilityai/stable-diffusion-2-1. The weights were fine-tuned on the sunilSabnis/pixelart dataset.

  3. jainr3/sd-nouns-model-lora: These are LoRA adaptation weights for stabilityai/stable-diffusion-2-1. The weights were fine-tuned on the jiovine/pixel-art-nouns-2k dataset.

  4. jainr3/sd-diffusiondb-pixelart-v2-model-lora: These are LoRA adaptation weights for stabilityai/stable-diffusion-2-1. The weights were fine-tuned on the jainr3/diffusiondb-pixelart dataset for 30 epochs, whereas the jainr3/sd-diffusiondb-pixelart-model-lora model was trained for only 5 epochs.
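As an illustration of how these LoRA weights might be loaded for inference with the diffusers library (a sketch, assuming a diffusers version that supports `load_attn_procs` on the UNet; the exact API may differ across versions):

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the base model, then apply the LoRA attention weights on top of it
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
)
pipe.unet.load_attn_procs("jainr3/sd-diffusiondb-pixelart-model-lora")
pipe = pipe.to("cuda")

image = pipe("A medieval castle, pixel art style").images[0]
image.save("castle.png")
```

Because LoRA stores only low-rank adapter weights, the same base checkpoint can be reused with any of the four adapters listed above.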

[Back to Top]

Examples

  1. First text prompt: "Snowy cabin in the woods"

    Second text prompt: "A medieval castle"

    See here for more details.

  2. First text prompt: "The sun shining brightly"

    Second text prompt: "A full moon glowing brightly"

    See here for more details.

  3. First text prompt: "A snowy mountain"

    Second text prompt: "Pyramids in Egypt"

    See here for more details.

  4. First text prompt: "A surfer"

    Second text prompt: "A snowboarder"

  5. GIF-chaining example

    'Snowy cabin in the woods', 'A house boat on a lake', 'A beach house on a sunny day', 'A medieval castle', 'A gothic style clock tower', 'A skyscraper in a large metropolitan city', 'The empire state building in new york city during night'

    See here for more details.

[Back to Top]

Getting Started

To get a local copy up and running, follow these steps.

Prerequisites

A powerful GPU is necessary for most parts, so one may opt to use Google Colaboratory, where an A100 high-RAM GPU is readily available with the Colab Pro plan.

Installation

  1. Clone the repo
    git clone https://github.com/sunil-2000/text-to-pixel-gif.git
  2. Install the requirements (at a minimum, the diffusers and transformers libraries are needed for inference)
    pip install -r requirements.txt

Model Fine-Tuning

  1. Obtain a HuggingFace API key from https://huggingface.co/ and save it for later.

  2. Obtain a Wandb API key from https://wandb.ai/ and save it for later.

  3. Use the fine-tuning scripts located in the colab-notebooks folder. There are example scripts for the different experiments we performed, such as using different datasets or training for shorter or longer runs. Enter the API keys in these scripts when prompted.

GIF Generation / Chaining

  1. See this example for detailed instructions.
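The final stitching step, turning a sequence of generated frames into a looping GIF, can be done with Pillow. A minimal sketch (the function and file names here are illustrative, not the repo's actual helpers):

```python
from PIL import Image

def save_gif(frames, path, ms_per_frame=100):
    """Stitch PIL images into a looping GIF, adding the reverse pass
    so the morph plays forward and then back for a smooth loop."""
    loop = frames + frames[-2:0:-1]  # ping-pong: drop duplicated endpoints
    loop[0].save(
        path,
        save_all=True,
        append_images=loop[1:],
        duration=ms_per_frame,
        loop=0,  # 0 means loop forever
    )

# Stand-in frames; in practice these are the diffusion model's outputs
frames = [Image.new("RGB", (64, 64), (i * 16, 0, 0)) for i in range(8)]
save_gif(frames, "morph.gif")
```

Chaining works the same way: generate a morph for each consecutive pair of prompts, then concatenate all the frame sequences before saving.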

[Back to Top]

License

Distributed under the MIT License. See LICENSE.txt for more information.

[Back to Top]

Contact

Rahul Jain, Sunil Sabnis, Joseph Iovine, Kenneth Alvarez, and Carlos Ponce

Project Link: https://github.com/sunil-2000/text-to-pixel-gif

[Back to Top]

Acknowledgments

This project was created as a part of the CS 5787 Deep Learning Final Project for the Spring 2023 semester at Cornell Tech under the guidance of Professor Alex Jaimes.

[Back to Top]
