Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CLI script #153

Open
wants to merge 6 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
9 changes: 8 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -107,7 +107,7 @@ video_path = pipeline.walk(
)
```

#### Run the App Locally
### Run the App Locally

```python
from stable_diffusion_videos import StableDiffusionWalkPipeline, Interface
Expand All @@ -123,6 +123,13 @@ interface = Interface(pipeline)
interface.launch()
```

### CLI

The script `scripts/make_video.py` also provides a CLI. Example:
```bash
python scripts/make_video.py --prompts "a cat" "a dog" --fps 10
```

## Credits

This work built off of [a script](https://gist.github.com/karpathy/00103b0037c5aaea32fe1da1af553355
Expand Down
60 changes: 0 additions & 60 deletions examples/make_music_video.py

This file was deleted.

154 changes: 154 additions & 0 deletions scripts/make_video.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,154 @@
import argparse
import random

import torch
import yaml

from diffusers import DPMSolverMultistepScheduler
from stable_diffusion_videos import StableDiffusionWalkPipeline


def init_arg_parser():
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're already installing fire with the requirements of the package, maybe lets just use that instead? I can update to do this so its not a hassle for you :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not too familiar with fire but I can give it a try. Tho after quickly skimming the docs, while this would considerably reduce boilerplate, I think I prefer the flexibility of argparse. E.g. I prefer calling

python scripts/make_video.py --prompts "a cat" "a dog" --seeds 42 1337

over

python scripts/make_video.py --prompts="['a cat', 'a dog']" --seeds=[42,1337]  # note that --seeds=[42, 1337] would fail!

Moreover I can feel some dirty hacking would be required to keep support for argument provision through config file using the --cfg option, which is an important feature IMO.

Let me know what you think. If this is something you really require then I will give it a shot.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. I think I agree with you! will have a look when I can

parser = argparse.ArgumentParser(
formatter_class=argparse.ArgumentDefaultsHelpFormatter
)

parser.add_argument('--checkpoint_id',
default="stabilityai/stable-diffusion-2-1",
help="checkpoint id on huggingface")
parser.add_argument('--prompts', nargs='+',
help='sequence of prompts')
parser.add_argument('--seeds', type=int, nargs='+',
help='seed for each prompt')
parser.add_argument('--num_interpolation_steps', type=int, nargs='+',
help='number of steps between each image')
parser.add_argument('--output_dir', default="dreams",
help='output directory')
parser.add_argument('--name',
help='output sub-directory')
parser.add_argument('--fps', type=int, default=10,
help='frames per second')
parser.add_argument('--guidance_scale', type=float, default=7.5,
help='diffusion guidance scale')
parser.add_argument('--num_inference_steps', type=int, default=50,
help='number of diffusion inference steps')
parser.add_argument('--height', type=int, default=512,
help='output image height')
parser.add_argument('--width', type=int, default=512,
help='output image width')
parser.add_argument('--upsample', action='store_true',
help='upscale x4 using Real-ESRGAN')
parser.add_argument('--batch_size', type=int, default=1,
help='batch size')
parser.add_argument('--audio_filepath',
help='path to audio file')
parser.add_argument('--audio_offsets', type=int, nargs='+',
help='audio offset for each prompt')
parser.add_argument('--negative_prompt',
help='negative prompt (one for all images)')

parser.add_argument('--cfg',
help='yaml config file (overwrites other options)')

return parser


def parse_args(parser):
args = parser.parse_args()

# read config file
if args.cfg is not None:
with open(args.cfg) as f:
cfg = yaml.safe_load(f)
for key, val in cfg.items():
if hasattr(args, key):
setattr(args, key, val)
else:
raise ValueError(f'bad field in config file: {key}')

# check for prompts
if args.prompts is None:
raise ValueError('no prompt provided')
if args.seeds is None:
args.seeds = [random.getrandbits(16) for _ in args.prompts]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been using randint instead in this scenario, kinda like this though :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using multiple methods for random numbers seems like a good idea the more I think about it.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wdym @Atomic-Germ ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wdym @Atomic-Germ ?

I rescind my comment, it was a little half-baked..


# check audio arguments
if args.audio_filepath is not None and args.audio_offsets is None:
raise ValueError('must provide audio_offsets when providing '
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes me wonder if this should just be raised in the pipeline code itself instead of the parser (if its not already)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same goes for many of the other raised errors in this script

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe. That's a design question IMO. Do we want to raise errors to the unadvised CLI user as early as possible, while trusting that the developer who writes his owns scripts knows what they are doing? Or do we want to raise errors as close to the problematic code/as late as possible but such that it propagates?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I'm fine with the way you did it here :)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reason I say it though is that walk used to be a CLI interface when I first made this repo, so it should be the fn catching all the cases...but we can do it this way for now instead, I'm not picky.

'audio_filepath')
if args.audio_offsets is not None and args.audio_filepath is None:
raise ValueError('must provide audio_filepath when providing '
'audio_offsets')

# check lengths
if args.audio_offsets is not None:
if not len(args.prompts) == len(args.seeds) == len(args.audio_offsets):
raise ValueError('prompts, seeds and audio_offsets must have same '
f'length, got lengths {len(args.prompts)}, '
f'{len(args.seeds)} and '
f'{len(args.audio_offsets)} respectively')
else:
if not len(args.prompts) == len(args.seeds):
raise ValueError('prompts and seeds must have same length, got '
f'lengths {len(args.prompts)} and '
f'{len(args.seeds)} respectively')

# set num_interpolation_steps
if args.audio_offsets is not None \
and args.num_interpolation_steps is not None:
raise ValueError('cannot provide both audio_offsets and '
'num_interpolation_steps')
elif args.audio_offsets is not None:
args.num_interpolation_steps = [
(b-a)*args.fps for a, b in zip(
args.audio_offsets, args.audio_offsets[1:]
)
]
elif args.num_interpolation_steps is not None \
and not len(args.num_interpolation_steps) == len(args.prompts)-1:
raise ValueError('num_interpolation_steps must have length '
f'len(prompts)-1, got '
f'{len(args.num_interpolation_steps)} != '
f'{len(args.prompts)-1}')
else:
args.num_interpolation_steps = args.fps*10 # 10 second video

return args


def main():
parser = init_arg_parser()
args = parse_args(parser)

pipe = StableDiffusionWalkPipeline.from_pretrained(
args.checkpoint_id,
torch_dtype=torch.float16,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think hardcoding dtype here is also a no-no I'm afraid. Let's think of a nicer way to infer this.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

has to support MPS/GPU/TPU

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on second thought, no tpu as you'd have to use the other pipeline

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

device = "mps" if torch.backends.mps.is_available() else "cuda" if torch.backends.cuda.is_available() else "cpu"
torch_dtype = torch.float32 if torch.backends.mps.is_available() else torch.float16

then use to(device) in place of to("cuda") and torch_dtype=torch_dtype

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep will change that

revision="fp16",
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think guidance in diffusers these days is erring towards not specifying a revision. Need to check if that only applies to newest versions, etc.

Definitely hardcoding here is a no-no though.

Suggested change
revision="fp16",

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will add this as an option

feature_extractor=None,
safety_checker=None,
).to("cuda")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hardcoding this likely bad idea too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops that slipped my mind, these should be options too

pipe.scheduler.config
)

pipe.walk(
prompts=args.prompts,
seeds=args.seeds,
num_interpolation_steps=args.num_interpolation_steps,
output_dir=args.output_dir,
name=args.name,
fps=args.fps,
num_inference_steps=args.num_inference_steps,
guidance_scale=args.guidance_scale,
height=args.height,
width=args.width,
upsample=args.upsample,
batch_size=args.batch_size,
audio_filepath=args.audio_filepath,
audio_start_sec=None if args.audio_offsets is None else args.audio_offsets[0],
negative_prompt=args.negative_prompt,
)


if __name__ == '__main__':
main()