Add CLI script #153

philgzl · 2023-01-24T18:39:38Z

This PR adds a script scripts/make_video.py to make videos from the command line, for those like me who prefer that over notebooks, e.g. to run from a cluster node. The script accepts most, if not all, of the arguments featured in README.md. The help message looks like this:

$ python scripts/make_video.py --help
usage: make_video.py [-h] [--checkpoint_id CHECKPOINT_ID] [--prompts PROMPTS [PROMPTS ...]] [--seeds SEEDS [SEEDS ...]]
                     [--num_interpolation_steps NUM_INTERPOLATION_STEPS [NUM_INTERPOLATION_STEPS ...]] [--output_dir OUTPUT_DIR] [--name NAME] [--fps FPS]
                     [--guidance_scale GUIDANCE_SCALE] [--num_inference_steps NUM_INFERENCE_STEPS] [--height HEIGHT] [--width WIDTH] [--upsample]
                     [--batch_size BATCH_SIZE] [--audio_filepath AUDIO_FILEPATH] [--audio_offsets AUDIO_OFFSETS [AUDIO_OFFSETS ...]]
                     [--negative_prompt NEGATIVE_PROMPT] [--cfg CFG]

options:
  -h, --help            show this help message and exit
  --checkpoint_id CHECKPOINT_ID
                        checkpoint id on huggingface (default: stabilityai/stable-diffusion-2-1)
  --prompts PROMPTS [PROMPTS ...]
                        sequence of prompts (default: None)
  --seeds SEEDS [SEEDS ...]
                        seed for each prompt (default: None)
  --num_interpolation_steps NUM_INTERPOLATION_STEPS [NUM_INTERPOLATION_STEPS ...]
                        number of steps between each image (default: None)
  --output_dir OUTPUT_DIR
                        output directory (default: dreams)
  --name NAME           output sub-directory (default: None)
  --fps FPS             frames per second (default: 10)
  --guidance_scale GUIDANCE_SCALE
                        diffusion guidance scale (default: 7.5)
  --num_inference_steps NUM_INFERENCE_STEPS
                        number of diffusion inference steps (default: 50)
  --height HEIGHT       output image height (default: 512)
  --width WIDTH         output image width (default: 512)
  --upsample            upscale x4 using Real-ESRGAN (default: False)
  --batch_size BATCH_SIZE
                        batch size (default: 1)
  --audio_filepath AUDIO_FILEPATH
                        path to audio file (default: None)
  --audio_offsets AUDIO_OFFSETS [AUDIO_OFFSETS ...]
                        audio offset for each prompt (default: None)
  --negative_prompt NEGATIVE_PROMPT
                        negative prompt (one for all images) (default: None)
  --cfg CFG             yaml config file (overwrites other options) (default: None)

The user can also directly provide a YAML configuration file containing all the arguments to overwrite using python scripts/make_video.py --cfg <config_file>. The file should contain fields with the same name as the arguments.
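As a rough illustration of the override behaviour described above, here is a minimal sketch. It assumes the config file has already been loaded into a plain dict (for example with `yaml.safe_load`), and the helper name `apply_config` and the choice to reject unknown keys are assumptions made for the example, not necessarily what the script does.

```python
import argparse


def apply_config(args: argparse.Namespace, cfg: dict) -> argparse.Namespace:
    """Overwrite parsed arguments with values from a config mapping.

    Hypothetical helper: keys must match argument names exactly.
    """
    known = vars(args)
    for key, value in cfg.items():
        if key not in known:
            # Assumption: fail loudly on typos instead of silently ignoring them.
            raise ValueError(f"unknown option in config file: {key}")
        setattr(args, key, value)
    return args


parser = argparse.ArgumentParser()
parser.add_argument("--prompts", nargs="+")
parser.add_argument("--fps", type=int, default=10)
parser.add_argument("--cfg")

args = parser.parse_args(["--prompts", "a cat", "--fps", "24"])

# In practice this dict would come from the YAML file passed via --cfg.
cfg = {"fps": 30, "prompts": ["a cat", "a dog"]}
args = apply_config(args, cfg)

print(args.fps, args.prompts)
```

Values from the config win over values given on the command line, which matches the "overwrites other options" wording in the help message.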

The script is the same whether the user wants to add audio or not. To add audio, the user should provide the --audio_filepath and --audio_offsets arguments.

In my opinion, this deprecates examples/make_music_video.py. That file seems to be broken anyway (see #150). If the purpose of that script is to serve as a code example, then the snippets in README.md are currently doing a better job. If its purpose is to have a standalone script ready to run from the command line, then this PR implements that and more.

Updated README.md with an example.

nateraw

Very nice! Thanks so much for the contribution. I'm just getting over covid so I may be slow to respond, but I left some comments below.

Feel free to ask any questions you may have or rebut any points I made. I'm not too picky, and can be convinced otherwise if I made some opinionated points you disagree with.

nateraw · 2023-01-25T23:47:29Z

scripts/make_video.py

+from stable_diffusion_videos import StableDiffusionWalkPipeline
+
+
+def init_arg_parser():


Since we're already installing fire with the requirements of the package, maybe let's just use that instead? I can update to do this so it's not a hassle for you :)

I am not too familiar with fire but I can give it a try. Tho after quickly skimming the docs, while this would considerably reduce boilerplate, I think I prefer the flexibility of argparse. E.g. I prefer calling

python scripts/make_video.py --prompts "a cat" "a dog" --seeds 42 1337

over

python scripts/make_video.py --prompts="['a cat', 'a dog']" --seeds=[42,1337] # note that --seeds=[42, 1337] would fail!

Moreover, I suspect some dirty hacking would be required to keep supporting arguments supplied through a config file with the --cfg option, which is an important feature IMO.

Let me know what you think. If this is something you really require then I will give it a shot.

Interesting. I think I agree with you! will have a look when I can
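For reference, the argparse behaviour preferred in this thread can be sketched as below. Only the two options from the example command are declared; the real parser has many more.

```python
import argparse

parser = argparse.ArgumentParser()
# nargs="+" collects one or more space-separated values into a list.
parser.add_argument("--prompts", nargs="+")
parser.add_argument("--seeds", nargs="+", type=int)

# Equivalent to:
#   python scripts/make_video.py --prompts "a cat" "a dog" --seeds 42 1337
args = parser.parse_args(["--prompts", "a cat", "a dog", "--seeds", "42", "1337"])

print(args.prompts)  # ['a cat', 'a dog']
print(args.seeds)    # [42, 1337]
```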

nateraw · 2023-01-25T23:48:23Z

scripts/make_video.py

+    if args.prompts is None:
+        raise ValueError('no prompt provided')
+    if args.seeds is None:
+        args.seeds = [random.getrandbits(16) for _ in args.prompts]


I've been using randint instead in this scenario, kinda like this though :)

I think using multiple methods for random numbers seems like a good idea the more I think about it.

wdym @Atomic-Germ ?

I rescind my comment, it was a little half-baked..
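The two approaches mentioned in this thread draw from the same range when `randint` is given matching bounds. A small sketch follows; the helper names are made up for illustration.

```python
import random


def default_seeds_getrandbits(prompts):
    # One 16-bit seed per prompt, i.e. an integer in [0, 65535].
    return [random.getrandbits(16) for _ in prompts]


def default_seeds_randint(prompts):
    # randint is inclusive on both ends, so this covers the same range.
    return [random.randint(0, 2**16 - 1) for _ in prompts]


prompts = ["a cat", "a dog"]
seeds_a = default_seeds_getrandbits(prompts)
seeds_b = default_seeds_randint(prompts)
print(seeds_a, seeds_b)
```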

nateraw · 2023-01-25T23:48:53Z

scripts/make_video.py

+
+    # check audio arguments
+    if args.audio_filepath is not None and args.audio_offsets is None:
+        raise ValueError('must provide audio_offsets when providing '


makes me wonder if this should just be raised in the pipeline code itself instead of the parser (if it's not already)

same goes for many of the other raised errors in this script

Maybe. That's a design question IMO. Do we want to raise errors to the unadvised CLI user as early as possible, while trusting that developers who write their own scripts know what they are doing? Or do we want to raise errors as close to the problematic code/as late as possible, such that they propagate up to the user?

Agreed. I'm fine with the way you did it here :)

Reason I say it though is that walk used to be a CLI interface when I first made this repo, so it should be the fn catching all the cases...but we can do it this way for now instead, I'm not picky.
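One way to serve both positions is to keep the check in a small function that the CLI calls early and that pipeline code could also call. This is only a sketch: the function name is hypothetical, and the length check is an assumption based on the help text ("audio offset for each prompt"), not something confirmed by the diff.

```python
def check_audio_args(audio_filepath, audio_offsets, prompts):
    """Validate audio options before any model is loaded (hypothetical helper)."""
    if audio_filepath is not None and audio_offsets is None:
        raise ValueError(
            "must provide audio_offsets when providing audio_filepath"
        )
    # Assumption: one offset per prompt, as the help message suggests.
    if audio_offsets is not None and len(audio_offsets) != len(prompts):
        raise ValueError("audio_offsets must have one entry per prompt")


# Passes silently: no audio requested.
check_audio_args(None, None, ["a cat", "a dog"])

# Passes silently: audio file with one offset per prompt.
check_audio_args("song.mp3", [0, 10], ["a cat", "a dog"])
```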

nateraw · 2023-01-25T23:50:09Z

scripts/make_video.py

+    pipe = StableDiffusionWalkPipeline.from_pretrained(
+        args.checkpoint_id,
+        torch_dtype=torch.float16,
+        revision="fp16",


I think guidance in diffusers these days is erring towards not specifying a revision. Need to check if that only applies to newest versions, etc.

Definitely hardcoding here is a no-no though.

Suggested change

revision="fp16",

Sure, will add this as an option
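A possible shape for that option, as a sketch only: the `--revision` flag name and the `from_pretrained_kwargs` helper are assumptions for illustration. The idea is to forward `revision` only when the user asks for one, so the default follows whatever the checkpoint provides.

```python
import argparse


def from_pretrained_kwargs(args: argparse.Namespace) -> dict:
    """Build optional keyword arguments for from_pretrained (hypothetical helper)."""
    kwargs = {}
    if args.revision is not None:
        # Only forwarded on request, e.g. --revision fp16.
        kwargs["revision"] = args.revision
    return kwargs


parser = argparse.ArgumentParser()
parser.add_argument("--checkpoint_id", default="stabilityai/stable-diffusion-2-1")
parser.add_argument("--revision", default=None)

default_kwargs = from_pretrained_kwargs(parser.parse_args([]))
fp16_kwargs = from_pretrained_kwargs(parser.parse_args(["--revision", "fp16"]))
print(default_kwargs, fp16_kwargs)

# Intended use (not executed here):
#   pipe = StableDiffusionWalkPipeline.from_pretrained(args.checkpoint_id, **kwargs)
```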

nateraw · 2023-01-25T23:50:42Z

scripts/make_video.py

+
+    pipe = StableDiffusionWalkPipeline.from_pretrained(
+        args.checkpoint_id,
+        torch_dtype=torch.float16,


I think hardcoding dtype here is also a no-no I'm afraid. Let's think of a nicer way to infer this.

has to support MPS/GPU/TPU

on second thought, no tpu as you'd have to use the other pipeline

device = "mps" if torch.backends.mps.is_available() else "cuda" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if device == "cuda" else torch.float32

then use to(device) in place of to("cuda") and torch_dtype=torch_dtype

Yep will change that
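The selection logic discussed here can be isolated in a small pure function, which keeps it easy to test without a GPU. This is a sketch: the function name is made up, dtypes are returned as strings so the example runs without torch installed, and falling back to float32 on CPU is an assumption (half precision is often poorly supported there).

```python
def pick_device_and_dtype(mps_available: bool, cuda_available: bool):
    """Return (device, dtype_name) following the priority MPS > CUDA > CPU."""
    if mps_available:
        return "mps", "float32"
    if cuda_available:
        return "cuda", "float16"
    # Assumption: use full precision on CPU.
    return "cpu", "float32"


print(pick_device_and_dtype(mps_available=False, cuda_available=True))

# Intended wiring with torch (not executed here):
#   device, dtype_name = pick_device_and_dtype(
#       torch.backends.mps.is_available(), torch.cuda.is_available()
#   )
#   torch_dtype = getattr(torch, dtype_name)
```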

nateraw · 2023-01-25T23:51:14Z

scripts/make_video.py

+        feature_extractor=None,
+        safety_checker=None,
+    ).to("cuda")
+    pipe.scheduler = DPMSolverMultistepScheduler.from_config(


Hardcoding this is likely a bad idea too

Oops that slipped my mind, these should be options too
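One way to expose the scheduler as an option is a name-to-class lookup resolved lazily, sketched below. The `--scheduler` flag, the short names, and the helper are all assumptions for illustration; the class names are existing diffusers schedulers, but the set offered here is arbitrary.

```python
import argparse
import importlib

# Short CLI names mapped to diffusers class names (illustrative subset).
SCHEDULERS = {
    "dpm": "DPMSolverMultistepScheduler",
    "ddim": "DDIMScheduler",
    "euler": "EulerDiscreteScheduler",
}


def load_scheduler_class(name: str):
    """Import diffusers only when needed and return the scheduler class."""
    module = importlib.import_module("diffusers")
    return getattr(module, SCHEDULERS[name])


parser = argparse.ArgumentParser()
parser.add_argument("--scheduler", choices=sorted(SCHEDULERS), default="dpm")
args = parser.parse_args(["--scheduler", "ddim"])
print(SCHEDULERS[args.scheduler])

# Intended use (not executed here):
#   scheduler_cls = load_scheduler_class(args.scheduler)
#   pipe.scheduler = scheduler_cls.from_config(pipe.scheduler.config)
```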

Atomic-Germ · 2023-01-26T02:06:01Z

Maybe worth noting, but the batch_size option set to anything but 1 is going to break on mps.

philgzl · 2023-01-26T07:59:12Z

Right. We could hard set batch_size=1 with MPS and raise a warning in case the user provided anything different.
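A minimal sketch of that fallback, assuming the device is known as a string at this point; the function name is hypothetical.

```python
import warnings


def resolve_batch_size(device: str, batch_size: int) -> int:
    """Force batch_size=1 on MPS and warn if the user asked for something else."""
    if device == "mps" and batch_size != 1:
        warnings.warn(
            f"batch_size={batch_size} is not supported on MPS; using batch_size=1"
        )
        return 1
    return batch_size


print(resolve_batch_size("cuda", 4))
print(resolve_batch_size("mps", 1))
```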

philgzl · 2023-02-18T23:52:37Z

Still haven't started working on applying the suggested changes, will do it soon

nateraw · 2023-02-19T00:48:59Z

No rush :) whenever you get to it. I appreciate your contributions ❤️

philgzl added 6 commits January 24, 2023 15:50

Add make video script

b25eb57

Remove deprecated music video script

3342302

Fix default num_interpolation_steps

961a8db

Increase default num_interpolation_steps

21542ae

Fix audio_start_sec set to args.audio_offsets

6c261f7

Update README.md

ce34e15
