Add CLI script #153

Open. Wants to merge 6 commits into base: main


@philgzl (Contributor) commented Jan 24, 2023

This PR adds a script, scripts/make_video.py, for making videos from the command line, for those who, like me, prefer that over notebooks (e.g. to run on a cluster node). The script accepts most, if not all, of the arguments featured in README.md. The help message looks like this:

$ python scripts/make_video.py --help
usage: make_video.py [-h] [--checkpoint_id CHECKPOINT_ID] [--prompts PROMPTS [PROMPTS ...]] [--seeds SEEDS [SEEDS ...]]
                     [--num_interpolation_steps NUM_INTERPOLATION_STEPS [NUM_INTERPOLATION_STEPS ...]] [--output_dir OUTPUT_DIR] [--name NAME] [--fps FPS]
                     [--guidance_scale GUIDANCE_SCALE] [--num_inference_steps NUM_INFERENCE_STEPS] [--height HEIGHT] [--width WIDTH] [--upsample]
                     [--batch_size BATCH_SIZE] [--audio_filepath AUDIO_FILEPATH] [--audio_offsets AUDIO_OFFSETS [AUDIO_OFFSETS ...]]
                     [--negative_prompt NEGATIVE_PROMPT] [--cfg CFG]

options:
  -h, --help            show this help message and exit
  --checkpoint_id CHECKPOINT_ID
                        checkpoint id on huggingface (default: stabilityai/stable-diffusion-2-1)
  --prompts PROMPTS [PROMPTS ...]
                        sequence of prompts (default: None)
  --seeds SEEDS [SEEDS ...]
                        seed for each prompt (default: None)
  --num_interpolation_steps NUM_INTERPOLATION_STEPS [NUM_INTERPOLATION_STEPS ...]
                        number of steps between each image (default: None)
  --output_dir OUTPUT_DIR
                        output directory (default: dreams)
  --name NAME           output sub-directory (default: None)
  --fps FPS             frames per second (default: 10)
  --guidance_scale GUIDANCE_SCALE
                        diffusion guidance scale (default: 7.5)
  --num_inference_steps NUM_INFERENCE_STEPS
                        number of diffusion inference steps (default: 50)
  --height HEIGHT       output image height (default: 512)
  --width WIDTH         output image width (default: 512)
  --upsample            upscale x4 using Real-ESRGAN (default: False)
  --batch_size BATCH_SIZE
                        batch size (default: 1)
  --audio_filepath AUDIO_FILEPATH
                        path to audio file (default: None)
  --audio_offsets AUDIO_OFFSETS [AUDIO_OFFSETS ...]
                        audio offset for each prompt (default: None)
  --negative_prompt NEGATIVE_PROMPT
                        negative prompt (one for all images) (default: None)
  --cfg CFG             yaml config file (overwrites other options) (default: None)

Alternatively, the user can provide a YAML configuration file containing the arguments to overwrite, using python scripts/make_video.py --cfg <config_file>. The file should contain fields with the same names as the arguments.
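A minimal sketch of how --cfg could overwrite argparse values, assuming PyYAML (yaml.safe_load) for reading the file. The parser covers only a few of the options above, and init_arg_parser, apply_config and parse_args are illustrative names; the PR's actual implementation may differ. argparse's ArgumentDefaultsHelpFormatter is one way to get the "(default: ...)" suffixes seen in the help message.

```python
import argparse


def init_arg_parser() -> argparse.ArgumentParser:
    # Only a subset of the options from the help message above.
    # ArgumentDefaultsHelpFormatter appends "(default: ...)" to each help string.
    parser = argparse.ArgumentParser(
        formatter_class=argparse.ArgumentDefaultsHelpFormatter
    )
    parser.add_argument("--prompts", nargs="+", help="sequence of prompts")
    parser.add_argument("--seeds", nargs="+", type=int, help="seed for each prompt")
    parser.add_argument("--fps", type=int, default=10, help="frames per second")
    parser.add_argument("--guidance_scale", type=float, default=7.5,
                        help="diffusion guidance scale")
    parser.add_argument("--cfg", help="yaml config file (overwrites other options)")
    return parser


def apply_config(args: argparse.Namespace, cfg: dict) -> argparse.Namespace:
    # Overwrite parsed arguments with the config fields, rejecting any field
    # that does not correspond to a known argument.
    for key, value in cfg.items():
        if not hasattr(args, key):
            raise ValueError(f"unknown config field: {key}")
        setattr(args, key, value)
    return args


def parse_args(argv=None) -> argparse.Namespace:
    args = init_arg_parser().parse_args(argv)
    if args.cfg is not None:
        import yaml  # PyYAML (assumed dependency), only needed when --cfg is used
        with open(args.cfg) as f:
            cfg = yaml.safe_load(f) or {}
        args = apply_config(args, cfg)
    return args
```

Importing yaml lazily keeps the plain command-line path free of the extra dependency.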

The same script is used whether or not the user wants to add audio. To add audio, provide the --audio_filepath and --audio_offsets arguments.
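When audio is used, the offsets also determine how many frames separate consecutive prompts. A sketch of that conversion, assuming one offset (in seconds) per prompt, as the --audio_offsets help text suggests, and that the frame count for a segment is its duration times --fps; steps_from_audio_offsets is a hypothetical helper, not part of the package.

```python
def steps_from_audio_offsets(audio_offsets, fps):
    """Number of interpolation frames between consecutive prompts.

    audio_offsets holds one time (in seconds) per prompt; the gap between two
    consecutive offsets, multiplied by fps, gives the frames for that segment.
    """
    gaps = [b - a for a, b in zip(audio_offsets, audio_offsets[1:])]
    if not gaps:
        raise ValueError("need at least two audio offsets")
    if any(gap <= 0 for gap in gaps):
        raise ValueError("audio offsets must be strictly increasing")
    return [round(gap * fps) for gap in gaps]


# Prompts at 0 s, 2 s and 5 s of the audio track, rendered at 10 fps.
print(steps_from_audio_offsets([0, 2, 5], fps=10))  # [20, 30]
```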

In my opinion, this deprecates examples/make_music_video.py. That file seems to be broken anyway (see #150). If the purpose of that script is to serve as a code example, then the snippets in README.md are currently doing a better job. If its purpose is to have a standalone script ready to run from the command line, then this PR implements that and more.

Updated README.md with an example.

@nateraw (Owner) left a comment

Very nice! Thanks so much for the contribution. I'm just getting over COVID, so I may be slow to respond, but I left some comments below.

Feel free to ask any questions you may have or rebut any points I made. I'm not too picky, and can be convinced otherwise if I made some opinionated points you disagree with.

from stable_diffusion_videos import StableDiffusionWalkPipeline


def init_arg_parser():
Owner

Since we're already installing fire as part of the package requirements, maybe let's just use that instead? I can make the update myself so it's not a hassle for you :)

Contributor Author

I am not too familiar with fire, but I can give it a try. That said, after quickly skimming the docs, I think I prefer the flexibility of argparse, even though fire would considerably reduce boilerplate. For example, I prefer calling

python scripts/make_video.py --prompts "a cat" "a dog" --seeds 42 1337

over

python scripts/make_video.py --prompts="['a cat', 'a dog']" --seeds=[42,1337]  # note that --seeds=[42, 1337] would fail!

Moreover, I suspect some dirty hacking would be required to keep supporting arguments provided through a config file via the --cfg option, which is an important feature IMO.

Let me know what you think. If this is something you really require then I will give it a shot.
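The failure of the second form comes from shell word splitting: the space inside the list literal turns it into two separate arguments, whereas argparse's nargs="+" simply collects space-separated values. A small sketch using shlex to mimic how a POSIX shell tokenizes each call; tokenize and build_parser are illustrative names, and nothing here describes fire's internals.

```python
import argparse
import shlex


def tokenize(command_line: str) -> list:
    # Split a command line the way a POSIX shell would (quotes group words).
    return shlex.split(command_line)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # nargs="+" collects one or more space-separated values into a list.
    parser.add_argument("--prompts", nargs="+")
    parser.add_argument("--seeds", nargs="+", type=int)
    return parser


# argparse style: each value is its own shell word.
args = build_parser().parse_args(tokenize('--prompts "a cat" "a dog" --seeds 42 1337'))
print(args.prompts, args.seeds)  # ['a cat', 'a dog'] [42, 1337]

# fire style: the space inside the list literal splits it into two words,
# so the program receives '--seeds=[42,' and '1337]' separately.
print(tokenize("--seeds=[42, 1337]"))  # ['--seeds=[42,', '1337]']
```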

Owner

Interesting. I think I agree with you! Will have a look when I can.

if args.prompts is None:
    raise ValueError('no prompt provided')
if args.seeds is None:
    args.seeds = [random.getrandbits(16) for _ in args.prompts]
Owner

I've been using randint instead in this scenario, kinda like this though :)

Contributor

The more I think about it, the more using multiple methods for random numbers seems like a good idea.

Owner

wdym @Atomic-Germ ?

Contributor

wdym @Atomic-Germ ?

I rescind my comment; it was a little half-baked.
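For comparison with the getrandbits(16) default in the snippet above, a randint-based version: randint's bounds are inclusive, so randint(0, 2**16 - 1) draws from the same range [0, 65535]. The optional rng parameter is an assumption (not in the PR) that makes the defaults reproducible; default_seeds is a hypothetical name.

```python
import random


def default_seeds(prompts, rng=None):
    # One random seed per prompt. randint's bounds are inclusive, so
    # randint(0, 2**16 - 1) draws from the same range as getrandbits(16).
    if rng is None:
        rng = random.Random()
    return [rng.randint(0, 2**16 - 1) for _ in prompts]


# Passing a seeded generator makes the defaults reproducible.
print(default_seeds(["a cat", "a dog"], rng=random.Random(0)))
```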


# check audio arguments
if args.audio_filepath is not None and args.audio_offsets is None:
    raise ValueError('must provide audio_offsets when providing '
Owner

Makes me wonder whether this should just be raised in the pipeline code itself instead of in the parser (if it's not already).

Owner

Same goes for many of the other errors raised in this script.

Contributor Author

Maybe. That's a design question IMO. Do we want to raise errors for the unwary CLI user as early as possible, while trusting that developers who write their own scripts know what they are doing? Or do we want to raise errors as close to the problematic code as possible (i.e. as late as possible) and let them propagate?

Owner

Agreed. I'm fine with the way you did it here :)

Owner

The reason I say it, though, is that walk used to be a CLI interface when I first made this repo, so it should be the function catching all the cases... but we can do it this way for now; I'm not picky.
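If the checks stay in the script (the "fail early for the CLI user" option), one way to report them is argparse's parser.error instead of ValueError: it prints the usage line and the message to stderr and exits with status 2. A sketch under that assumption; validate_args is a hypothetical name, only a subset of the checks is shown, and float offsets are an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument("--prompts", nargs="+")
    parser.add_argument("--seeds", nargs="+", type=int)
    parser.add_argument("--audio_filepath")
    parser.add_argument("--audio_offsets", nargs="+", type=float)
    return parser


def validate_args(parser: argparse.ArgumentParser, args: argparse.Namespace) -> None:
    # Each parser.error() call prints usage + message and raises SystemExit(2).
    if args.prompts is None:
        parser.error("no prompt provided")
    if args.seeds is not None and len(args.seeds) != len(args.prompts):
        parser.error("need exactly one seed per prompt")
    if args.audio_filepath is not None:
        if args.audio_offsets is None:
            parser.error("must provide --audio_offsets when providing --audio_filepath")
        if len(args.audio_offsets) != len(args.prompts):
            parser.error("need exactly one audio offset per prompt")


parser = build_parser()
args = parser.parse_args(["--prompts", "a cat", "a dog", "--seeds", "42", "1337"])
validate_args(parser, args)  # passes silently
```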

pipe = StableDiffusionWalkPipeline.from_pretrained(
    args.checkpoint_id,
    torch_dtype=torch.float16,
    revision="fp16",
Owner

I think guidance in diffusers these days is erring towards not specifying a revision. Need to check if that only applies to newest versions, etc.

Hardcoding it here is definitely a no-no, though.

Suggested change (delete this line):
    revision="fp16",

Contributor Author

Sure, will add this as an option
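A sketch of --revision as an option rather than a hardcoded value. It assumes, as in diffusers, that from_pretrained accepts revision=None to mean "no specific revision", so the default can simply be None and the value forwarded unchanged; build_parser is an illustrative name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument("--checkpoint_id", default="stabilityai/stable-diffusion-2-1",
                        help="checkpoint id on huggingface")
    # None = no specific revision; "fp16" has to be requested explicitly.
    parser.add_argument("--revision", default=None,
                        help='model revision to load, e.g. "fp16"')
    return parser


args = build_parser().parse_args(["--revision", "fp16"])
# The value is then forwarded instead of being hardcoded (needs the package installed):
# pipe = StableDiffusionWalkPipeline.from_pretrained(
#     args.checkpoint_id, torch_dtype=torch_dtype, revision=args.revision, ...)
```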


pipe = StableDiffusionWalkPipeline.from_pretrained(
    args.checkpoint_id,
    torch_dtype=torch.float16,
Owner

I'm afraid hardcoding the dtype here is also a no-no. Let's think of a nicer way to infer it.

Owner

has to support MPS/GPU/TPU

Owner

On second thought, no TPU, as you'd have to use the other pipeline.

Contributor

import torch

device = "mps" if torch.backends.mps.is_available() else "cuda" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if device == "cuda" else torch.float32  # float16 only on CUDA

Then use .to(device) in place of .to("cuda"), and pass torch_dtype=torch_dtype.

Contributor Author

Yep will change that

    feature_extractor=None,
    safety_checker=None,
).to("cuda")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
Owner

Hardcoding this is likely a bad idea too.

Contributor Author

Oops, that slipped my mind; these should be options too.
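A sketch of making the scheduler a CLI option. The --scheduler flag and the list of choices are hypothetical; each name is a scheduler class exported by diffusers, and the swap uses the from_config(pipe.scheduler.config) pattern from the diffusers docs. diffusers is imported lazily so the parser works without it.

```python
import argparse

# Hypothetical choices; each must match a scheduler class exported by diffusers.
SCHEDULERS = [
    "DPMSolverMultistepScheduler",
    "DDIMScheduler",
    "EulerDiscreteScheduler",
    "LMSDiscreteScheduler",
    "PNDMScheduler",
]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument("--scheduler", choices=SCHEDULERS,
                        default="DPMSolverMultistepScheduler",
                        help="diffusion scheduler")
    return parser


def set_scheduler(pipe, name: str):
    # Imported lazily so argument parsing does not require diffusers.
    import diffusers
    scheduler_cls = getattr(diffusers, name)
    # Rebuild the requested scheduler from the current scheduler's config.
    pipe.scheduler = scheduler_cls.from_config(pipe.scheduler.config)
    return pipe
```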

@Atomic-Germ (Contributor)
Maybe worth noting: setting the batch_size option to anything but 1 is going to break on MPS.

@philgzl (Contributor Author) commented Jan 26, 2023

Maybe worth noting, but the batch_size option set to anything but 1 is going to break on mps.

Right. We could force batch_size=1 on MPS and emit a warning if the user provided a different value.
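A sketch of that fallback, assuming the device string has already been chosen ("mps", "cuda" or "cpu") and using the standard warnings module; resolve_batch_size is a hypothetical helper name.

```python
import warnings


def resolve_batch_size(batch_size: int, device: str) -> int:
    # Batch sizes other than 1 were reported to break on MPS, so force 1 there.
    if device == "mps" and batch_size != 1:
        warnings.warn(
            f"batch_size={batch_size} is not supported on MPS; using batch_size=1"
        )
        return 1
    return batch_size
```

Warning instead of raising keeps the run going with a working configuration, which suits long jobs started from the command line.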

@philgzl (Contributor Author) commented Feb 18, 2023

I still haven't started applying the suggested changes; will do it soon.

@nateraw (Owner) commented Feb 19, 2023

No rush :) whenever you get to it. I appreciate your contributions ❤️

Labels: None yet
Projects: None yet
Linked issues: None yet
3 participants