
Inference time #29

Open
puckikk1202 opened this issue Mar 28, 2024 · 4 comments
@puckikk1202
Hi, I'm grateful for your excellent work! I've implemented the code as per the instructions, and it runs without errors. However, the inference time is slow, approximately 176 seconds per iteration. I tested it on an 80G A100 GPU, and it seems to be using around 71G of GPU memory. Is this normal?

@ShenhaoZhu
Contributor

Both the inference time and the GPU memory usage you report far exceed what we'd expect. Try terminating any unrelated processes on the GPU and running it again.

@G-force78

That's odd; with a Google A100 running motion-06, it peaks at 12.2 GB.

100% 20/20 [01:21<00:00, 4.05s/it]
100% 116/116 [00:04<00:00, 27.11it/s]

@chengzeyi

chengzeyi commented Mar 29, 2024

> That's odd; with a Google A100 running motion-06, it peaks at 12.2 GB.
>
> 100% 20/20 [01:21<00:00, 4.05s/it] 100% 116/116 [00:04<00:00, 27.11it/s]

How did you get that? On an RTX 4090 I see much higher VRAM usage than that number.
OK, that may be related to a bug in WSL2. But how did you achieve such a speed?

@G-force78

G-force78 commented Mar 30, 2024

You need a large amount of system RAM too. I just tried the T4 on the Colab free tier, but system RAM maxed out at 12 GB while loading the motion module. Maybe the weights could be loaded directly into VRAM instead if you have enough of it?
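A back-of-envelope estimate shows why 12 GB of system RAM can max out just materializing weights. The parameter count below is a made-up illustrative figure, not the actual size of this repo's UNet plus motion module; note also that many loaders briefly hold both the checkpoint state dict and the instantiated module, roughly doubling peak RAM.

```python
# Rough CPU RAM needed to hold model weights at a given precision.
# The 1.5e9 parameter count is an assumption for illustration only.

def weight_gib(num_params: int, bytes_per_param: int) -> float:
    """GiB required to hold num_params weights at the given precision."""
    return num_params * bytes_per_param / 2**30

params = 1_500_000_000  # hypothetical parameter count

fp32 = weight_gib(params, 4)  # float32: 4 bytes per weight
fp16 = weight_gib(params, 2)  # float16: 2 bytes per weight

print(f"fp32: {fp32:.1f} GiB, fp16: {fp16:.1f} GiB")
# A second in-RAM copy during loading would double either figure.
```

Under these assumptions fp32 weights alone are ~5.6 GiB, so a transient second copy plus the rest of the pipeline plausibly exhausts a 12 GB machine, while fp16 halves the footprint.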

Here is my config file for motion-06:

```yaml
num_inference_steps: 20
guidance_scale: 6
enable_zero_snr: true
weight_dtype: "fp16"

guidance_types:
  - 'depth'
  - 'normal'
  - 'semantic_map'
  - 'dwpose'

noise_scheduler_kwargs:
  num_train_timesteps: 1000
  beta_start: 0.00085
  beta_end: 0.012
  beta_schedule: "linear"
  steps_offset: 1
  clip_sample: false

unet_additional_kwargs:
  use_inflated_groupnorm: true
  unet_use_cross_frame_attention: false
  unet_use_temporal_attention: false
  use_motion_module: true
  motion_module_resolutions:
    - 1
    - 2
    - 4
    - 8
  motion_module_mid_block: true
  motion_module_decoder_only: false
  motion_module_type: Vanilla
  motion_module_kwargs:
    num_attention_heads: 8
    num_transformer_block: 1
    attention_block_types:
      - Temporal_Self
      - Temporal_Self
    temporal_position_encoding: true
    temporal_position_encoding_max_len: 32
    temporal_attention_dim_div: 1

guidance_encoder_kwargs:
  guidance_embedding_channels: 320
  guidance_input_channels: 3
  block_out_channels: [16, 32, 96, 256]

enable_xformers_memory_efficient_attention: true
```
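The `enable_xformers_memory_efficient_attention: true` line likely matters a lot for the VRAM gap discussed above: naive attention materializes a full (heads x L x L) score matrix per attention call, while memory-efficient kernels avoid it. A sketch of that matrix's size, with purely illustrative token counts (not measured from this repo):

```python
# Rough size of the attention score tensor that naive attention
# materializes; the sequence length and head count below are
# illustrative assumptions, not values taken from this config's model.

def naive_attn_scores_gib(seq_len: int, num_heads: int,
                          bytes_per_el: int = 2) -> float:
    """GiB for one (num_heads x seq_len x seq_len) fp16 score tensor."""
    return num_heads * seq_len * seq_len * bytes_per_el / 2**30

# e.g. a 64x64 latent -> 4096 spatial tokens, 8 heads, fp16
print(f"{naive_attn_scores_gib(4096, 8):.2f} GiB per attention call")  # 0.25 GiB
```

Because the cost grows quadratically in sequence length, doubling the token count quadruples this tensor; across many layers and denoising steps, skipping it is the difference between the 12 GB and 71 GB figures reported in this thread being plausible on the same model.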

@AricGamma self-assigned this Apr 12, 2024