Running step2 with: torch.cuda.OutOfMemoryError: CUDA out of memory #48

tianqingyu opened this issue Dec 6, 2023 · 5 comments

@tianqingyu

My video card is rtx4090, 24G VRAM
System is ubuntu 22

Here is the error message:
args.input_path = ../results/base/a_panda_taking_a_selfie,_2k,_high_quality.mp4
args.prompt = ['a_panda_taking_a_selfie,_2k,_high_quality']
loading video from ../results/base/a_panda_taking_a_selfie,_2k,high_quality.mp4
Traceback (most recent call last):
File "/home/vantage/apps/vchitect-lavie/interpolation/sample.py", line 307, in <module>
main(**OmegaConf.load(args.config))
File "/home/vantage/apps/vchitect-lavie/interpolation/sample.py", line 279, in main
video_clip = auto_inpainting_copy_no_mask(args, video_input, prompt, vae, text_encoder, diffusion, model, device,)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vantage/apps/vchitect-lavie/interpolation/sample.py", line 142, in auto_inpainting_copy_no_mask
video_input = vae.encode(video_input).latent_dist.sample().mul(0.18215)
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vantage/miniconda3/envs/lavie/lib/python3.11/site-packages/diffusers/utils/accelerate_utils.py", line 46, in wrapper
return method(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vantage/miniconda3/envs/lavie/lib/python3.11/site-packages/diffusers/models/autoencoder_kl.py", line 164, in encode
h = self.encoder(x)
^^^^^^^^^^^^^^^
File "/home/vantage/miniconda3/envs/lavie/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vantage/miniconda3/envs/lavie/lib/python3.11/site-packages/diffusers/models/vae.py", line 129, in forward
sample = down_block(sample)
^^^^^^^^^^^^^^^^^^
File "/home/vantage/miniconda3/envs/lavie/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vantage/miniconda3/envs/lavie/lib/python3.11/site-packages/diffusers/models/unet_2d_blocks.py", line 1014, in forward
hidden_states = resnet(hidden_states, temb=None)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vantage/miniconda3/envs/lavie/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vantage/miniconda3/envs/lavie/lib/python3.11/site-packages/diffusers/models/resnet.py", line 599, in forward
output_tensor = (input_tensor + hidden_states) / self.output_scale_factor
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 4.77 GiB (GPU 0; 23.65 GiB total capacity; 18.59 GiB already allocated; 4.41 GiB free; 18.71 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF


After testing: base and vsr both run normally; only the interpolation step runs out of memory.
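The error message itself points at one low-effort mitigation: setting `PYTORCH_CUDA_ALLOC_CONF` to cap the allocator's split size and reduce fragmentation. A minimal sketch (the `128` value and the config path are illustrative guesses, not from this thread):

```shell
# Reduce CUDA allocator fragmentation before re-running the interpolation step.
# The max_split_size_mb value is a starting point; tune it per the
# PyTorch memory-management documentation.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
# then re-run, e.g.: python sample.py --config configs/sample_interpolation.yaml
```

This only helps when reserved memory is much larger than allocated memory, as the error text notes; it cannot shrink the 4.77 GiB allocation itself.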

@maxin-cn
Contributor

maxin-cn commented Dec 6, 2023

(quotes the original report and traceback above)

@tianqingyu Hi, currently only base supports half-precision sampling; we will support half precision for vsr and interpolation in the future. I believe running interpolation in half precision will work on your machine. BTW, any PR is welcome.
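For reference, "half-precision sampling" here means casting the pipeline modules and their inputs to float16 before inference, which roughly halves VRAM for weights and activations. A minimal sketch using a stand-in module (in the real script the candidates would be the `vae`, `text_encoder`, and `model` seen in the traceback; the `to_half` helper is hypothetical):

```python
import torch
import torch.nn as nn

def to_half(*modules: nn.Module) -> list[nn.Module]:
    """Cast each module's parameters and buffers to float16 in place."""
    return [m.half() for m in modules]

# Stand-in for the real pipeline modules (vae, text_encoder, model).
demo = nn.Linear(4, 4)
[demo] = to_half(demo)
print(demo.weight.dtype)  # torch.float16
```

Inputs must match the module dtype as well, e.g. `video_input = video_input.half()` before the `vae.encode(...)` call, otherwise PyTorch raises a dtype-mismatch error.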

@jackyin68

jackyin68 commented Dec 7, 2023

Then, how much GPU memory would be enough? @tianqingyu

@tianqingyu
Author

Then, how much GPU memory would be enough? @tianqingyu

My server has four video cards (RTX 4090s), but I'm not able to modify the code for multi-GPU parallelism at the moment. I think four GPUs at 24 GB each (96 GB total) should be enough.

@maxin-cn
Contributor

maxin-cn commented Dec 7, 2023

Then, how much GPU memory would be enough? @tianqingyu

My server has four video cards (RTX 4090s), but I'm not able to modify the code for multi-GPU parallelism at the moment. I think four GPUs at 24 GB each (96 GB total) should be enough.

@tianqingyu Hi, I suggest modifying the interpolation script to use half-precision sampling, following the half-precision test code in base. I think half-precision interpolation sampling will run successfully. Or, you can wait until we support half-precision sampling for interpolation.

@Ednaordinary

Ednaordinary commented Apr 29, 2024

Or, you can wait until we support interpolation half-precision sampling.
@maxin-cn

Do you still plan to do this? I'm a bit confused here. Does "half precision" mean loading the model and latents in float16? I'm also confused about why the latents returned from base have shape [1, 4, 16, 40, 64] while the interpolation model's are [2, 4, 61, 64, 40]. Is there a reason for swapping the height and width, cutting 3 of the frames, and concatenating it to itself?

edit: It also looks like the original failure is in the VAE call. Try adding vae.enable_slicing() right after the VAE is created. It might still run out of memory later, but you should get past the encode call.
