Running step2 with: torch.cuda.OutOfMemoryError: CUDA out of memory #48

tianqingyu opened this issue Dec 6, 2023 · 5 comments

@tianqingyu

My video card is rtx4090, 24G VRAM
System is ubuntu 22

Here is the error message:
args.input_path = ../results/base/a_panda_taking_a_selfie,_2k,_high_quality.mp4
args.prompt = ['a_panda_taking_a_selfie,_2k,_high_quality']
loading video from ../results/base/a_panda_taking_a_selfie,_2k,high_quality.mp4
Traceback (most recent call last):
File "/home/vantage/apps/vchitect-lavie/interpolation/sample.py", line 307, in <module>
main(**OmegaConf.load(args.config))
File "/home/vantage/apps/vchitect-lavie/interpolation/sample.py", line 279, in main
video_clip = auto_inpainting_copy_no_mask(args, video_input, prompt, vae, text_encoder, diffusion, model, device,)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vantage/apps/vchitect-lavie/interpolation/sample.py", line 142, in auto_inpainting_copy_no_mask
video_input = vae.encode(video_input).latent_dist.sample().mul(0.18215)
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vantage/miniconda3/envs/lavie/lib/python3.11/site-packages/diffusers/utils/accelerate_utils.py", line 46, in wrapper
return method(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vantage/miniconda3/envs/lavie/lib/python3.11/site-packages/diffusers/models/autoencoder_kl.py", line 164, in encode
h = self.encoder(x)
^^^^^^^^^^^^^^^
File "/home/vantage/miniconda3/envs/lavie/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vantage/miniconda3/envs/lavie/lib/python3.11/site-packages/diffusers/models/vae.py", line 129, in forward
sample = down_block(sample)
^^^^^^^^^^^^^^^^^^
File "/home/vantage/miniconda3/envs/lavie/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vantage/miniconda3/envs/lavie/lib/python3.11/site-packages/diffusers/models/unet_2d_blocks.py", line 1014, in forward
hidden_states = resnet(hidden_states, temb=None)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vantage/miniconda3/envs/lavie/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vantage/miniconda3/envs/lavie/lib/python3.11/site-packages/diffusers/models/resnet.py", line 599, in forward
output_tensor = (input_tensor + hidden_states) / self.output_scale_factor
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 4.77 GiB (GPU 0; 23.65 GiB total capacity; 18.59 GiB already allocated; 4.41 GiB free; 18.71 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF


After testing: base and vsr both run normally; only the interpolation step runs out of memory.
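The error message itself points at one low-effort mitigation: setting `PYTORCH_CUDA_ALLOC_CONF` to cap the allocator's split size and reduce fragmentation. A minimal sketch (the `128` value and the config path are illustrative guesses, not from this thread):

```shell
# Reduce CUDA allocator fragmentation before re-running the interpolation step.
# The max_split_size_mb value is a starting point; tune it per the
# PyTorch memory-management documentation.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
# then re-run, e.g.: python sample.py --config configs/sample_interpolation.yaml
```

This only helps when reserved memory is much larger than allocated memory, as the error text notes; it cannot shrink the 4.77 GiB allocation itself.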

@maxin-cn
Contributor

maxin-cn commented Dec 6, 2023

(quotes the original report and traceback above)

@tianqingyu Hi, currently only base supports half-precision sampling; we will support half precision for vsr and interpolation in the future. I believe running interpolation in half precision will work on your machine. BTW, any PR is welcome.
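For reference, "half-precision sampling" here means casting the pipeline modules and their inputs to float16 before inference, which roughly halves VRAM for weights and activations. A minimal sketch using a stand-in module (in the real script the candidates would be the `vae`, `text_encoder`, and `model` seen in the traceback; the `to_half` helper is hypothetical):

```python
import torch
import torch.nn as nn

def to_half(*modules: nn.Module) -> list[nn.Module]:
    """Cast each module's parameters and buffers to float16 in place."""
    return [m.half() for m in modules]

# Stand-in for the real pipeline modules (vae, text_encoder, model).
demo = nn.Linear(4, 4)
[demo] = to_half(demo)
print(demo.weight.dtype)  # torch.float16
```

Inputs must match the module dtype as well, e.g. `video_input = video_input.half()` before the `vae.encode(...)` call, otherwise PyTorch raises a dtype-mismatch error.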

@jackyin68

jackyin68 commented Dec 7, 2023

Then, how much GPU memory would be enough? @tianqingyu

@tianqingyu
Author

Then, how much GPU memory would be enough? @tianqingyu

My server has four video cards (RTX 4090s), but I'm not able to modify the code for multi-GPU parallelism at the moment. I think four GPUs at 24 GB each (96 GB total) should be enough.

@maxin-cn
Contributor

maxin-cn commented Dec 7, 2023

Then, how much GPU memory would be enough? @tianqingyu

My server has four video cards (RTX 4090s), but I'm not able to modify the code for multi-GPU parallelism at the moment. I think four GPUs at 24 GB each (96 GB total) should be enough.

@tianqingyu Hi, I suggest modifying the interpolation script to use half-precision sampling, following the half-precision test code in base. I think half-precision interpolation sampling will run successfully. Or, you can wait until we support half-precision sampling for interpolation.

@Ednaordinary

Ednaordinary commented Apr 29, 2024

Or, you can wait until we support interpolation half-precision sampling.
@maxin-cn

Do you still plan to do this? I'm a bit confused here. Does "half precision" mean loading the model and latents in float16? I'm also confused about why the latents returned from base have shape [1, 4, 16, 40, 64] while the interpolation model's are [2, 4, 61, 64, 40]. Is there a reason for swapping the height and width, cutting 3 of the frames, and concatenating it to itself?

edit: It also looks like the original failure is in the VAE call. Try adding vae.enable_slicing() right after the VAE is created. It might still run out of memory later, but you should get past the encode call.
