RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED #51

Closed
fishfree opened this issue Apr 6, 2024 · 7 comments
@fishfree

fishfree commented Apr 6, 2024

(champ) meme@ubuntugpu:~/champ$ /mnt/data/meme/.conda/envs/champ/bin/python  inference.py --config configs/inference.yaml
[2024-04-06 14:13:13,650] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2024-04-06 14:13:14.193352: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-04-06 14:13:14.243075: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-04-06 14:13:15.100858: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
/mnt/data/meme/.local/lib/python3.10/site-packages/diffusers/utils/outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
04/06/2024 14:13:16 - INFO - root - Running inference ...
04/06/2024 14:13:29 - INFO - models.unet_3d - loaded temporal unet's pretrained weights from pretrained_models/stable-diffusion-v1-5/unet ...
04/06/2024 14:13:56 - INFO - models.unet_3d - Load motion module params from pretrained_models/champ/motion_module.pth
04/06/2024 14:14:14 - INFO - models.unet_3d - Loaded 453.20928M-parameter motion module
Some weights of the model checkpoint were not used when initializing UNet2DConditionModel: 
 ['conv_norm_out.weight, conv_norm_out.bias, conv_out.weight, conv_out.bias']
Traceback (most recent call last):
  File "/mnt/data/meme/champ/inference.py", line 312, in <module>
    main(cfg)
  File "/mnt/data/meme/champ/inference.py", line 260, in main
    result_video_tensor = inference(
  File "/mnt/data/meme/champ/inference.py", line 134, in inference
    video = pipeline(
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/data/meme/champ/pipelines/pipeline_aggregation.py", line 387, in __call__
    clip_image_embeds = self.image_encoder(
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 1310, in forward
    vision_outputs = self.vision_model(
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 865, in forward
    hidden_states = self.embeddings(pixel_values)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 195, in forward
    patch_embeds = self.patch_embedding(pixel_values)  # shape = [*, width, grid, grid]
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 460, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 456, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED
@AricGamma
Member

Which version of CUDA is installed in your environment?

@fishfree
Author

fishfree commented Apr 9, 2024

@AricGamma Thank you! It's as below:
Cuda compilation tools, release 12.3, V12.3.107
Build cuda_12.3.r12.3/compiler.33567101_0

@AricGamma
Member

Please provide more environment details, such as GPU model, VRAM, OS version, and NVIDIA driver version.
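For anyone gathering these details for a similar report, here is a minimal sketch that collects them in one go. It assumes `nvidia-smi` is on the PATH and treats torch as optional, so it still runs in a broken environment. The function name `collect_env_report` and the dictionary keys are illustrative, not part of this project.

```python
import importlib.util
import platform
import shutil
import subprocess


def collect_env_report():
    """Gather OS, GPU, driver, and torch details into a dict of strings.

    Hypothetical helper for illustration; not part of the champ repository.
    """
    report = {
        "os": platform.platform(),
        "python": platform.python_version(),
    }

    # GPU model, total VRAM, and driver version come from nvidia-smi if present.
    if shutil.which("nvidia-smi"):
        result = subprocess.run(
            [
                "nvidia-smi",
                "--query-gpu=name,memory.total,driver_version",
                "--format=csv,noheader",
            ],
            capture_output=True,
            text=True,
            check=False,
        )
        report["gpus"] = result.stdout.strip() or result.stderr.strip()
    else:
        report["gpus"] = "nvidia-smi not found"

    # torch may be missing or fail to import; report that instead of crashing.
    if importlib.util.find_spec("torch") is None:
        report["torch"] = "not installed"
        return report
    try:
        import torch
    except Exception as exc:
        report["torch"] = f"import failed: {exc}"
        return report

    report["torch"] = torch.__version__
    # The install path shows whether torch comes from the conda env or ~/.local.
    report["torch_path"] = torch.__file__
    report["torch_cuda"] = str(torch.version.cuda)
    report["cuda_available"] = str(torch.cuda.is_available())
    try:
        report["cudnn"] = str(torch.backends.cudnn.version())
    except Exception as exc:
        report["cudnn"] = f"error: {exc}"
    return report


if __name__ == "__main__":
    for key, value in collect_env_report().items():
        print(f"{key}: {value}")
```

Pasting the printed lines into the issue covers most of what is asked for above. The `torch_path` entry is included because the traceback in this thread loads torch from `~/.local` rather than from the conda environment.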

@fishfree
Author

~$ nvidia-smi
Sun Apr 21 11:28:53 2024       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05   Driver Version: 525.147.05   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:3B:00.0 Off |                  N/A |
| 18%   26C    P8     1W / 250W |      3MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:5E:00.0 Off |                  N/A |
| 18%   26C    P8     1W / 250W |      3MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce ...  Off  | 00000000:B1:00.0 Off |                  N/A |
| 18%   26C    P8     6W / 250W |      3MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA GeForce ...  Off  | 00000000:D9:00.0 Off |                  N/A |
| 18%   27C    P8    13W / 250W |      3MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

@AricGamma
Member

Maybe you can try switching to CUDA 12.1 and installing a torch build for that specific CUDA version, like this:

pip install torch==2.2.2+cu121 torchvision==0.17.2+cu121 --index-url https://download.pytorch.org/whl/cu121
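After reinstalling, it can help to confirm that the torch build actually in use is the one requested. Here is a minimal sketch, assuming the version string carries a local suffix such as `+cu121`, as in the command above. The helper names `cuda_tag` and `matches_requested_build` are made up for this example.

```python
import re


def cuda_tag(torch_version: str):
    """Return the CUDA tag of a torch version string (e.g. 'cu121'), or None.

    Hypothetical helper for illustration; it only inspects the string.
    """
    match = re.search(r"\+(cu\d+)$", torch_version)
    return match.group(1) if match else None


def matches_requested_build(torch_version: str, wanted: str = "cu121") -> bool:
    """True if the version string ends with the requested CUDA tag."""
    return cuda_tag(torch_version) == wanted
```

In a live environment this would be called as `matches_requested_build(torch.__version__)`. A missing tag is inconclusive rather than wrong; `torch.version.cuda` is the more direct check of which CUDA runtime the build targets.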

@AricGamma
Member

Closing this issue because there has been no further reply.
If you are still having problems, please reopen it.

@fishfree
Author

fishfree commented May 7, 2024

Finally, running pip3 install -U nvidia-cudnn-cu12==8.9.7.29 worked. However, now I get CUDA OUT OF MEMORY :(
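To check a fix like this without loading the full pipeline, a small smoke test can run a single `conv2d` on the GPU, the same call that raised the error in the traceback above. This is a sketch under assumptions: the function name `cudnn_smoke_test` is invented here, and a tiny convolution should go through cuDNN by default but is not guaranteed to hit the exact code path of the real model. It also reports free GPU memory, which is relevant to the follow-up out-of-memory error.

```python
import importlib.util


def cudnn_smoke_test():
    """Run one small conv2d on the GPU and return a short status string.

    Hypothetical helper for illustration; skips cleanly without torch or CUDA.
    """
    if importlib.util.find_spec("torch") is None:
        return "skipped: torch not installed"
    try:
        import torch
    except Exception as exc:
        return f"skipped: torch failed to import ({exc})"
    if not torch.cuda.is_available():
        return "skipped: CUDA not available"

    try:
        x = torch.randn(1, 3, 32, 32, device="cuda")  # one small RGB image
        w = torch.randn(8, 3, 3, 3, device="cuda")    # eight 3x3 filters
        y = torch.nn.functional.conv2d(x, w)
        torch.cuda.synchronize()  # surface asynchronous CUDA errors here
    except RuntimeError as exc:
        return f"failed: {exc}"

    # mem_get_info returns (free_bytes, total_bytes) for the current device.
    free, total = torch.cuda.mem_get_info()
    return (
        f"ok: output {tuple(y.shape)}, "
        f"{free // 2**20} MiB free of {total // 2**20} MiB"
    )


if __name__ == "__main__":
    print(cudnn_smoke_test())
```

If this prints a `failed:` line containing `CUDNN_STATUS_NOT_INITIALIZED`, the problem is in the environment rather than in the inference script.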
