RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED #51

fishfree · 2024-04-06T06:21:07Z

(champ) meme@ubuntugpu:~/champ$ /mnt/data/meme/.conda/envs/champ/bin/python  inference.py --config configs/inference.yaml
[2024-04-06 14:13:13,650] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2024-04-06 14:13:14.193352: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-04-06 14:13:14.243075: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-04-06 14:13:15.100858: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
/mnt/data/meme/.local/lib/python3.10/site-packages/diffusers/utils/outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
04/06/2024 14:13:16 - INFO - root - Running inference ...
04/06/2024 14:13:29 - INFO - models.unet_3d - loaded temporal unet's pretrained weights from pretrained_models/stable-diffusion-v1-5/unet ...
04/06/2024 14:13:56 - INFO - models.unet_3d - Load motion module params from pretrained_models/champ/motion_module.pth
04/06/2024 14:14:14 - INFO - models.unet_3d - Loaded 453.20928M-parameter motion module
Some weights of the model checkpoint were not used when initializing UNet2DConditionModel: 
 ['conv_norm_out.weight, conv_norm_out.bias, conv_out.weight, conv_out.bias']
Traceback (most recent call last):
  File "/mnt/data/meme/champ/inference.py", line 312, in <module>
    main(cfg)
  File "/mnt/data/meme/champ/inference.py", line 260, in main
    result_video_tensor = inference(
  File "/mnt/data/meme/champ/inference.py", line 134, in inference
    video = pipeline(
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/data/meme/champ/pipelines/pipeline_aggregation.py", line 387, in __call__
    clip_image_embeds = self.image_encoder(
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 1310, in forward
    vision_outputs = self.vision_model(
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 865, in forward
    hidden_states = self.embeddings(pixel_values)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 195, in forward
    patch_embeds = self.patch_embedding(pixel_values)  # shape = [*, width, grid, grid]
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 460, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 456, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED
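
Note on the traceback above: the interpreter is the conda env's Python, yet torch, transformers and diffusers are all loaded from /mnt/data/meme/.local/lib/python3.10/site-packages, which is the per-user site directory rather than the env. This may be unrelated to the cuDNN error, but it can be checked with a small sketch like the one below. The helper name `in_user_site` is hypothetical and used only for illustration.

```python
import os
import site


def in_user_site(module_path):
    """Return True if module_path lies under the per-user site-packages directory."""
    user_site = os.path.realpath(site.getusersitepackages())
    path = os.path.realpath(module_path)
    try:
        return os.path.commonpath([user_site, path]) == user_site
    except ValueError:
        # Raised e.g. on Windows when the two paths are on different drives.
        return False


if __name__ == "__main__":
    # Assumption: pass the __file__ of any imported package here, e.g. torch.__file__.
    print(in_user_site(os.__file__))
```

If a package reports True here while the script runs from a conda env, the user-level install is shadowing whatever is installed in the env.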


AricGamma · 2024-04-09T11:18:40Z

Which version of CUDA is in your environment?

fishfree · 2024-04-09T22:31:20Z

@AricGamma Thank you! It's as below:
Cuda compilation tools, release 12.3, V12.3.107
Build cuda_12.3.r12.3/compiler.33567101_0

AricGamma · 2024-04-10T23:48:24Z

Please provide some more environment context such as GPU model, vram, os version, nvidia driver version.
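
The details requested above can be gathered in one go. The following is a minimal sketch, assuming `nvidia-smi` is on the PATH (it falls back to None otherwise); the function name `collect_env` is hypothetical.

```python
import platform
import shutil
import subprocess


def collect_env():
    """Gather basic OS, Python, and GPU details for a bug report."""
    info = {
        "os": platform.platform(),
        "python": platform.python_version(),
        "gpus": None,  # stays None when nvidia-smi is unavailable or fails
    }
    smi = shutil.which("nvidia-smi")
    if smi is not None:
        try:
            out = subprocess.run(
                [smi, "--query-gpu=name,memory.total,driver_version",
                 "--format=csv,noheader"],
                capture_output=True, text=True, check=True, timeout=30,
            ).stdout
            # One CSV line per GPU: name, total memory, driver version.
            info["gpus"] = [line.strip() for line in out.splitlines() if line.strip()]
        except (subprocess.SubprocessError, OSError):
            pass
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```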

fishfree · 2024-04-21T03:29:17Z

~$ nvidia-smi
Sun Apr 21 11:28:53 2024       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05   Driver Version: 525.147.05   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:3B:00.0 Off |                  N/A |
| 18%   26C    P8     1W / 250W |      3MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:5E:00.0 Off |                  N/A |
| 18%   26C    P8     1W / 250W |      3MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce ...  Off  | 00000000:B1:00.0 Off |                  N/A |
| 18%   26C    P8     6W / 250W |      3MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA GeForce ...  Off  | 00000000:D9:00.0 Off |                  N/A |
| 18%   27C    P8    13W / 250W |      3MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

AricGamma · 2024-04-23T02:25:02Z

Maybe you can try switching CUDA to version 12.1 and installing a torch build for that specific CUDA version, like this:

pip install torch==2.2.2+cu121 torchvision==0.17.2+cu121 --index-url https://download.pytorch.org/whl/cu121
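
The `+cu121` suffix in the command above is the wheel's local version tag and names the CUDA release the wheel was built against (12.1 here). As a rough sketch, the tag can be decoded from a version string such as the one `torch.__version__` reports; the helper name `cuda_tag_to_version` is hypothetical, and the code assumes the last digit of the tag is the minor version.

```python
import re


def cuda_tag_to_version(torch_version):
    """Decode the CUDA tag of a torch version string, e.g. '2.2.2+cu121' -> '12.1'.

    Returns None when the string carries no '+cuNNN' suffix (for example '+cpu').
    """
    match = re.search(r"\+cu(\d{2,})$", torch_version)
    if match is None:
        return None
    digits = match.group(1)
    # Assumption: the final digit is the minor version, the rest is the major.
    return f"{digits[:-1]}.{digits[-1]}"


if __name__ == "__main__":
    print(cuda_tag_to_version("2.2.2+cu121"))  # prints 12.1
```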

AricGamma · 2024-05-05T23:12:39Z

Closing this issue because there has been no further reply.
If you are still having problems, reopen this issue.

fishfree · 2024-05-07T22:50:32Z

Finally, running pip3 install -U nvidia-cudnn-cu12==8.9.7.29 worked. However, I now get CUDA OUT OF MEMORY :(
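
To confirm which cuDNN wheel is actually installed in the active environment, a sketch like the following compares it with the pin from the command above. It uses only the standard library; `installed_version` and `PINNED` are hypothetical names, and the check says nothing about whether torch loads that copy at runtime.

```python
from importlib import metadata

PINNED = "8.9.7.29"  # version from the pip3 command in this thread


def installed_version(package):
    """Return the installed version of a pip package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("nvidia-cudnn-cu12")
    if found is None:
        print("nvidia-cudnn-cu12 is not installed in this environment")
    elif found == PINNED:
        print(f"nvidia-cudnn-cu12 matches the pinned version {PINNED}")
    else:
        print(f"nvidia-cudnn-cu12 is {found}, expected {PINNED}")
```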

siyuzhu-fudan assigned AricGamma Apr 18, 2024

AricGamma closed this as completed May 5, 2024
