
CUDNN_STATUS_BAD_PARAM in external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc(4175): 'op' CUDNN_BACKEND_OPERATION: cudnnFinalize Failed #66598

Closed
sylviahamidah opened this issue Apr 29, 2024 · 6 comments
Assignees
Labels
comp:xla XLA TF 2.15 For issues related to 2.15.x type:bug Bug

Comments

@sylviahamidah

Issue type

Bug

Have you reproduced the bug with TensorFlow Nightly?

No

Source

source

TensorFlow version

tf 2.15.0

Custom code

Yes

OS platform and distribution

Google Colaboratory

Mobile device

Google Colaboratory

Python version

3.10.12

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

12.2

GPU model and memory

No response

Current behavior?

When I call model.fit(), the session stops with the error message below:
CUDNN_STATUS_BAD_PARAM
in external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc(4175): 'op' CUDNN_BACKEND_OPERATION: cudnnFinalize Failed
[[{{node AlbuNet/conv1/Conv2D}}]] [Op:__inference_train_function_49877]

How can I solve the issue?
Thanks!

Standalone code to reproduce the issue

you can see my work here
https://colab.research.google.com/drive/1n97wV5cfgdQfWmngK5EBJjsYSjJzfi_d?usp=sharing

Relevant log output

No response

@SuryanarayanaY
Collaborator

Hi @sylviahamidah ,

Could you please try model.fit() instead of model.fit_generator(), which is deprecated? If that change doesn't resolve the issue, please also share a minimal code snippet. Thanks.
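A minimal sketch of the suggested one-for-one replacement, assuming TensorFlow 2.x: Model.fit() accepts Python generators (and tf.data.Dataset objects) directly, so the deprecated fit_generator() call can simply be renamed. The tiny model and generator here are toy placeholders, not the issue's actual AlbuNet model:

```python
# Sketch (assumes TensorFlow 2.x): Model.fit() accepts Python generators
# directly, so a deprecated fit_generator() call can be replaced one-for-one.
import numpy as np
import tensorflow as tf

# Toy stand-in model; the issue's real model (AlbuNet) is not shown here.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="sgd", loss="mse")

def batches():
    # Toy infinite generator yielding (inputs, targets) batches.
    while True:
        x = np.random.rand(8, 4).astype("float32")
        yield x, x.sum(axis=1, keepdims=True)

# Before (deprecated): model.fit_generator(batches(), steps_per_epoch=2)
history = model.fit(batches(), steps_per_epoch=2, epochs=1, verbose=0)
```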

@SuryanarayanaY SuryanarayanaY added comp:xla XLA TF 2.15 For issues related to 2.15.x stat:awaiting response Status - Awaiting response from author labels Apr 29, 2024
@sylviahamidah
Author

sylviahamidah commented Apr 29, 2024

Hi @SuryanarayanaY
I've tried model.fit() but the error is still the same.

Here I've attached a minimal reproducible example; let me know if you need further information: https://colab.research.google.com/drive/1FnwvnzkmeLYoaW42wkCKWw0j5xVPQ1Id?usp=sharing

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Apr 29, 2024
@SuryanarayanaY
Collaborator

Hi @sylviahamidah ,

Could you please confirm the cuDNN version on your system? Please cross-check the compatible versions here.

Also, please use the tensorflow[and-cuda] package, which installs compatible CUDA/cuDNN libraries.
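One hedged way to check which TensorFlow and cuDNN wheels are actually present in the Colab runtime is to query installed package metadata. The helper name and the nvidia-cudnn-cu12 package name below are assumptions (that wheel is what tensorflow[and-cuda] typically pulls in for CUDA 12 builds such as TF 2.15):

```python
# Sketch: report installed versions of TensorFlow and the NVIDIA cuDNN
# wheel (the package names checked here are assumptions, not confirmed
# in this thread). Returns None for any package that is not installed.
from importlib import metadata

def report_versions(packages=("tensorflow", "nvidia-cudnn-cu12")):
    versions = {}
    for pkg in packages:
        try:
            versions[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            versions[pkg] = None  # not installed in this environment
    return versions

print(report_versions())
```

If the cuDNN wheel entry comes back as None while TensorFlow is present, the runtime is likely relying on whatever system cuDNN Colab ships, which may not match what TF 2.15 was built against.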

@SuryanarayanaY SuryanarayanaY added the stat:awaiting response Status - Awaiting response from author label Apr 30, 2024
@sylviahamidah
Author

Sorry for my late response, Sir @SuryanarayanaY

My system version is written below.
Tensorflow: 2.15.0
Python: 3.11.7
CUDA version: 12.2
CUDNN version: 8

Also, I've tried installing tensorflow[and-cuda] with the commands below, but the error is still the same.
!pip install --extra-index-url https://pypi.nvidia.com tensorrt-bindings==8.6.1 tensorrt-libs==8.6.1
!pip install -U tensorflow[and-cuda]==2.15.0

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label May 5, 2024
@sylviahamidah
Author

Update: After I ran pip install tensorflow[and-cuda], the error disappeared. Thank you for your assistance!
