
CUDNN_STATUS_BAD_PARAM in external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc(4175): 'op' CUDNN_BACKEND_OPERATION: cudnnFinalize Failed #66598

Closed
sylviahamidah opened this issue Apr 29, 2024 · 6 comments
Assignees
Labels
comp:xla XLA TF 2.15 For issues related to 2.15.x type:bug Bug

Comments

@sylviahamidah

Issue type

Bug

Have you reproduced the bug with TensorFlow Nightly?

No

Source

source

TensorFlow version

tf 2.15.0

Custom code

Yes

OS platform and distribution

Google Colaboratory

Mobile device

Google Colaboratory

Python version

3.10.12

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

12.2

GPU model and memory

No response

Current behavior?

When I call model.fit(), the session stops with the error message below:
CUDNN_STATUS_BAD_PARAM
in external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc(4175): 'op' CUDNN_BACKEND_OPERATION: cudnnFinalize Failed
[[{{node AlbuNet/conv1/Conv2D}}]] [Op:__inference_train_function_49877]

How can I solve the issue?
Thanks!

Standalone code to reproduce the issue

you can see my work here
https://colab.research.google.com/drive/1n97wV5cfgdQfWmngK5EBJjsYSjJzfi_d?usp=sharing

Relevant log output

No response

@SuryanarayanaY
Collaborator

Hi @sylviahamidah ,

Could you please try model.fit() instead of model.fit_generator(), which is deprecated? If that change doesn't resolve the issue, please also share a minimal code snippet. Thanks.
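A minimal sketch of the suggested one-for-one replacement, assuming TensorFlow 2.x: Model.fit() accepts Python generators (and tf.data.Dataset objects) directly, so the deprecated fit_generator() call can simply be renamed. The tiny model and generator here are toy placeholders, not the issue's actual AlbuNet model:

```python
# Sketch (assumes TensorFlow 2.x): Model.fit() accepts Python generators
# directly, so a deprecated fit_generator() call can be replaced one-for-one.
import numpy as np
import tensorflow as tf

# Toy stand-in model; the issue's real model (AlbuNet) is not shown here.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="sgd", loss="mse")

def batches():
    # Toy infinite generator yielding (inputs, targets) batches.
    while True:
        x = np.random.rand(8, 4).astype("float32")
        yield x, x.sum(axis=1, keepdims=True)

# Before (deprecated): model.fit_generator(batches(), steps_per_epoch=2)
history = model.fit(batches(), steps_per_epoch=2, epochs=1, verbose=0)
```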

@SuryanarayanaY SuryanarayanaY added comp:xla XLA TF 2.15 For issues related to 2.15.x stat:awaiting response Status - Awaiting response from author labels Apr 29, 2024
@sylviahamidah
Author

sylviahamidah commented Apr 29, 2024

Hi @SuryanarayanaY
I've tried model.fit() but the error is still the same.

Here I've attached a minimal reproducible example; let me know if you need further information: https://colab.research.google.com/drive/1FnwvnzkmeLYoaW42wkCKWw0j5xVPQ1Id?usp=sharing

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Apr 29, 2024
@SuryanarayanaY
Collaborator

Hi @sylviahamidah ,

Could you please confirm the cuDNN version on your system? Please cross-check the compatible versions here.

Also, please use the tensorflow[and-cuda] package, which installs compatible CUDA/cuDNN libraries.
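One hedged way to check which TensorFlow and cuDNN wheels are actually present in the Colab runtime is to query installed package metadata. The helper name and the nvidia-cudnn-cu12 package name below are assumptions (that wheel is what tensorflow[and-cuda] typically pulls in for CUDA 12 builds such as TF 2.15):

```python
# Sketch: report installed versions of TensorFlow and the NVIDIA cuDNN
# wheel (the package names checked here are assumptions, not confirmed
# in this thread). Returns None for any package that is not installed.
from importlib import metadata

def report_versions(packages=("tensorflow", "nvidia-cudnn-cu12")):
    versions = {}
    for pkg in packages:
        try:
            versions[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            versions[pkg] = None  # not installed in this environment
    return versions

print(report_versions())
```

If the cuDNN wheel entry comes back as None while TensorFlow is present, the runtime is likely relying on whatever system cuDNN Colab ships, which may not match what TF 2.15 was built against.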

@SuryanarayanaY SuryanarayanaY added the stat:awaiting response Status - Awaiting response from author label Apr 30, 2024
@sylviahamidah
Author

Sorry for my late response, Sir @SuryanarayanaY

My system version is written below.
Tensorflow: 2.15.0
Python: 3.11.7
CUDA version: 12.2
CUDNN version: 8

Also, I've tried installing tensorflow[and-cuda] with the commands below, but the error is still the same.
!pip install --extra-index-url https://pypi.nvidia.com tensorrt-bindings==8.6.1 tensorrt-libs==8.6.1
!pip install -U tensorflow[and-cuda]==2.15.0

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label May 5, 2024
@sylviahamidah
Author

Update: After I ran pip install tensorflow[and-cuda], the error disappeared. Thank you for your assistance!
