HorovodBasics load dynamic library make grpc create channel failed with tensorflow-2.11 #3963

Open
Lifann opened this issue Jul 18, 2023 · 1 comment · May be fixed by #3964

Lifann commented Jul 18, 2023

Environment:

  1. Framework: (TensorFlow, Keras, PyTorch, MXNet) TensorFlow
  2. Framework version: tensorflow-2.11
  3. Horovod version: horovod-2.28.1
  4. MPI version: openmpi-4.1.2a1-1.54103.x86_64
  5. CUDA version: cuda-11.2
  6. NCCL version: nccl-2.18
  7. Python version: 3.8.12
  8. Spark / PySpark version: none
  9. Ray version: none
  10. OS and version: CentOS 7
  11. GCC version: 9.3.1
  12. CMake version: 3.18.5

Checklist:

  1. Did you search issues to find if somebody asked this question before? yes
  2. If your question is about hang, did you read this doc? yes
  3. If your question is about docker, did you read this doc? yes
  4. Did you check if your question is answered in the troubleshooting guide? yes

Bug report:

Phenomenon

I built a TensorFlow dataset that reads data through a gRPC client. Everything works fine with tensorflow-2.7.4, but after I upgraded TensorFlow to 2.11, grpc::CreateCustomChannel fails.

Debug

I tried many Horovod versions and none of them worked.

I also ran some tests and found that the failure is caused here, where HorovodBasics loads the dynamic library. The problem seems to be ctypes.RTLD_GLOBAL: when I change it to ctypes.RTLD_LOCAL, gRPC works fine. Is it OK to use ctypes.RTLD_LOCAL, or is ctypes.RTLD_GLOBAL strictly necessary?
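
For context, here is a minimal sketch of the two loading modes in question. It is not Horovod's code: the helper name `load_library` is hypothetical, and libm is used only as a stand-in for a shared library that exists on most Linux systems. With RTLD_GLOBAL, the library's symbols become available for resolving references in libraries loaded later, which is one way two copies of the same dependency (for example gRPC) could end up interfering; with RTLD_LOCAL, the symbols stay private to the handle.

```python
import ctypes
import ctypes.util


def load_library(path, global_symbols):
    """Load a shared library with RTLD_GLOBAL or RTLD_LOCAL (hypothetical helper).

    RTLD_GLOBAL exposes the library's symbols to libraries loaded afterwards;
    RTLD_LOCAL keeps them private to the returned handle.
    """
    mode = ctypes.RTLD_GLOBAL if global_symbols else ctypes.RTLD_LOCAL
    return ctypes.CDLL(path, mode=mode)


# Assumption: libm stands in for the real extension library.
libm_path = ctypes.util.find_library("m")

# Load with local visibility, the change that made gRPC work in this report.
lib = load_library(libm_path, global_symbols=False)

# Declare the C signature of cos() before calling it.
lib.cos.restype = ctypes.c_double
lib.cos.argtypes = [ctypes.c_double]
result = lib.cos(0.0)
```

Symbols looked up through the returned handle work in either mode; the two modes differ only in whether other libraries loaded later can see those symbols. On Windows both constants are defined as zero, so the distinction applies to POSIX systems only.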

Hoping to get some help. Thanks!

Lifann commented Jul 18, 2023

#3964 solves the problem for me.
