This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Support building source against CUDA 12.1 #21190

Open
kevnzhao opened this issue Mar 27, 2023 · 5 comments

@kevnzhao

Description

CUDA Toolkit 12.x was released last month. It is a major version, so it contains breaking API changes.
Building MXNet against CUDA 12.1 fails. The error message is pasted in the section below.

Error Message

src/api/operator/numpy/../../../imperative/../executor/cuda_graphs.h: In member function 'void mxnet::cuda_graphs::CudaGraphsSubSegExec::Update(const std::vector<std::shared_ptr<mxnet::exec::OpExecutor> >&, const mxnet::RunContext&, bool, bool)':
src/api/operator/numpy/../../../imperative/../executor/cuda_graphs.h:197:62: error: cannot convert 'CUgraphNode_st**' to 'cudaGraphExecUpdateResultInfo* {aka cudaGraphExecUpdateResultInfo_st*}' for argument '3' to 'cudaError_t cudaGraphExecUpdate(cudaGraphExec_t, cudaGraph_t, cudaGraphExecUpdateResultInfo*)'
&error_node, &update_result));
^
src/api/operator/numpy/../../../imperative/../executor/../common/cuda_utils.h:99:22: note: in definition of macro 'CUDA_CALL'
cudaError_t e = (func);                                        \
^~~~
In file included from src/api/operator/numpy/../../../imperative/../executor/cuda_graphs.h:34:0,
from src/api/operator/numpy/../../../imperative/imperative_utils.h:29,
from src/api/operator/numpy/../utils.h:34,
from src/api/operator/numpy/np_tensordot_op.cc:24:
src/api/operator/numpy/../../../imperative/../executor/cuda_graphs.h: In member function 'void mxnet::cuda_graphs::CudaGraphsSubSegExec::Update(const std::vector<std::shared_ptr<mxnet::exec::OpExecutor> >&, const mxnet::RunContext&, bool, bool)':
src/api/operator/numpy/../../../imperative/../executor/cuda_graphs.h:197:62: error: cannot convert 'CUgraphNode_st**' to 'cudaGraphExecUpdateResultInfo* {aka cudaGraphExecUpdateResultInfo_st*}' for argument '3' to 'cudaError_t cudaGraphExecUpdate(cudaGraphExec_t, cudaGraph_t, cudaGraphExecUpdateResultInfo*)'
&error_node, &update_result));
^
src/api/operator/numpy/../../../imperative/../executor/../common/cuda_utils.h:99:22: note: in definition of macro 'CUDA_CALL'
cudaError_t e = (func);                                        \
^~~~


To Reproduce

Steps to reproduce

  1. Install CUDA Toolkit 12.1 on the build machine.

  2. Build with the commands below.

    export mxnet_variant=CU${CUDA_VERSION}
    make -j $(nproc) USE_OPENCV=1 USE_BLAS=openblas USE_CUDA=1 USE_CUDA_PATH=/usr/local/cuda USE_CUDNN=1 ADD_CFLAGS=-I/usr/include/openblas ADD_LDFLAGS=-L/usr/lib64/lib
    

What have you tried to solve it?

This looks like it is caused by the breaking API change shown below. Code changes are needed to support CUDA 12.x.

image
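
For illustration, here is a minimal sketch of one way the failing call site could be adapted. It is not the MXNet fix, just a sketch under stated assumptions. The helper name `GraphExecUpdateCompat` is hypothetical. The stand-in declarations at the top exist only so the sketch compiles without the CUDA toolkit; in a real build they would come from `<cuda_runtime_api.h>`, which also defines `CUDART_VERSION`. The three-argument signature matches the compiler note in the error message, and the struct field names `result` and `errorNode` are assumed from the CUDA 12 runtime headers.

```cpp
#include <cassert>

// Stand-ins so the sketch builds without the CUDA toolkit (assumption: a real
// build includes <cuda_runtime_api.h> instead, which defines all of these).
#ifndef CUDART_VERSION
#define CUDART_VERSION 12010  // pretend we are building against CUDA 12.1
#endif

struct CUgraphNode_st;
struct CUgraph_st;
struct CUgraphExec_st;
typedef CUgraphNode_st* cudaGraphNode_t;
typedef CUgraph_st* cudaGraph_t;
typedef CUgraphExec_st* cudaGraphExec_t;

enum cudaError_t { cudaSuccess = 0 };
enum cudaGraphExecUpdateResult {
  cudaGraphExecUpdateSuccess = 0,
  cudaGraphExecUpdateError = 1
};

// CUDA 12 bundles the update outcome into one struct.
struct cudaGraphExecUpdateResultInfo {
  cudaGraphExecUpdateResult result;
  cudaGraphNode_t errorNode;
  cudaGraphNode_t errorFromNode;
};

// Stub with the CUDA 12 signature; it always reports success.
inline cudaError_t cudaGraphExecUpdate(cudaGraphExec_t, cudaGraph_t,
                                       cudaGraphExecUpdateResultInfo* info) {
  info->result = cudaGraphExecUpdateSuccess;
  info->errorNode = nullptr;
  info->errorFromNode = nullptr;
  return cudaSuccess;
}

// Hypothetical helper: keeps the pre-12 four-argument calling convention at
// the call site and picks the matching runtime signature at compile time.
inline cudaError_t GraphExecUpdateCompat(cudaGraphExec_t exec,
                                         cudaGraph_t graph,
                                         cudaGraphNode_t* error_node,
                                         cudaGraphExecUpdateResult* update_result) {
  assert(error_node != nullptr && update_result != nullptr);
#if CUDART_VERSION >= 12000
  // CUDA 12.x: one result-info struct replaces the two out-parameters.
  cudaGraphExecUpdateResultInfo info{};
  cudaError_t err = cudaGraphExecUpdate(exec, graph, &info);
  *error_node = info.errorNode;
  *update_result = info.result;
  return err;
#else
  // CUDA 10.x/11.x: the runtime takes the two out-parameters directly.
  return cudaGraphExecUpdate(exec, graph, error_node, update_result);
#endif
}
```

With a helper like this, the existing call in `cuda_graphs.h` could keep passing `&error_node` and `&update_result`, and only the helper would need to know which toolkit version is in use.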

Environment

We recommend using our script for collecting the diagnostic information with the following command
curl --retry 10 -s https://raw.githubusercontent.com/apache/incubator-mxnet/master/tools/diagnose.py | python3

Environment Information
# Paste the diagnose.py command output here
@github-actions

Welcome to Apache MXNet (incubating)! We are on a mission to democratize AI, and we are glad that you are contributing to it by opening this issue.
Please make sure to include all the relevant context, and one of the @apache/mxnet-committers will be here shortly.
If you are interested in contributing to our project, let us know! Also, be sure to check out our guide on contributing to MXNet and our development guides wiki.

@sl1pkn07
Contributor

sl1pkn07 commented May 3, 2023

Hi

same here

[  4%] Building CXX object CMakeFiles/mxnet.dir/src/api/cached_op_api.cc.o
/usr/bin/c++ -DDMLC_CORE_USE_CMAKE -DDMLC_LOG_FATAL_THROW=1 -DDMLC_LOG_STACK_TRACE_SIZE=0 -DDMLC_MODERN_THREAD_LOCAL=0 -DDMLC_STRICT_CXX11 -DDMLC_USE_CXX11 -DDMLC_USE_CXX11=1 -DDMLC_USE_CXX14 -DMSHADOW_FORCE_STREAM -DMSHADOW_INT64_TENSOR_SIZE=1 -DMSHADOW_IN_CXX11 -DMSHADOW_USE_CBLAS=1 -DMSHADOW_USE_CUDA=1 -DMSHADOW_USE_CUDNN -DMSHADOW_USE_CUTENSOR -DMSHADOW_USE_MKL=0 -DMSHADOW_USE_SSE -DMXNET_BRANCH=\"master\" -DMXNET_COMMIT_HASH=\"b84609d3fc73d20929c114eab95faaa56e6c5ede\" -DMXNET_USE_BLAS_OPEN=1 -DMXNET_USE_CUDA=1 -DMXNET_USE_INTGEMM=1 -DMXNET_USE_LAPACK=1 -DMXNET_USE_LAPACKE_INTERFACE=1 -DMXNET_USE_LIBJPEG_TURBO=1 -DMXNET_USE_NCCL=1 -DMXNET_USE_NVTX=1 -DMXNET_USE_OPENCV=1 -DMXNET_USE_OPENMP=1 -DMXNET_USE_OPERATOR_TUNING=1 -DMXNET_USE_SIGNAL_HANDLER=1 -DNDEBUG=1 -DUSE_CUDNN -DUSE_CUTENSOR -D__USE_XOPEN2K8 -Dmxnet_EXPORTS -I/tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/include -I/tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/src -I/tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/3rdparty/tvm/nnvm/include -I/tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/3rdparty/tvm/include -I/tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/3rdparty/dmlc-core/include -I/tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/3rdparty/dlpack/include -I/tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/3rdparty/mshadow -I/tmp/makepkg/sl1-mxnet-git/src/build/3rdparty/intgemm -I/tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/3rdparty/intgemm -I/tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/3rdparty/miniz -I/tmp/makepkg/sl1-mxnet-git/src/build/3rdparty/dmlc-core/include -isystem /usr/include/opencv4 -isystem /opt/cuda/include -march=native -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -Wp,-D_GLIBCXX_ASSERTIONS -fdiagnostics-color=always -march=native -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -Wall 
-Wno-sign-compare -O3 -fopenmp -O3 -DNDEBUG -std=gnu++17 -fPIC -Wno-unused-parameter -Wno-unknown-pragmas -Wno-unused-local-typedefs -msse3 -mf16c -fopenmp -MD -MT CMakeFiles/mxnet.dir/src/api/cached_op_api.cc.o -MF CMakeFiles/mxnet.dir/src/api/cached_op_api.cc.o.d -o CMakeFiles/mxnet.dir/src/api/cached_op_api.cc.o -c /tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/src/api/cached_op_api.cc
In file included from /tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/src/api/../imperative/./imperative_utils.h:31,
                 from /tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/src/api/../imperative/cached_op.h:34,
                 from /tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/src/api/cached_op_api.cc:27:
/tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/src/api/../imperative/././cuda_graphs.h: In member function 'void mxnet::cuda_graphs::CudaGraphsSubSegExec::Update(const std::vector<std::shared_ptr<mxnet::exec::OpExecutor> >&, const mxnet::RunContext&, bool, bool)':
/tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/src/api/../imperative/././cuda_graphs.h:205:62: error: cannot convert 'CUgraphNode_st**' to 'cudaGraphExecUpdateResultInfo*' {aka 'cudaGraphExecUpdateResultInfo_st*'}
  205 |         cudaGraphExecUpdate(graph_exec_.get(), graph_.get(), &error_node, &update_result);
      |                                                              ^~~~~~~~~~~
      |                                                              |
      |                                                              CUgraphNode_st**
In file included from /opt/cuda/include/channel_descriptor.h:61,
                 from /opt/cuda/include/cuda_runtime.h:95,
                 from /opt/cuda/include/curand.h:59,
                 from /tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/include/mshadow/./base.h:195,
                 from /tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/include/mshadow/tensor.h:34,
                 from /tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/include/mxnet/./base.h:32,
                 from /tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/include/mxnet/ndarray.h:39,
                 from /tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/include/mxnet/runtime/ndarray_handle.h:26,
                 from /tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/include/mxnet/runtime/packed_func.h:34,
                 from /tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/include/mxnet/runtime/registry.h:49,
                 from /tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/include/mxnet/api_registry.h:31,
                 from /tmp/makepkg/sl1-mxnet-git/src/incubator-mxnet/src/api/cached_op_api.cc:24:
/opt/cuda/include/cuda_runtime_api.h:12423:138: note:   initializing argument 3 of 'cudaError_t cudaGraphExecUpdate(cudaGraphExec_t, cudaGraph_t, cudaGraphExecUpdateResultInfo*)'
12423 | extern __host__ cudaError_t CUDARTAPI cudaGraphExecUpdate(cudaGraphExec_t hGraphExec, cudaGraph_t hGraph, cudaGraphExecUpdateResultInfo *resultInfo);
      |                                                                                                           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~
make[2]: *** [CMakeFiles/mxnet.dir/build.make:90: CMakeFiles/mxnet.dir/src/api/cached_op_api.cc.o] Error 1

@voycey

voycey commented Jun 20, 2023

This is blocking some work we are trying to do, and downgrading to CUDA 10 isn't possible. Any idea when MXNet for CUDA 12 will be available?

@abhiksark

I'm facing the same issue as @kevnzhao, using CUDA 12.

/app/mxnet/src/api/../imperative/././cuda_graphs.h: In member function 'void mxnet::cuda_graphs::CudaGraphsSubSegExec::Update(const std::vector<std::shared_ptr<mxnet::exec::OpExecutor> >&, const mxnet::RunContext&, bool, bool)':
/app/mxnet/src/api/../imperative/././cuda_graphs.h:205:62: error: cannot convert 'CUgraphNode_st**' to 'cudaGraphExecUpdateResultInfo*' {aka 'cudaGraphExecUpdateResultInfo_st*'}
  205 |         cudaGraphExecUpdate(graph_exec_.get(), graph_.get(), &error_node, &update_result);
      |                                                              ^~~~~~~~~~~
      |                                                              |
      |                                                              CUgraphNode_st**
In file included from /usr/local/cuda/include/channel_descriptor.h:61,
                 from /usr/local/cuda/include/cuda_runtime.h:95,
                 from /usr/local/cuda/include/curand.h:59,
                 from /app/mxnet/include/mshadow/./base.h:195,
                 from /app/mxnet/include/mshadow/tensor.h:34,
                 from /app/mxnet/include/mxnet/./base.h:32,
                 from /app/mxnet/include/mxnet/ndarray.h:39,
                 from /app/mxnet/include/mxnet/runtime/ndarray_handle.h:26,
                 from /app/mxnet/include/mxnet/runtime/packed_func.h:34,
                 from /app/mxnet/include/mxnet/runtime/registry.h:49,
                 from /app/mxnet/include/mxnet/api_registry.h:31,
                 from /app/mxnet/src/api/cached_op_api.cc:24:
/usr/local/cuda/include/cuda_runtime_api.h:12402:138: note:   initializing argument 3 of 'cudaError_t cudaGraphExecUpdate(cudaGraphExec_t, cudaGraph_t, cudaGraphExecUpdateResultInfo*)'
12402 | __host__ cudaError_t CUDARTAPI cudaGraphExecUpdate(cudaGraphExec_t hGraphExec, cudaGraph_t hGraph, cudaGraphExecUpdateResultInfo *resultInfo);
      |                                                                                                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~

make[2]: *** [CMakeFiles/mxnet.dir/build.make:76: CMakeFiles/mxnet.dir/src/api/cached_op_api.cc.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:465: CMakeFiles/mxnet.dir/all] Error 2
make: *** [Makefile:141: all] Error 2

@TristonC
Contributor

@DickJC123
