Migrate CUDA imports to new variants in nixpkgs #61

Open · Tracked by #86
aaronmondal opened this issue Apr 6, 2023 · 4 comments

@aaronmondal
Contributor

NixOS/nixpkgs#224646 (comment) mentioned that the way we currently import CUDA from Nix is outdated. We should change imports from the outdated

pkgs.cudaPackages.cudatoolkit

to

cudaPackages.{lib,cuda_foo}
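
For illustration, a minimal sketch of the migration in a derivation's inputs, assuming the split cudaPackages attributes currently in nixpkgs (cuda_cudart, cuda_nvcc and libcublas are just examples here, not a complete mapping of what we need):

  # Old: the monolithic toolkit.
  buildInputs = [ pkgs.cudaPackages.cudatoolkit ];

  # New: pick only the components that are actually needed.
  buildInputs = with pkgs.cudaPackages; [
    cuda_cudart   # CUDA runtime headers and libraries.
    cuda_nvcc     # Compiler bits, only if device code is built with nvcc.
    libcublas     # Individual libraries instead of the whole toolkit.
  ];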

@JannisFengler @SpamDoodler This might make WSL compatibility work.

@SomeoneSerge

I'm not familiar with WSL, but from a brief search they seem to be deploying their libcuda.so in /usr/lib/wsl/lib.

Note that if there are libraries in /usr/lib/wsl/lib other than libcuda.so/libnvidia-ml.so/etc. (the ones that NixOS deploys impurely), using LD_LIBRARY_PATH might result in conflicts when /usr/lib/wsl/lib takes priority over the dependencies recorded in the Runpaths by Nix.

@aaronmondal
Contributor Author

> Note that if there are libraries in /usr/lib/wsl/lib other than libcuda.so/libnvidia-ml.so/etc. (the ones that NixOS deploys impurely), using LD_LIBRARY_PATH might result in conflicts when /usr/lib/wsl/lib takes priority over the dependencies recorded in the Runpaths by Nix.

We've encountered this before in #21. ATM we're advising against using /usr/...-style paths in the external dependency guide. Maybe symlinking the cuda-related paths to another directory and setting that via ldconfig would make sense.

I'd like to avoid it, but it might be necessary to check for the existence of WSL in our flake setup and explicitly set -l:libsomething.so and corresponding rpaths only for the *_nvptx toolchains.

Another way I could think of is symlinking only the impure libraries we actually need to another directory, which we can then add to search paths via the LL_CUDA_* flags.
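
A rough sketch of that last option, assuming the only impure libraries we need are the driver-locked ones and that WSL keeps them in /usr/lib/wsl/lib (the wslDriverLibs name and the exact library list are placeholders):

  # Hypothetical: collect symlinks to just the impure WSL driver libraries so
  # that only this directory, not all of /usr/lib/wsl/lib, lands on the search
  # path. The symlink targets don't need to exist at build time.
  wslDriverLibs = pkgs.runCommand "wsl-driver-libs" { } ''
    mkdir -p $out/lib
    for lib in libcuda.so libcuda.so.1 libnvidia-ml.so libnvidia-ml.so.1; do
      ln -s /usr/lib/wsl/lib/$lib $out/lib/$lib
    done
  '';

Something like LL_CUDA_DRIVER=${wslDriverLibs} could then feed the existing -L/-rpath handling without exposing anything else from /usr/lib/wsl/lib.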

None of these options seem optimal to me though.

@SomeoneSerge

Yes, you'd only need to symlink the libraries that version-lock the driver: libcuda.so, libnvidia-ml.so... Adding that location to LD_LIBRARY_PATH should work, in principle. I haven't heard of LL_CUDA... before, is it WSL-specific?

@aaronmondal
Contributor Author

Ahh sorry for the confusion, no, that's just our rules_ll-specific way of getting Nix deps into Bazel builds. We use something like this in our flake:

rules_ll/flake.nix

Lines 110 to 115 in 1354042

'' + (if unfree then ''
  # Flags for CUDA dependencies.
  LL_CUDA_TOOLKIT=${pkgsUnfree.cudaPackages_12.cudatoolkit}
  LL_CUDA_RUNTIME=${pkgsUnfree.cudaPackages_12.cudatoolkit.lib}
  LL_CUDA_DRIVER=${pkgsUnfree.linuxPackages_6_1.nvidia_x11}
'' else "") + ''
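
If we do the migration this issue is about, one hypothetical way to keep a single toolkit-style root for LL_CUDA_TOOLKIT would be to merge just the split packages we need with symlinkJoin (the attribute selection below is an assumption, not a tested configuration):

  # Sketch: merge split cudaPackages outputs into one prefix usable as
  # clang's --cuda-path, instead of the monolithic cudatoolkit.
  cudaJoined = pkgsUnfree.symlinkJoin {
    name = "cuda-merged";
    paths = with pkgsUnfree.cudaPackages_12; [
      cuda_nvcc
      cuda_cudart
      cuda_cupti
    ];
  };
  # ...and then e.g. LL_CUDA_TOOLKIT=${cudaJoined}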

These LL_CUDA_* variables are then fed to the compilation sandboxes in Bazel:

rules_ll/flake.nix

Lines 121 to 142 in 1354042

if [[
  "$1" == "build" ||
  "$1" == "coverage" ||
  "$1" == "run" ||
  "$1" == "test"
]]; then
  bazelisk $1 \
    --action_env=LL_CFLAGS=$LL_CFLAGS \
    --action_env=LL_LDFLAGS=$LL_LDFLAGS \
    --action_env=LL_DYNAMIC_LINKER=$LL_DYNAMIC_LINKER \
    --action_env=LL_AMD_INCLUDES=$LL_AMD_INCLUDES \
    --action_env=LL_AMD_LIBRARIES=$LL_AMD_LIBRARIES \
    --action_env=LL_AMD_RPATHS=$LL_AMD_RPATHS \
    --action_env=LL_CUDA_TOOLKIT=$LL_CUDA_TOOLKIT \
    --action_env=LL_CUDA_RUNTIME=$LL_CUDA_RUNTIME \
    --action_env=LL_CUDA_DRIVER=$LL_CUDA_DRIVER \
    --action_env=BAZEL_CXXOPTS=$BAZEL_CXXOPTS \
    --action_env=BAZEL_LINKOPTS=$BAZEL_LINKOPTS \
    ''${@:2}
else
  bazelisk $@
fi

Then it's consumed by some compile actions:

rules_ll/ll/args.bzl

Lines 229 to 239 in 1354042

if ctx.attr.compilation_mode in [
    "cuda_nvptx",
    "hip_nvptx",
]:
    args.add("-Wno-unknown-cuda-version")  # Will always be unknown.
    args.add("-xcuda")
    if ctx.configuration.default_shell_env.get("LL_CUDA_TOOLKIT") != "":
        args.add(
            ctx.configuration.default_shell_env["LL_CUDA_TOOLKIT"],
            format = "--cuda-path=%s",
        )

And some link actions:

rules_ll/ll/args.bzl

Lines 492 to 509 in 1354042

if ctx.attr.compilation_mode in [
    "cuda_nvptx",
    "hip_nvptx",
]:
    for location in ["LL_CUDA_TOOLKIT", "LL_CUDA_RUNTIME", "LL_CUDA_DRIVER"]:
        if ctx.configuration.default_shell_env.get(location) != "":
            args.add(
                ctx.configuration.default_shell_env[location],
                format = "-rpath=%s/lib",
            )
            args.add(
                ctx.configuration.default_shell_env[location],
                format = "-L%s/lib",
            )
    args.add("-lcuda")
    args.add("-lcudart_static")
    args.add("-lcupti_static")

Bazel doesn't track dependencies outside of its build graph. We had explicit Bazel-only imports before that mapped out all the files, but that felt too hacky and fragile to maintain. So at some point we decided to kick out that logic and just import everything from the much easier to manage Nix environment 😄

We're actually doing the opposite for ROCm: there we have access to the source code, so we ported the ROCm/HIP build, since that lets us build everything with our own C++-only toolchains for later consumption by the *_amdgpu toolchains.

@aaronmondal added the dependencies label and removed the enhancement label Apr 11, 2023
@aaronmondal mentioned this issue Apr 22, 2023