Permissions for the /dev/{kfd,dri/renderXXXX} devices in containers #39

Open
elukey opened this issue Apr 18, 2023 · 1 comment

elukey commented Apr 18, 2023

Hi folks!

I am trying the AMD device plugin on my system, deployed as a systemd unit on Debian 11 (so not as a DaemonSet, but directly on the K8s node). Everything works fine, and I can see two devices in my test container:

  • /dev/kfd
  • /dev/dri/renderD128

I am trying to run the container as an unprivileged user, such as nobody, but I am struggling to assign the proper permissions to the devices above. Inside the container I see the following (tested via nsenter):

root@alexnet-tf-gpu-pod:/# ls -l /dev/kfd 
crw-rw---- 1 root 106 242, 0 Apr 18 15:58 /dev/kfd

root@alexnet-tf-gpu-pod:/# ls -l /dev/dri/renderD128 
crw-rw---- 1 root 106 226, 128 Apr 18 15:58 /dev/dri/renderD128

gid 106 is the render group on the underlying bare-metal K8s worker OS, and it is carried into the test container as-is. Because of that, I don't see a clean way to add nobody to render (or a similar group) in the Docker image. Is there a best practice you can suggest?
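The mode `crw-rw----` shown above grants read/write access only to the owner (root) and to members of gid 106; every other user, including nobody, gets no access. The minimal Python sketch below models that owner/group/other check. It assumes a non-root caller, ignores ACLs and capabilities such as CAP_DAC_OVERRIDE, and assumes nobody's uid and primary gid are 65534 (the usual Debian values). It shows that adding gid 106 to the process's groups is enough to open the device:

```python
from __future__ import annotations

import stat


def can_read_write(mode: int, owner_uid: int, owner_gid: int,
                   uid: int, gids: set[int]) -> bool:
    """Classic Unix owner/group/other check for read+write access.

    Assumption: no ACLs and no capability-based bypass (e.g. root's
    CAP_DAC_OVERRIDE) are modelled; only the permission bits matter.
    """
    if uid == owner_uid:
        need = stat.S_IRUSR | stat.S_IWUSR  # owner bits
    elif owner_gid in gids:
        need = stat.S_IRGRP | stat.S_IWGRP  # group bits
    else:
        need = stat.S_IROTH | stat.S_IWOTH  # "other" bits
    return (mode & need) == need


# crw-rw---- root:106, as seen for /dev/kfd and /dev/dri/renderD128
DEVICE_MODE = stat.S_IFCHR | 0o660
NOBODY = 65534  # assumption: Debian's nobody/nogroup id

print(can_read_write(DEVICE_MODE, 0, 106, NOBODY, {NOBODY}))       # nobody alone
print(can_read_write(DEVICE_MODE, 0, 106, NOBODY, {NOBODY, 106}))  # nobody + gid 106
```

The second call succeeds only because 106 appears in the caller's group set. The container therefore needs that numeric gid, not a group *named* render inside the image.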

Thanks in advance!

wmfgerrit pushed a commit to wikimedia/operations-puppet that referenced this issue Apr 20, 2023
On k8s nodes we need to be able to bypass the restriction
on GPU-related devices (/dev/kfd, /dev/dri/renderXXXX), which
are owned by root:render; see
ROCm/k8s-device-plugin#39

We no longer need to vary the kfd access policies, so it seems
good to turn the option into something more flexible that covers
a broader range of use cases.

Bug: T333009
Change-Id: Idab004a1a725b1223d4ee36d2d0d900c329140f9

sdwilsh commented Apr 11, 2024

In the pod's securityContext you can set supplementalGroups, a list of additional gids that the pod's processes run with. Adding the group that owns the devices this way is what enabled me to use the hardware.

https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.29/#podsecuritycontext-v1-core
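The following Python sketch illustrates that suggestion. It reads the owning gids of the device nodes with `os.stat` and builds a pod manifest as a dict. The manifest's `securityContext` runs the pod as nobody and lists those gids in `supplementalGroups`. The field names `runAsUser`, `runAsGroup` and `supplementalGroups` come from the PodSecurityContext API linked above, and `amd.com/gpu` is the resource name the AMD device plugin advertises. The pod name, container name and image are hypothetical placeholders, and uid/gid 65534 for nobody is an assumption based on Debian's defaults.

```python
from __future__ import annotations

import json
import os

# Device nodes from this issue; adjust renderD128 for other GPUs.
DEVICE_PATHS = ("/dev/kfd", "/dev/dri/renderD128")
NOBODY_ID = 65534  # assumption: Debian's uid/gid for nobody/nogroup


def device_gids(paths=DEVICE_PATHS) -> list[int]:
    """Return the sorted, de-duplicated owning gids of the device nodes."""
    return sorted({os.stat(p).st_gid for p in paths})


def gpu_pod_spec(gids: list[int],
                 image: str = "example/rocm-test:latest") -> dict:
    """Build a hypothetical pod manifest that runs as nobody and gets
    the device-owning gids as supplemental groups."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "rocm-test"},  # hypothetical name
        "spec": {
            "securityContext": {
                "runAsUser": NOBODY_ID,
                "runAsGroup": NOBODY_ID,
                "supplementalGroups": list(gids),
            },
            "containers": [{
                "name": "rocm-test",  # hypothetical name
                "image": image,       # hypothetical image
                "resources": {"limits": {"amd.com/gpu": 1}},
            }],
        },
    }


if __name__ == "__main__":
    # Meant to run on the worker node. If the devices are missing or
    # unreadable, fall back to the render gid (106) reported in this issue.
    try:
        gids = device_gids()
    except OSError:
        gids = [106]
    print(json.dumps(gpu_pod_spec(gids), indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`. Because `supplementalGroups` takes numeric gids and the device nodes keep their host ownership inside the container, the value must match the render gid on the node where the pod lands. That gid may differ between nodes unless it is pinned consistently.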
