"/dev/termination-log must have noexec so unpriviledged user cannot exec from it" #122219

Open
pentago opened this issue Dec 7, 2023 · 15 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/backlog Higher priority than priority/awaiting-more-evidence. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/security Categorizes an issue or PR as relevant to SIG Security. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@pentago

pentago commented Dec 7, 2023

What happened?

An unprivileged process or user running inside a pod can write to the /dev/termination-log file.

I thought this could be prevented with the pod or container securityContext, but that turned out not to be the case.

I didn't run the test to completion, but I redirected the output of /dev/urandom from within the container to /dev/termination-log and filled the file with random data until it reached 2GB in size.

I suspect this is a way to compromise/crash node.

Is this an exploitable scenario, and what can be done to mount the termination log in a way that doesn't allow an unprivileged user or process to write arbitrary data to it?

What did you expect to happen?

Permission to be denied when attempting to write to the /dev/termination-log file.

How can we reproduce it (as minimally and precisely as possible)?

Exec into a pod's container that runs with a maximally unprivileged securityContext, then write to the /dev/termination-log file with:
cat /dev/urandom > /dev/termination-log
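
For a reproduction that stops at a known size instead of running until the disk fills, a capped write may be safer on a shared node. The sketch below is only an illustration: `fill_file` is a hypothetical helper that is not part of the original report, and it assumes a `head` that supports `-c` (GNU coreutils and BusyBox both do).

```shell
# Hypothetical helper (not from the original report): write exactly
# $2 random bytes to the file at $1, so the test stops at a known size.
fill_file() {
  target="$1"
  bytes="$2"
  head -c "$bytes" /dev/urandom > "$target"
}

# Example, run inside the container (writes 1 MiB and stops):
#   fill_file /dev/termination-log 1048576
```

Comparing the file size on the node before and after the call shows whether the write was accepted.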

Anything else we need to know?

No response

Kubernetes version

Any

Cloud provider

N/A

OS version

Any

Install tools

N/A

Container runtime (CRI) and version (if applicable)

N/A

Related plugins (CNI, CSI, ...) and versions (if applicable)

N/A

@pentago pentago added the kind/bug Categorizes issue or PR as related to a bug. label Dec 7, 2023
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 7, 2023
@pentago
Author

pentago commented Dec 7, 2023

/sig Security
/committee Security Response

@k8s-ci-robot k8s-ci-robot added the sig/security Categorizes an issue or PR as relevant to SIG Security. label Dec 7, 2023
@k8s-ci-robot
Contributor

@pentago: The label(s) committee/security, committee/response cannot be applied, because the repository doesn't have them.

In response to this:

/sig Security
/committee Security Response

@k8s-ci-robot k8s-ci-robot removed the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Dec 7, 2023
@kannon92
Contributor

kannon92 commented Dec 7, 2023

Maybe a duplicate of #81116

@kannon92
Contributor

kannon92 commented Dec 8, 2023

/sig node

@k8s-ci-robot k8s-ci-robot added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Dec 8, 2023
@SergeyKanzhelev SergeyKanzhelev added this to Triage in SIG Node Bugs Dec 11, 2023
@hlx-a1

hlx-a1 commented Dec 12, 2023

To add some detail to this: I think /dev/termination-log has to be writable in order to capture container output. What exacerbates the issue is that it's mounted without noexec like the rest of /dev, which means files there can be executed by an unprivileged user. The best option here, I think, is to mount it noexec and impose a user-configurable limit on its size.

@SergeyKanzhelev
Member

also another dup: #108076

/retitle "/dev/termination-log must have noexec so unpriviledged user cannot exec from it"

@k8s-ci-robot k8s-ci-robot changed the title /dev/termination-log is writable by unprivileged container user "/dev/termination-log must have noexec so unpriviledged user cannot exec from it" Jan 3, 2024
@SergeyKanzhelev
Member

/triage accepted

with the scope of just noexec I think it is reasonable

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 3, 2024
@SergeyKanzhelev SergeyKanzhelev moved this from Triage to Triaged in SIG Node Bugs Jan 3, 2024
@SergeyKanzhelev
Member

/priority backlog

@k8s-ci-robot k8s-ci-robot added the priority/backlog Higher priority than priority/awaiting-more-evidence. label Jan 3, 2024
@mochizuki875
Member

I have a question.
There are some other files whose mount conditions are similar to those of termination-log.
They are also bind-mounted from node files with rw and without noexec.
Is that a problem?

$ kubectl exec -it nginx -- mount
...
/dev/mapper/ubuntu--vg-ubuntu--lv on /etc/hosts type ext4 (rw,relatime)
/dev/mapper/ubuntu--vg-ubuntu--lv on /dev/termination-log type ext4 (rw,relatime)
/dev/mapper/ubuntu--vg-ubuntu--lv on /etc/hostname type ext4 (rw,relatime)
/dev/mapper/ubuntu--vg-ubuntu--lv on /etc/resolv.conf type ext4 (rw,relatime)
...
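
To check a listing like the one above without reading it by eye, the option list of each entry can be filtered. This is a rough sketch, not an official tool: `mounts_without_noexec` is a hypothetical helper, and it assumes the `<source> on <target> type <fstype> (<options>)` line layout shown in the output, with no spaces in paths.

```shell
# Hypothetical helper: read `mount`-style lines on stdin and print the
# mount point of every entry whose option list does not include noexec.
mounts_without_noexec() {
  awk '
    # Only consider lines shaped like:
    #   <source> on <target> type <fstype> (<options>)
    $2 == "on" && $4 == "type" {
      opts = $NF
      gsub(/[()]/, "", opts)            # strip the surrounding parentheses
      if (("," opts ",") !~ /,noexec,/) # match noexec as a whole option
        print $3
    }
  '
}

# Example (pod name taken from the comment above):
#   kubectl exec nginx -- mount | mounts_without_noexec
```

Lines that do not match the expected layout, such as the elided `...` rows, are skipped.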

@hlx-a1

hlx-a1 commented Jan 25, 2024

Maybe /etc/hosts, /etc/hostname and /etc/resolv.conf could be mounted ro instead of rw? Is there a case for writing to these mounts at runtime? I don't think there's a problem per se, but maybe they can be mounted with lower privileges.

@mochizuki875
Member

Maybe /etc/hosts, /etc/hostname and /etc/resolv.conf could be mounted ro instead of rw?

I thought so too, but in my test these files were mounted rw.
I deployed the Pod with the manifest below, so no securityContext is set.

nginx-pod.yaml

apiVersion: v1
kind: Pod
metadata:
  labels:
    run: nginx
  name: nginx
spec:
  containers:
  - image: nginx
    name: nginx

@anguslees
Member

anguslees commented Mar 24, 2024

I suspect this is a way to compromise/crash node.

Just to dial back the alarm - unless someone can propose a mechanism for this, I think this is untrue. /dev/termination-log is just a writeable file, and the kubelet just reads opaque content from it and makes that available through the pod status. The contents of this file are not parsed, nor executed outside the container's security boundary.

I can imagine a DoS attack scenario where we fill up the host's disk. This is similar to writing to other ephemeral-storage (eg emptyDir, or your container's writeable top layer), and the solution is the same: ephemeral-storage quota and kubelet's existing full rootfs/imagefs recovery mechanisms.

@identw

identw commented Apr 20, 2024

the solution is the same: ephemeral-storage quota and kubelet's existing full rootfs/imagefs recovery mechanisms.

Hi @anguslees
I attempted to work with the ephemeral-storage quota. I'm afraid this isn't a solution.

For instance, I deployed this Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: debug
  labels:
    app: debug
spec:
  replicas: 1
  revisionHistoryLimit: 2
  selector:
    matchLabels:
      app: debug
  template:
    metadata:
      labels:
        app: debug
    spec:
      terminationGracePeriodSeconds: 5
      enableServiceLinks: false
      imagePullSecrets:
        - name: registry
      containers:
        - name: debug
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
              ephemeral-storage: 1Mi
            limits:
              cpu: 100m
              memory: 128Mi
              ephemeral-storage: 1Mi
          image: docker.io/alpine:latest
          command:
            - /bin/sleep
            - "infinity"
          workingDir: /app

If I run dd if=/dev/urandom of=/app/violation_quota bs=1M count=1024, then kubelet will evict this pod (error: Pod ephemeral local storage usage exceeds the total limit of containers 1Mi). This is the correct behavior.

However, when I run dd if=/dev/urandom of=/dev/termination-log bs=1M count=1024 kubelet doesn't take any action. Instead, I observe filled space on the worker node:

root@kube-worker133-1 ~ $ du -sh /var/lib/kubelet/pods/b7611457-55f2-4efa-8c6f-35315364f28d/containers/debug/449783f9 
1.1G	/var/lib/kubelet/pods/b7611457-55f2-4efa-8c6f-35315364f28d/containers/debug/449783f9
root@kube-worker133-1 ~ $ ls -la /var/lib/kubelet/pods/b7611457-55f2-4efa-8c6f-35315364f28d/containers/debug/449783f9
-rw-rw-rw- 1 root root 1073741824 Apr 20 07:50 /var/lib/kubelet/pods/b7611457-55f2-4efa-8c6f-35315364f28d/containers/debug/449783f9
root@kube-worker133-1 ~ $ file /var/lib/kubelet/pods/b7611457-55f2-4efa-8c6f-35315364f28d/containers/debug/449783f9
/var/lib/kubelet/pods/b7611457-55f2-4efa-8c6f-35315364f28d/containers/debug/449783f9: data

If I run dd if=/dev/urandom of=/dev/termination-log bs=1M (without the count), the node's disk will fill up to approximately 90% until the kubelet sends an eviction signal because the node needs to resolve the DiskPressure status. However, kubelet's GC will not clean the node until the pod is deleted manually or another solution is applied to remove evicted pods.
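
On a node, files that have grown this way can be located by size. The sketch below is an assumption-laden illustration: `large_container_files` is a hypothetical helper, the `<pod-uid>/containers/<name>/<file>` layout is inferred from the paths in the listing above, and not every file found this way is necessarily a termination log.

```shell
# Hypothetical helper: print regular files under <root>/*/containers/
# that are larger than <min_bytes> bytes.
large_container_files() {
  root="$1"
  min_bytes="$2"
  # -size +Nc matches files strictly larger than N bytes.
  find "$root" -path '*/containers/*' -type f -size +"${min_bytes}c" -print
}

# Example, run as root on the worker node (threshold of 100 MiB):
#   large_container_files /var/lib/kubelet/pods 104857600
```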

@kannon92
Contributor

@identw thanks for the detailed investigation. Yes, if /dev/termination-log is not monitored by kubelet then GC won't remove it.

We have #122224, which, if resolved, should address the DoS approach.

That issue is looking for an owner if you are interested.

@anguslees
Member

anguslees commented Apr 26, 2024

Hi @anguslees
I attempted to work with the ephemeral-storage quota. I'm afraid this isn't a solution.

Right, I agree you've found a bug - and we should include this file in the ephemeral quota calculation. I was trying to say we don't need a radically new mechanism here, and the implication is a DoS not compromise.


8 participants