Define a common Node autoscaling safe-to-evict/do-not-disrupt annotation #124800

towca · 2024-05-10T15:32:53Z

What type of PR is this?

/kind api-change

What this PR does / why we need it:

Currently, there are 2 Node autoscalers sponsored by sig-autoscaling, each supporting a different Pod annotation with the same semantics:

- Cluster Autoscaler: cluster-autoscaler.kubernetes.io/safe-to-evict=true/false
- Karpenter: karpenter.sh/do-not-disrupt=true

The semantics of cluster-autoscaler.kubernetes.io/safe-to-evict=false and karpenter.sh/do-not-disrupt=true are identical. Both of these annotations will be replaced by node-autoscaling.kubernetes.io/safe-to-evict=false.

cluster-autoscaler.kubernetes.io/safe-to-evict=true doesn't have an equivalent in Karpenter right now, as Karpenter doesn't have any pod-level conditions blocking consolidation. This means that the equivalent new annotation node-autoscaling.kubernetes.io/safe-to-evict=true should be trivially supported by Karpenter initially (but will require caution if Karpenter ever adds any pod-level conditions blocking consolidation).
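
The mapping from the autoscaler-specific annotations to the common one could be sketched roughly as follows. This is an illustration only, not code from this PR: the helper name `to_common_annotation` is hypothetical, and the precedence order when several annotations are set on the same Pod (common first, then Karpenter, then Cluster Autoscaler) is an assumption that neither autoscaler is known to implement.

```python
from typing import Dict, Optional

COMMON = "node-autoscaling.kubernetes.io/safe-to-evict"
CAS_LEGACY = "cluster-autoscaler.kubernetes.io/safe-to-evict"
KARPENTER_LEGACY = "karpenter.sh/do-not-disrupt"


def to_common_annotation(annotations: Dict[str, str]) -> Optional[str]:
    """Return the common safe-to-evict value implied by a Pod's annotations.

    Hypothetical helper. Returns "true", "false", or None when no
    recognized annotation is present.
    """
    # Assumption: the common annotation wins over the legacy ones.
    if COMMON in annotations:
        return annotations[COMMON]
    # karpenter.sh/do-not-disrupt=true has the same semantics as
    # safe-to-evict=false; other values have no equivalent and are ignored.
    if annotations.get(KARPENTER_LEGACY) == "true":
        return "false"
    # The Cluster Autoscaler annotation maps one-to-one.
    cas_value = annotations.get(CAS_LEGACY)
    if cas_value in ("true", "false"):
        return cas_value
    return None
```

For example, a Pod carrying only karpenter.sh/do-not-disrupt=true would be treated as if it had node-autoscaling.kubernetes.io/safe-to-evict=false.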

Going with the Cluster Autoscaler wording for the common annotation, as otherwise we'd have a double negation (do-not-disrupt=false) in the safe-to-evict=true case which doesn't seem ideal.

This is a part of a broader alignment between Cluster Autoscaler and Karpenter. More details about the alignment can be found in https://docs.google.com/document/d/1rHhltfLV5V1kcnKr_mKRKDC4ZFPYGP4Tde2Zy-LE72w

Which issue(s) this PR fixes:

Part of kubernetes/autoscaler#6648

Special notes for your reviewer:

The implementation in Cluster Autoscaler and Karpenter will follow this PR. If this is a problem, I could do the implementation first with the annotation hardcoded, then submit this PR, then clean up the implementation to use the annotation from the API.

@jonathan-innis this PR goes with the Cluster Autoscaler safe-to-evict wording for now, instead of the Karpenter do-not-disrupt one. do-not-disrupt would have to be negated to express safe-to-evict=true, which would result in a double negation. Would switching to safe-to-evict be a problem for Karpenter?

Does this PR introduce a user-facing change?

A new Pod annotation node-autoscaling.kubernetes.io/safe-to-evict is introduced. The annotation can be used to control Node autoscaler drain behavior. Value "true" means that a Pod is safe to evict, and Node autoscalers should not block consolidation of a Node because of it, when they normally would. Value "false" means that a Pod is not safe to evict, and Node autoscalers shouldn't consolidate a Node where such a pod is present. The annotation is supported by Cluster Autoscaler and Karpenter. The annotation is equivalent to autoscaler-specific cluster-autoscaler.kubernetes.io/safe-to-evict and karpenter.sh/do-not-disrupt annotations. The autoscaler-specific annotations are deprecated, and will be removed in a future release.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [Cluster Autoscaler/Karpenter alignment doc]: https://docs.google.com/document/d/1rHhltfLV5V1kcnKr_mKRKDC4ZFPYGP4Tde2Zy-LE72w/edit?usp=sharing

/assign @jonathan-innis
/assign @MaciekPytel
/assign @gjtempleton
/hold
Want LGTMs from the Node autoscaling stakeholders above before unholding.

k8s-ci-robot · 2024-05-10T15:33:01Z

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot · 2024-05-10T15:33:21Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: towca
Once this PR has been reviewed and has the lgtm label, please assign msau42 for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

staging/src/k8s.io/api/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

jonathan-innis · 2024-05-10T15:54:13Z

node-autoscaling.kubernetes.io/safe-to-evict=true

I feel like we are caught between a rock and a hard place with this semantic. I can see why it works well for CAS, but it is awkward for Karpenter users: there's currently no scenario where node-autoscaling.kubernetes.io/safe-to-evict=true would apply, since all pods are safe to evict by default (and we layer blocking elements on top of it). Effectively, all Karpenter users would only ever set node-autoscaling.kubernetes.io/safe-to-evict=false, and I think that is a little awkward, since a boolean semantic like this kind of implies that you support the "truthy" value.

I need to do a bit more thinking on the trade-offs here between CAS and Karpenter supporting something common.

towca · 2024-05-13T17:42:30Z

@jonathan-innis

since all pods are safe to evict by default (and we layer blocking elements on top of it)

So the only way for a pod to block consolidation of its node in Karpenter is for the user to explicitly opt that exact pod/workload into the blocking somehow? Or how does it work?

Do you have/anticipate any such blocking config options that would span multiple workloads? If so, safe-to-evict: true is still useful for "exceptions", something like:

- I want to configure X workloads together because they mostly have the same requirements/have to run together/etc.
- The behavior I want for most of the workloads is to block consolidation on some conditions - e.g. if a pod uses local storage.
- For some of the workloads, I know that the local storage they use is safe to lose, so I can annotate them with safe-to-evict: true.
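
The scenario above could look roughly like this. It is a minimal sketch under stated assumptions, not how Cluster Autoscaler or Karpenter is implemented: the `Pod` type, the `blocks_consolidation` function, and the group-wide `block_on_local_storage` option are all hypothetical names made up for the illustration.

```python
from dataclasses import dataclass, field
from typing import Dict

SAFE_TO_EVICT = "node-autoscaling.kubernetes.io/safe-to-evict"


@dataclass
class Pod:
    # Hypothetical, simplified stand-in for a real Pod object.
    name: str
    uses_local_storage: bool = False
    annotations: Dict[str, str] = field(default_factory=dict)


def blocks_consolidation(pod: Pod, block_on_local_storage: bool) -> bool:
    """Decide whether a pod blocks consolidation of its node.

    block_on_local_storage is an assumed config option that applies to a
    whole group of workloads.
    """
    value = pod.annotations.get(SAFE_TO_EVICT)
    if value == "false":
        # Explicit opt-out: the pod always blocks consolidation.
        return True
    if value == "true":
        # Explicit exception: ignore the group-wide blocking condition.
        return False
    # No annotation: fall back to the group-wide condition.
    return block_on_local_storage and pod.uses_local_storage


# Two workloads in the same group; only "cache" can afford to lose its storage.
cache = Pod("cache", uses_local_storage=True, annotations={SAFE_TO_EVICT: "true"})
database = Pod("database", uses_local_storage=True)
```

With `block_on_local_storage=True`, `database` blocks consolidation of its node while `cache` does not, which is the "exception" use of safe-to-evict: true.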

towca · 2024-05-20T17:01:14Z

@jonathan-innis Have you maybe had a chance to give this more thought?

dims · 2024-05-22T13:39:59Z

/sig node
/sig autoscaling

k8s-ci-robot added the do-not-merge/hold label (Indicates that a PR should not merge because someone has issued a /hold command) May 10, 2024

k8s-ci-robot assigned gjtempleton May 10, 2024

k8s-ci-robot added the release-note label (Denotes a PR that will be considered when it comes time to generate release notes) May 10, 2024

k8s-ci-robot assigned jonathan-innis May 10, 2024

k8s-ci-robot added the kind/api-change (Categorizes issue or PR as related to adding, removing, or otherwise changing an API) and size/S (Denotes a PR that changes 10-29 lines, ignoring generated files) labels May 10, 2024

k8s-ci-robot assigned MaciekPytel May 10, 2024

k8s-ci-robot added the cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA) and do-not-merge/needs-sig (Indicates an issue or PR lacks a `sig/foo` label and requires one) labels May 10, 2024

k8s-ci-robot added the needs-triage (Indicates an issue or PR lacks a `triage/foo` label and requires one) and needs-priority (Indicates a PR lacks a `priority/foo` label and requires one) labels May 10, 2024

k8s-ci-robot requested review from cici37 and mwielgus May 10, 2024 15:33

k8s-ci-robot added the sig/node (Categorizes an issue or PR as relevant to SIG Node) and sig/autoscaling (Categorizes an issue or PR as relevant to SIG Autoscaling) labels and removed the do-not-merge/needs-sig (Indicates an issue or PR lacks a `sig/foo` label and requires one) label May 22, 2024
