Define a common Node autoscaling safe-to-evict/do-not-disrupt annotation #124800

Open
wants to merge 1 commit into
base: master

Contributor

@towca towca commented May 10, 2024

What type of PR is this?

/kind api-change

What this PR does / why we need it:

Currently, there are 2 Node autoscalers sponsored by sig-autoscaling, each supporting a different Pod annotation with the same semantics:

  • Cluster Autoscaler: cluster-autoscaler.kubernetes.io/safe-to-evict=true/false
  • Karpenter: karpenter.sh/do-not-disrupt=true

The semantics of cluster-autoscaler.kubernetes.io/safe-to-evict=false and karpenter.sh/do-not-disrupt=true are identical. Both of these annotations will be replaced by node-autoscaling.kubernetes.io/safe-to-evict=false.

cluster-autoscaler.kubernetes.io/safe-to-evict=true doesn't have an equivalent in Karpenter right now, as Karpenter doesn't have any pod-level conditions blocking consolidation. This means that the equivalent new annotation node-autoscaling.kubernetes.io/safe-to-evict=true should be trivially supported by Karpenter initially (but will require caution if Karpenter ever adds any pod-level conditions blocking consolidation).
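
The mapping described above can be sketched as a small lookup. This is only an illustration in Python, not code from Cluster Autoscaler or Karpenter: the function name `to_common_value` is hypothetical, and the precedence between keys (common annotation first, then the autoscaler-specific ones) is an assumption that this PR does not specify.

```python
from typing import Mapping, Optional

COMMON_KEY = "node-autoscaling.kubernetes.io/safe-to-evict"
CAS_KEY = "cluster-autoscaler.kubernetes.io/safe-to-evict"
KARPENTER_KEY = "karpenter.sh/do-not-disrupt"


def to_common_value(annotations: Mapping[str, str]) -> Optional[str]:
    """Return the common annotation value equivalent to a pod's annotations.

    Returns "true", "false", or None when no recognized annotation is set.
    The lookup order below is an assumption made for this sketch.
    """
    if annotations.get(COMMON_KEY) in ("true", "false"):
        return annotations[COMMON_KEY]
    # Cluster Autoscaler uses the same wording, so the value carries over.
    if annotations.get(CAS_KEY) in ("true", "false"):
        return annotations[CAS_KEY]
    # Karpenter only defines do-not-disrupt=true, which maps to "false".
    if annotations.get(KARPENTER_KEY) == "true":
        return "false"
    return None
```

Under these assumptions, `karpenter.sh/do-not-disrupt=true` and `cluster-autoscaler.kubernetes.io/safe-to-evict=false` both resolve to `"false"`, while only the Cluster Autoscaler annotation can produce `"true"`.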

This PR goes with the Cluster Autoscaler wording for the common annotation, as otherwise we'd have a double negation (do-not-disrupt=false) in the safe-to-evict=true case, which doesn't seem ideal.

This is a part of a broader alignment between Cluster Autoscaler and Karpenter. More details about the alignment can be found in https://docs.google.com/document/d/1rHhltfLV5V1kcnKr_mKRKDC4ZFPYGP4Tde2Zy-LE72w

Which issue(s) this PR fixes:

Part of kubernetes/autoscaler#6648

Special notes for your reviewer:

The implementation in Cluster Autoscaler and Karpenter will follow this PR. If this is a problem, I could do the implementation first with the annotation hardcoded, then submit this PR, then clean up the implementation to use the annotation from the API.

@jonathan-innis this PR goes with the Cluster Autoscaler safe-to-evict wording for now, instead of the Karpenter do-not-disrupt one. do-not-disrupt would have to be negated to express safe-to-evict=true, which would result in a double negation. Would switching to safe-to-evict be a problem for Karpenter?

Does this PR introduce a user-facing change?

A new Pod annotation node-autoscaling.kubernetes.io/safe-to-evict is introduced. The annotation can be used to control Node autoscaler drain behavior. Value "true" means that a Pod is safe to evict, and Node autoscalers should not block consolidation of a Node because of it, even when they normally would. Value "false" means that a Pod is not safe to evict, and Node autoscalers shouldn't consolidate a Node where such a Pod is present. The annotation is supported by Cluster Autoscaler and Karpenter. The annotation is equivalent to the autoscaler-specific cluster-autoscaler.kubernetes.io/safe-to-evict and karpenter.sh/do-not-disrupt annotations. The autoscaler-specific annotations are deprecated, and will be removed in a future release.
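
One way to read the semantics in this release note is as a three-way decision per Pod. The sketch below is illustrative Python, not either autoscaler's implementation: `blocks_by_default` is a hypothetical stand-in for whatever the autoscaler would decide without the annotation, and treating unrecognized values the same as an unset annotation is an assumption.

```python
from typing import Optional


def pod_blocks_consolidation(safe_to_evict: Optional[str],
                             blocks_by_default: bool) -> bool:
    """Decide whether a single Pod prevents consolidation of its Node.

    safe_to_evict is the annotation value, or None if the annotation is unset.
    blocks_by_default is the autoscaler's own verdict absent the annotation.
    """
    if safe_to_evict == "false":
        # Explicitly not safe to evict: the Node must not be consolidated.
        return True
    if safe_to_evict == "true":
        # Explicitly safe to evict: overrides any default blocking condition.
        return False
    # Unset (or, by assumption, unrecognized): keep the default behavior.
    return blocks_by_default
```

A Node would then be a consolidation candidate only if none of its Pods returns `True` here.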

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [Cluster Autoscaler/Karpenter alignment doc]: https://docs.google.com/document/d/1rHhltfLV5V1kcnKr_mKRKDC4ZFPYGP4Tde2Zy-LE72w/edit?usp=sharing

/assign @jonathan-innis
/assign @MaciekPytel
/assign @gjtempleton
/hold
Want LGTMs from the Node autoscaling stakeholders above before unholding.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 10, 2024
@k8s-ci-robot k8s-ci-robot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label May 10, 2024
@k8s-ci-robot k8s-ci-robot added kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels May 10, 2024
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 10, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels May 10, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: towca
Once this PR has been reviewed and has the lgtm label, please assign msau42 for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jonathan-innis

jonathan-innis commented May 10, 2024

> node-autoscaling.kubernetes.io/safe-to-evict=true

I feel like we are caught between a rock and a hard place with this semantic. I can see why it works well for CAS, but it causes awkwardness for Karpenter users: there's currently no scenario where node-autoscaling.kubernetes.io/safe-to-evict=true would apply, since all pods are safe to evict by default (and we layer blocking elements on top of that). Effectively, all Karpenter users would only ever set node-autoscaling.kubernetes.io/safe-to-evict=false, which I think is a little awkward, since a boolean semantic like this implies that you support the "truthy" value.

I need to do a bit more thinking on the trade-offs here between CAS and Karpenter supporting something common.

@towca
Contributor Author

towca commented May 13, 2024

@jonathan-innis

> since all pods are safe to evict by default (and we layer blocking elements on top of it)

So the only way for a pod to block consolidation of its node in Karpenter is for the user to explicitly opt that exact pod/workload into the blocking somehow? Or how does it work?

Do you have/anticipate any such blocking config options that would span multiple workloads? If so, safe-to-evict: true is still useful for "exceptions", something like:

  • I want to configure X workloads together because they mostly have the same requirements/have to run together/etc.
  • The behavior I want for most of the workloads is to block consolidation on some conditions - e.g. if a pod uses local storage.
  • For some of the workloads, I know that the local storage they use is safe to lose, so I can annotate them with safe-to-evict: true.
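
The scenario in the list above can be sketched as follows. This is illustrative Python under stated assumptions, not Karpenter or Cluster Autoscaler code: `GroupConfig`, its `block_on_local_storage` option, and the `Pod` type are hypothetical names invented for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List

SAFE_TO_EVICT = "node-autoscaling.kubernetes.io/safe-to-evict"


@dataclass
class Pod:
    name: str
    uses_local_storage: bool = False
    annotations: Dict[str, str] = field(default_factory=dict)


@dataclass
class GroupConfig:
    # Hypothetical option shared by several workloads configured together.
    block_on_local_storage: bool = True


def blocking_pods(pods: List[Pod], config: GroupConfig) -> List[str]:
    """Return the names of pods that would block consolidation."""
    blocking = []
    for pod in pods:
        value = pod.annotations.get(SAFE_TO_EVICT)
        if value == "true":
            # Per-pod exception: overrides the group-wide condition.
            continue
        if value == "false" or (
            config.block_on_local_storage and pod.uses_local_storage
        ):
            blocking.append(pod.name)
    return blocking


pods = [
    Pod("db", uses_local_storage=True),
    Pod("cache", uses_local_storage=True, annotations={SAFE_TO_EVICT: "true"}),
    Pod("web"),
]
print(blocking_pods(pods, GroupConfig()))  # ['db']
```

Here `cache` uses local storage but is annotated as safe to evict, so only `db` blocks consolidation under the group-wide setting.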

@towca
Contributor Author

towca commented May 20, 2024

@jonathan-innis Have you maybe had a chance to give this more thought?

@dims
Member

dims commented May 22, 2024

/sig node
/sig autoscaling

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 22, 2024
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling. sig/node Categorizes an issue or PR as relevant to SIG Node. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
6 participants