New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Define a common Node autoscaling safe-to-evict/do-not-disrupt annotation #124800
base: master
Are you sure you want to change the base?
Conversation
Currently, there are 2 Node autoscalers sponsored by sig-autoscaling, each supporting a different Pod annotation with the same semantics: * Cluster Autoscaler: cluster-autoscaler.kubernetes.io/safe-to-evict=true/false * Karpenter: karpenter.sh/do-not-disrupt=true The semantics for cluster-autoscaler.kubernetes.io/safe-to-evict=false, and karpenter.sh/do-not-disrupt=true are identical. Both of these annotations will be replaced by node-autoscaling.kubernetes.io/safe-to-evict=false. cluster-autoscaler.kubernetes.io/safe-to-evict=true doesn't have an equivalent in Karpenter right now, as Karpenter doesn't have any pod-level conditions blocking consolidation. This means that the equivalent new annotation node-autoscaling.kubernetes.io/safe-to-evict=true should be trivially supported by Karpenter initially (but will require caution if Karpenter ever adds any pod-level conditions blocking consolidation). Going with the Cluster Autoscaler wording for the common annotation, as otherwise we'd have a double negation (do-not-disrupt=false) in the safe-to-evict=true case which doesn't seem ideal. This is a part of a broader alignment between Cluster Autoscaler and Karpenter. More details about the alignment can be found in https://docs.google.com/document/d/1rHhltfLV5V1kcnKr_mKRKDC4ZFPYGP4Tde2Zy-LE72w
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the The Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: towca The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
I feel like we are a caught between a rock and a hard place with this semantic. I can see why this semantic works well for CAS, but now this semantic causes awkwardness for Karpenter users since there's currently no scenario where I need to do a bit more thinking on the trade-offs here between CAS and Karpenter supporting something common. |
So the only way for a pod to block consolidation of its node in Karpenter is for the user to explicitly opt that exact pod/workload into the blocking somehow? Or how does it work? Do you have/anticipate any such blocking config options that would span multiple workloads? If so,
|
@jonathan-innis Have you maybe had a chance to give this more thought? |
/sig node |
What type of PR is this?
/kind api-change
What this PR does / why we need it:
Currently, there are 2 Node autoscalers sponsored by sig-autoscaling, each supporting a different Pod annotation with the same semantics:
cluster-autoscaler.kubernetes.io/safe-to-evict=true/false
karpenter.sh/do-not-disrupt=true
The semantics for
cluster-autoscaler.kubernetes.io/safe-to-evict=false
, andkarpenter.sh/do-not-disrupt=true
are identical. Both of these annotations will be replaced bynode-autoscaling.kubernetes.io/safe-to-evict=false
.cluster-autoscaler.kubernetes.io/safe-to-evict=true
doesn't have an equivalent in Karpenter right now, as Karpenter doesn't have any pod-level conditions blocking consolidation. This means that the equivalent new annotationnode-autoscaling.kubernetes.io/safe-to-evict=true
should be trivially supported by Karpenter initially (but will require caution if Karpenter ever adds any pod-level conditions blocking consolidation).Going with the Cluster Autoscaler wording for the common annotation, as otherwise we'd have a double negation (
do-not-disrupt=false
) in thesafe-to-evict=true
case which doesn't seem ideal.This is a part of a broader alignment between Cluster Autoscaler and Karpenter. More details about the alignment can be found in https://docs.google.com/document/d/1rHhltfLV5V1kcnKr_mKRKDC4ZFPYGP4Tde2Zy-LE72w
Which issue(s) this PR fixes:
Part of kubernetes/autoscaler#6648
Special notes for your reviewer:
The implementation in Cluster Autoscaler and Karpenter will follow this PR. If this is a problem, I could do the implementation first with the annotation hardcoded, then submit this PR, then clean up the implementation to use the annotation from the API.
@jonathan-innis this PR goes with the Cluster Autoscaler
safe-to-evict
wording for now, instead of the Karpenterdo-not-disrupt
one.do-not-disrupt
would have to be negated to expresssafe-to-evict=true
, which would result in a double negation. Would switching tosafe-to-evict
be a problem for Karpenter?Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
/assign @jonathan-innis
/assign @MaciekPytel
/assign @gjtempleton
/hold
Want LGTMs from the Node autoscaling stakeholders above before unholding.