
Ensure that pods are scheduled to nodes that meet preferred conditions, while still satisfying the scheduler's Filter plugins. #124844

Open
fanhaouu opened this issue May 13, 2024 · 6 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling.

Comments

@fanhaouu

fanhaouu commented May 13, 2024

What would you like to be added?

/sig scheduling
/kind feature

Add a new plugin extension point that checks nodes, and modify the scheduling filter logic to prioritize nodes that satisfy the preferred check conditions by placing them at the beginning of the node array, so that the scheduler considers them first during each scheduling attempt.
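
As a very rough sketch, the new extension point could look something like the following (the interface name, method, and signature are purely illustrative and only modeled on the existing framework plugin interfaces; this is not an actual API):

// Hypothetical extension point; the name and signature are illustrative only.
type CheckPreferredPlugin interface {
    framework.Plugin
    // CheckPreferred reports whether the node satisfies the pod's preferred
    // (soft) constraints, so the scheduler can try such nodes first when
    // running the Filter plugins.
    CheckPreferred(ctx context.Context, pod *v1.Pod, nodeInfo *framework.NodeInfo) bool
}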

If the community feels this requirement is necessary, I will complete the corresponding KEP and code implementation work.

The current solution within our company is as follows, but I believe adding a "check preferred" extension point would be better:
1. Allow users to add an annotation with the key "xxx.k8s.io/preferred-plugin" to a pod. The value of this annotation can be either "NodeAffinity" or "TaintToleration".

2. During scheduling, determine which preferred feature to use based on the annotation value.

NodeAffinity:

checkPreferred = func(node *v1.Node, pod *v1.Pod) bool {
    affinity := pod.Spec.Affinity
    if affinity != nil && affinity.NodeAffinity != nil && affinity.NodeAffinity.PreferredDuringSchedulingIgnoredDuringExecution != nil {
        // Parse the pod's preferred node-affinity terms.
        terms, err := corev1nodeaffinity.NewPreferredSchedulingTerms(affinity.NodeAffinity.PreferredDuringSchedulingIgnoredDuringExecution)
        if err != nil {
            klog.ErrorS(err, "Failed to parse pod's node affinity", "pod", klog.KObj(pod))
            return false
        }
        // The node is "preferred" if at least one preferred term matches it.
        if terms != nil && terms.Score(node) > 0 {
            return true
        }
    }
    return false
}

TaintToleration:

checkPreferred = func(node *v1.Node, pod *v1.Pod) bool {
    // Collect only the pod's tolerations with effect PreferNoSchedule.
    var filterTolerations []v1.Toleration
    for _, toleration := range pod.Spec.Tolerations {
        if toleration.Effect != v1.TaintEffectPreferNoSchedule {
            continue
        }
        filterTolerations = append(filterTolerations, toleration)
    }
    if len(node.Spec.Taints) != 0 && len(filterTolerations) != 0 {
        for _, taint := range node.Spec.Taints {
            // Check only taints that have effect PreferNoSchedule.
            if taint.Effect != v1.TaintEffectPreferNoSchedule {
                continue
            }
            // The node is "preferred" if the pod tolerates at least one of
            // its PreferNoSchedule taints.
            if v1helper.TolerationsTolerateTaint(filterTolerations, &taint) {
                return true
            }
        }
    }
    return false
}

3. Divide the nodes into two groups, "passChecked" and "noPassChecked", based on whether they satisfy the preferred check (see the sketch after this list).

4. To keep scheduling probabilities equal across nodes, randomly shuffle both the "passChecked" and "noPassChecked" groups.

5. Rebuild the nodes array by concatenating the two groups, with the "passChecked" nodes placed before the "noPassChecked" nodes.

6. Call the "findNodesThatPassFilters" method to search for feasible nodes in the new nodes array.

7. If "passChecked" is empty, adjust the value of "nextStartNodeIndex"; otherwise, leave it unchanged.
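
A minimal sketch of steps 3 to 5, assuming the candidate nodes are available as a plain []*v1.Node named nodes and using math/rand for the shuffles (the real code would operate on the framework's NodeInfo slices):

passChecked := make([]*v1.Node, 0, len(nodes))
noPassChecked := make([]*v1.Node, 0, len(nodes))
for _, node := range nodes {
    // Step 3: split nodes by whether they satisfy the preferred check.
    if checkPreferred(node, pod) {
        passChecked = append(passChecked, node)
    } else {
        noPassChecked = append(noPassChecked, node)
    }
}
// Step 4: shuffle within each group so every node keeps an equal chance.
rand.Shuffle(len(passChecked), func(i, j int) {
    passChecked[i], passChecked[j] = passChecked[j], passChecked[i]
})
rand.Shuffle(len(noPassChecked), func(i, j int) {
    noPassChecked[i], noPassChecked[j] = noPassChecked[j], noPassChecked[i]
})
// Step 5: preferred nodes first, then the rest; this ordered slice is what
// findNodesThatPassFilters walks in step 6.
orderedNodes := append(passChecked, noPassChecked...)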

Why is this needed?

Currently, for performance reasons, the kube-scheduler follows this scheduling logic:
1. It starts filtering feasible nodes from nextStartNodeIndex and stops once a specific number of nodes that pass all Filter plugins have been found (100 by default), as sketched below.

2. It then runs the Score plugins to assign scores to these feasible nodes.

3. Finally, it selects the node with the highest score for scheduling.
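
In simplified form, ignoring the parallel workers and the percentage-of-nodes calculation, the search described in step 1 behaves roughly like this (passesAllFilterPlugins is a placeholder for the Filter phase, not a real function):

feasible := make([]*v1.Node, 0, numNodesToFind)
for i := 0; i < len(allNodes) && len(feasible) < numNodesToFind; i++ {
    // Start from nextStartNodeIndex and wrap around the node list.
    node := allNodes[(nextStartNodeIndex+i)%len(allNodes)]
    if passesAllFilterPlugins(pod, node) {
        feasible = append(feasible, node)
    }
}
// Only this subset is scored; the highest-scoring node among it wins.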

However, because each scheduling attempt only evaluates a subset of the nodes and there are multiple Score plugins, pods often do not end up on the nodes users expect.

If we could add a new extension point to check nodes, we could prioritize scheduling pods onto the desired nodes.

@fanhaouu fanhaouu added the kind/feature Categorizes issue or PR as related to a new feature. label May 13, 2024
@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 13, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@AxeZhan
Member

AxeZhan commented May 13, 2024

I assume your goal is to (try to) make sure a pod with a preferred node affinity and taint tolerations is scheduled to a node that matches the node affinity and also has the tolerated taint?
Any specific use case for this behavior?

@fanhaouu
Author

I assume your goal is to (try to) make sure a pod with a preferred node affinity and taint tolerations is scheduled to a node that matches the node affinity and also has the tolerated taint? Any specific use case for this behavior?

As long as resources are available, I want pods to be scheduled onto specific nodes as much as possible. However, the many score plugins enabled in the cluster, along with the weights predefined by SREs, make it hard for users to adjust them dynamically. Meanwhile, for performance reasons, the scheduler only traverses and evaluates a subset of the nodes. This often leads to suboptimal scheduling results.

@AxeZhan
Member

AxeZhan commented May 14, 2024

I get that this is trying to get an ideal scoring result. But since the scheduler never guarantees that a pod will be scheduled to the node with the highest score, I'm still confused about why this is needed (if you really want to match the node affinity, why not use requiredDuringScheduling?).

Anyway, I think you can write a short doc and put it on the agenda of sig-scheduling (https://github.com/kubernetes/community/tree/master/sig-scheduling). Folks can then discuss it during the meeting.

@fanhaouu
Author

I get that this is trying to get an ideal scoring result. But since the scheduler never guarantees that a pod will be scheduled to the node with the highest score, I'm still confused about why this is needed (if you really want to match the node affinity, why not use requiredDuringScheduling?).

Anyway, I think you can write a short doc and put it on the agenda of sig-scheduling (https://github.com/kubernetes/community/tree/master/sig-scheduling). Folks can then discuss it during the meeting.

Okay, thank you. I understand your confusion. My main goal is to ensure that pods are always scheduled to fully preferred nodes first, rather than to nodes that only partially satisfy the preferences, while still meeting resource requirements.

@likakuli
Contributor

/cc
