Parallelize the eviction of pods with volumes #848
Comments
Any opinion @gardener/mcm-maintainers?
Hi Tim, we can support this, though I am doubtful whether we should make it configurable in the shoot YAML: a high value configured by an operator can lead to severe degradation and then a fair amount of effort diagnosing and troubleshooting such issues. However, I think we can introduce a fixed degree of parallelism for evicting pods with PVs after relevant testing of the behaviour on problematic providers like Azure. Now that we have implemented #781, we wait for all volumes to be detached from the node before proceeding to VM deletion, so the edge cases where still-attached volumes drive the attach/detach controller into timeouts are ameliorated.
Thanks for the feedback @elankath. It wasn't meant to be an option for shoot owners. The degree of parallelism can also be configured by Gardenlet via its config. |
That's fine. A "hidden knob" like a CLI option would work.
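Such a hidden knob could be exposed as an ordinary CLI flag on the controller, defaulting to the current serial behaviour. This is only a sketch; the flag name `max-parallel-pv-evictions` and the helper below are hypothetical, not an existing MCM or gardenlet option:

```go
package main

import (
	"flag"
	"fmt"
)

// parseDrainFlags parses a hypothetical drain-parallelism knob from CLI
// arguments. A value of 1 preserves today's serial eviction of pods with
// volumes; higher values allow that many evictions in flight at once.
func parseDrainFlags(args []string) (int, error) {
	fs := flag.NewFlagSet("machine-controller-manager", flag.ContinueOnError)
	maxParallel := fs.Int("max-parallel-pv-evictions", 1,
		"maximum number of pods with volumes evicted in parallel during drain (1 = serial)")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	return *maxParallel, nil
}

func main() {
	n, err := parseDrainFlags([]string{"--max-parallel-pv-evictions=4"})
	fmt.Println(n, err)
}
```

Because the flag is hidden (not surfaced in the shoot YAML), only operators who know about it would change it, which limits the troubleshooting risk raised above.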
How to categorize this issue?
/area performance
/kind enhancement
/priority 3
What would you like to be added:
MCM should provide a knob to configure the degree of parallel evictions for pods with volumes.
Why is this needed:
#262 established serial eviction of pods with volumes to make the overall node drain process faster, especially for cloud providers where many parallel detach/attach operations lead to rate limiting and huge back-offs.
On some infrastructures, and up to some degree of parallelism, evicting pods with volumes in parallel may yield a significant performance boost. Today, shoot clusters with many nodes often need a considerable amount of time to perform rolling updates, and we see this serialization as one of the root causes that can be improved.