add auto_calculation for system_reserved resources #10830

Payback159
Contributor

What type of PR is this?

/kind feature

What this PR does / why we need it:

Reserving resources improves the stability of the cluster components. In my opinion, the system's resource consumption also changes with the load on the node (the number of scheduled pods), so it should also be possible to reserve system resources relative to the node size.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

The calculation was copied from here: https://cloud.google.com/anthos/clusters/docs/on-prem/latest/how-to/resources-available-pods

I have added a new variable, system_reserved_auto_calculate, so that Kubespray users can decide whether to use the automatic calculation or assign absolute values.
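For readers unfamiliar with this kind of calculation, here is a minimal sketch of a tiered, capacity-relative reservation. It is an illustration only: the tier boundaries and percentages are the ones Google publishes for GKE node reservations and are assumed here, so the formula on the linked page (and in this PR's diff) may differ. All function names, including the `resolve_reserved_memory_mib` toggle that mimics `system_reserved_auto_calculate`, are hypothetical.

```python
"""Sketch of a tiered reservation relative to node size.

ASSUMPTION: tiers follow Google's published GKE scheme; they are not taken
from this PR's diff and may not match the linked page.
"""

MIB_PER_GIB = 1024

# Each entry: (upper bound of the tier, fraction of that tier to reserve).
MEMORY_TIERS_MIB = [
    (4 * MIB_PER_GIB, 0.25),    # 25% of the first 4 GiB
    (8 * MIB_PER_GIB, 0.20),    # 20% of the next 4 GiB
    (16 * MIB_PER_GIB, 0.10),   # 10% of the next 8 GiB
    (128 * MIB_PER_GIB, 0.06),  # 6% of the next 112 GiB
    (float("inf"), 0.02),       # 2% of anything above 128 GiB
]

CPU_TIERS_MILLICORES = [
    (1000, 0.06),             # 6% of the first core
    (2000, 0.01),             # 1% of the second core
    (4000, 0.005),            # 0.5% of cores three and four
    (float("inf"), 0.0025),   # 0.25% of anything above four cores
]


def tiered_reservation(capacity, tiers):
    """Sum the reserved share of each tier that the capacity reaches into."""
    reserved = 0.0
    lower = 0
    for upper, fraction in tiers:
        if capacity <= lower:
            break
        reserved += (min(capacity, upper) - lower) * fraction
        lower = upper
    return reserved


def reserved_memory_mib(capacity_mib):
    """Memory to reserve in MiB; nodes under 1 GiB get a flat 255 MiB."""
    if capacity_mib < MIB_PER_GIB:
        return 255.0
    return tiered_reservation(capacity_mib, MEMORY_TIERS_MIB)


def reserved_cpu_millicores(capacity_millicores):
    """CPU to reserve in millicores."""
    return tiered_reservation(capacity_millicores, CPU_TIERS_MILLICORES)


def resolve_reserved_memory_mib(auto_calculate, static_mib, capacity_mib):
    """Hypothetical toggle: computed value if enabled, else the static one."""
    if auto_calculate:
        return reserved_memory_mib(capacity_mib)
    return static_mib
```

Under these assumed tiers, the reservation grows with node size but at a decreasing rate, so large nodes are not penalised proportionally; a static value remains available by leaving the toggle off.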

Does this PR introduce a user-facing change?:

NONE

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jan 22, 2024
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jan 22, 2024
@k8s-ci-robot
Contributor

Hi @Payback159. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Payback159
Once this PR has been reviewed and has the lgtm label, please assign cristicalin for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jan 22, 2024
@Payback159
Contributor Author

I am currently considering whether we should take this as an opportunity to rename the system_master* variables to system_control_plane*. This would be a user-facing change, but it should be done either way. What do you think?

@VannTen
Contributor

VannTen commented Jan 23, 2024

> I am currently considering whether we should take this as an opportunity to rename the system_master* variables to system_control_plane*. This would lead to a user-facing change but it should be done either way or what do you think?

The *_master and *_node variables need to go anyway; they are redundant with group_vars. I'm overhauling that part (cgroups and reserved resources) in #10714.

@VannTen
Contributor

VannTen commented Jan 23, 2024

Besides that, which system components do you think scale their resource usage with the number of scheduled pods, exactly?
For kube_reserved (i.e. container manager + kubelet) I can see it, but for system_reserved it's less obvious: sshd and similar don't depend on the number of scheduled pods.
I could see the kernel, and maybe the network manager?
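For context, a minimal sketch of how the two reservations enter the node accounting, following the node allocatable formula in the Kubernetes documentation (allocatable = capacity - kube-reserved - system-reserved - eviction threshold). The function name and the numbers are illustrative assumptions, not values from this PR.

```python
def allocatable(capacity, kube_reserved=0, system_reserved=0, eviction_threshold=0):
    """Resources left for pods after both reservations and the eviction threshold.

    All arguments share one unit (e.g. MiB or millicores). kube_reserved covers
    the kubelet and container runtime; system_reserved covers OS daemons such
    as sshd.
    """
    return capacity - kube_reserved - system_reserved - eviction_threshold


# Illustrative values only: a 16 GiB node, expressed in MiB.
memory_for_pods = allocatable(
    capacity=16384,
    kube_reserved=1024,
    system_reserved=512,
    eviction_threshold=100,
)
```

Because the two reservations are separate deductions, scaling one of them with node size does not require touching the other.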

@Payback159
Contributor Author

@VannTen, you are completely right. I confused it with kube-reserved: I was thinking about the file handles needed and the additional load that each pod/container puts on the kubelet and the container runtime.

If you agree, I would like to adapt the PR to provide this logic for kube-reserved instead. It makes less sense to me for system-reserved, since system resource usage should correlate less strongly with the load.

@yankay
Member

yankay commented Jan 24, 2024

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 24, 2024
@k8s-ci-robot
Contributor

@Payback159: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-kubespray-yamllint | 91a1645 | link | true | /test pull-kubespray-yamllint |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

@VannTen
Contributor

VannTen commented Jan 25, 2024

> ... I would like to adapt the PR to provide the logic for kube-reserved.

Fine by me.

Another remark: it would be a good idea to document (and, ideally, link to) the calculation method used. In particular, that would make it easier to see whether we could change it in the future.

@Payback159
Contributor Author

Hello @VannTen,

with a little delay, I have now opened a new PR for the kubeReserved values. I am closing this PR, as it is now obsolete.

#11082

I have added the documentation of the calculation to the cgroup documentation, as that seemed the most sensible place for it.

@Payback159 Payback159 closed this Apr 12, 2024
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.