app Container can't reuse its init Container cpuset in a specific condition #124797
Comments
/sig node
The point is: the cpuset allocations for init containers and app containers differ because the set of available CPUs changes in the time interval between the start of the init container and the start of the app container.
related: #94220
#124282
@ffromani Thanks for the mention. The issue I describe here is a little different from #94220.
@chengjoey Not exactly the same issue. In #124282, the init container and app container request different amounts of CPU (init > app), and the cpuset from the init container can't be released even though it has exited (similar to #94220). But in this case, the init container and app container request the same amount of CPU (init == app), so this is a kubelet issue rather than a kube-scheduler issue.
Yes, I realized that after re-reading the description of this issue. I'd need to check whether the system actually guarantees maximum reuse of the init container's CPU cores when allocating the app container's CPU cores. Nevertheless, it's a very desirable property the system should strive to ensure. My gut feeling is that there is just a bug in this area; I remember various conversations over time.
The core issue is here: https://github.com/kubernetes/kubernetes/blob/v1.30.0/pkg/kubelet/cm/cpumanager/policy_static.go#L394. With this line of code, all the available CPUs are put into a single pool. IOW, nothing guarantees that the reusable CPUs from the terminated init container will be consumed first, or at all if the system has enough CPUs to fulfill the app container's request. I vaguely remember some past conversations in this area about guaranteeing optimal allocation under topology-manager-enforced constraints. Also, I wonder if and how we should extend this guarantee. IOW, should the reuse be best-effort (and so, arguably, there's no bug)? Perhaps the best way to fix this would be to add a new CPU manager policy option.
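To make the alternative concrete, here is a minimal, self-contained sketch (not the actual kubelet code) of an allocation that drains the reusable CPUs freed by a terminated init container before touching the general pool, instead of merging both into one pool up front. It uses the k8s.io/utils/cpuset helper; takeFirstN is a naive, hypothetical stand-in for the topology-aware takeByTopology helper the real static policy uses, and allocatePreferringReusable is a made-up name.

```go
package main

import (
	"fmt"

	"k8s.io/utils/cpuset"
)

// takeFirstN picks the n lowest-numbered CPUs from the set (topology ignored);
// the real static policy would use its topology-aware takeByTopology here.
func takeFirstN(from cpuset.CPUSet, n int) (cpuset.CPUSet, error) {
	if from.Size() < n {
		return cpuset.New(), fmt.Errorf("not enough cpus available to satisfy request")
	}
	return cpuset.New(from.List()[:n]...), nil
}

// allocatePreferringReusable consumes the reusable set first and only then
// tops up from the shared pool, so reuse is guaranteed rather than accidental.
func allocatePreferringReusable(available, reusable cpuset.CPUSet, numCPUs int) (cpuset.CPUSet, error) {
	fromReusable := reusable
	if reusable.Size() > numCPUs {
		var err error
		if fromReusable, err = takeFirstN(reusable, numCPUs); err != nil {
			return cpuset.New(), err
		}
	}
	remaining := numCPUs - fromReusable.Size()
	if remaining == 0 {
		return fromReusable, nil
	}
	fromPool, err := takeFirstN(available.Difference(fromReusable), remaining)
	if err != nil {
		return cpuset.New(), err
	}
	return fromReusable.Union(fromPool), nil
}

func main() {
	available, _ := cpuset.Parse("0-3,8-15") // shared pool, reusable CPUs excluded
	reusable, _ := cpuset.Parse("4-7")       // freed by an exited init container
	allocated, _ := allocatePreferringReusable(available, reusable, 6)
	fmt.Println(allocated) // 0-1,4-7: the reusable CPUs are always consumed first
}
```

Whether something like this should sit behind a new policy option or become the default behavior is exactly the open question above.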
/triage accepted
In this case, should we release the init container's cpuset as soon as it exits? If an init container exits successfully and won't restart anymore, its cpuset should either be reused by its own pod's app container or by other pods' containers. Otherwise this "available" cpuset will not be used at all.
If we release the init container's cpuset as soon as it exits, the reuse will be guaranteed.
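For what it's worth, here is a hedged sketch of the check that proposal implies, assuming the kubelet can consult the pod spec and status; this is illustrative only, and initContainerCPUsReleasable is a made-up name, not an existing kubelet function.

```go
package sketch

import (
	v1 "k8s.io/api/core/v1"
)

// initContainerCPUsReleasable reports whether an init container's exclusive
// CPUs could safely be returned to the shared pool: it must have exited
// successfully and must not be a restartable (sidecar-style) init container,
// so it will never run again.
func initContainerCPUsReleasable(pod *v1.Pod, containerName string) bool {
	for _, c := range pod.Spec.InitContainers {
		if c.Name == containerName && c.RestartPolicy != nil &&
			*c.RestartPolicy == v1.ContainerRestartPolicyAlways {
			// Sidecar-style init containers run for the pod's lifetime,
			// so their CPUs must not be reclaimed.
			return false
		}
	}
	for _, cs := range pod.Status.InitContainerStatuses {
		if cs.Name == containerName {
			return cs.State.Terminated != nil && cs.State.Terminated.ExitCode == 0
		}
	}
	return false
}
```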
What happened?
We can't make sure the app container always reuses the init container's cpuset, which may lead to wasted CPU (the init container has already exited but its cpuset can not be reused by other containers) and a
not enough cpus available to satisfy request
error.
What did you expect to happen?
The app container always reuses the init container's cpuset after the init container exits.
How can we reproduce it (as minimally and precisely as possible)?
This is one of the specific conditions that might cause the issue: Pod A's init container and app container both request 92 CPUs. What we expect is that Pod A's app container reuses its init container's cpuset.
But because Pod B is deleted in the meantime (changing the available CPUs between the start of the init container and the app container), the app container won't be allocated the same cpuset as its init container (it gets 4-49,100-145):
Now we have Pod A's init container taking cpuset: 4-24,48-60,73-84,100-120,144-156,169-180
And Pod A's app container taking cpuset: 4-49,100-145
The init container cpuset won't be reused as expected.
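For illustration, a small standalone snippet using the k8s.io/utils/cpuset helper (the same cpuset library the cpumanager uses) shows how much of the init container's cpuset actually overlaps with the app container's in this scenario: only 46 of the 92 CPUs, while the other 46 stay assigned to the already-exited init container and can't be used by anyone else.

```go
package main

import (
	"fmt"

	"k8s.io/utils/cpuset"
)

func main() {
	// cpusets taken from the reproduction above; error handling elided.
	initCPUs, _ := cpuset.Parse("4-24,48-60,73-84,100-120,144-156,169-180")
	appCPUs, _ := cpuset.Parse("4-49,100-145")

	reused := initCPUs.Intersection(appCPUs) // CPUs the app container actually reuses
	stranded := initCPUs.Difference(appCPUs) // CPUs still held by the exited init container

	fmt.Printf("reused:   %d CPUs (%v)\n", reused.Size(), reused)     // 46 CPUs
	fmt.Printf("stranded: %d CPUs (%v)\n", stranded.Size(), stranded) // 46 CPUs
}
```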
A new Pod C that then starts to allocate a cpuset may get a
not enough cpus available to satisfy request
error.
Anything else we need to know?
No response
Kubernetes version
Cloud provider
OS version
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)