
[Ray Autoscaling] Issues related to the handling of Pending Worker Nodes when scaling down #45195

Open
yx367563 opened this issue May 8, 2024 · 2 comments
Labels
core (Issues that should be addressed in Ray Core) · enhancement (Request for new feature and/or capability) · @external-author-action-required (Alternate tag for PRs where the author doesn't have labeling permission) · P1 (Issue that should be fixed within a few weeks)

Comments


yx367563 commented May 8, 2024

Description

When using KubeRay to deploy a Ray cluster on Kubernetes, if the Kubernetes cluster's resources are tight during scale-up, some worker nodes stay stuck in the Pending state.
After the job finishes, the running worker nodes are scaled down according to the configured idleTimeoutSeconds.
Then, as cluster resources are freed, the worker nodes that were previously Pending transition to Running and must again sit idle for idleTimeoutSeconds before they are scaled down.
If there are many Pending worker nodes, it therefore takes a long time after the job completes to scale down all the unneeded nodes and release the resources they occupy, which lowers resource utilization.

Use case

Users may configure a large maxWorkerNum and submit a large number of Ray tasks at once with autoscaling enabled. Under the current autoscaling rules, the autoscaler tries to allocate enough worker nodes to satisfy the resources required by all tasks, so when Kubernetes resources are tight a large number of worker nodes end up Pending, i.e. the number of available worker nodes is small while the number of desired worker nodes is large. A workload sketch of this pattern is shown below.
When the job completes, the available worker nodes are scaled down according to the configured idleTimeoutSeconds, the released resources are immediately claimed by the Pending worker nodes, and those nodes must then wait another idleTimeoutSeconds, so it takes a long time to release the resources.
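
For illustration, a minimal sketch of the kind of workload that triggers this pattern, assuming a RayCluster deployed via KubeRay with autoscaling enabled (the task body and task count are hypothetical):

import ray

ray.init()  # connect to the existing KubeRay-managed RayCluster

@ray.remote(num_cpus=1)
def work(i):
    # stand-in for a real task body
    return i * i

# Submitting many tasks at once makes the autoscaler request enough worker
# nodes to run all of them (up to the configured maximum). If the Kubernetes
# cluster cannot schedule that many pods, most of the new worker nodes stay
# in the Pending state.
results = ray.get([work.remote(i) for i in range(10_000)])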

Possible solution: before scaling down an idle worker node of a given type, first remove all worker nodes of that type that are still Pending; conversely, if the Pending nodes of that type were actually still needed, the running worker nodes of that type should not be considered idle in the first place.
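
A rough sketch of this rule in Python, over a hypothetical view of the autoscaler's state (pending_nodes, idle_nodes, terminate, and the timing fields are illustrative names, not the actual Ray autoscaler API):

def scale_down_node_type(node_type, pending_nodes, idle_nodes, terminate,
                         idle_timeout_seconds):
    """Proposed behaviour: an idle running worker of a given type implies
    the Pending workers of that same type are not needed, so remove the
    Pending ones before applying the idle-timeout scale-down."""
    has_expired_idle = any(n.idle_seconds >= idle_timeout_seconds
                           for n in idle_nodes.get(node_type, []))
    if has_expired_idle:
        # Remove Pending pods of this type first; they never became useful.
        for node in pending_nodes.get(node_type, []):
            terminate(node)
        # Then apply the existing idleTimeoutSeconds-based scale-down.
        for node in idle_nodes.get(node_type, []):
            if node.idle_seconds >= idle_timeout_seconds:
                terminate(node)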

yx367563 added the enhancement and triage labels on May 8, 2024
anyscalesam added the core label on May 13, 2024
jjyao (Contributor) commented May 13, 2024

@yx367563 are you using the RayJob CRD? If so, after the job finishes, all the nodes will be deleted, including the pending nodes.

jjyao added the P1 and @external-author-action-required labels and removed the triage label on May 13, 2024
yx367563 (Author) commented May 14, 2024

> @yx367563 are you using the RayJob CRD? If so, after the job finishes, all the nodes will be deleted, including the pending nodes.

@jjyao I need a long-running RayCluster and submit both high-load and low-load jobs to it. I found that when I submit a high-load job and the Kubernetes cluster does not have enough resources, a large number of nodes end up in the Pending state; the ratio of available nodes to desired nodes is about 1:10. When the job later finishes, the corresponding available nodes are removed once they have been idle long enough, and the pending nodes then gradually become available nodes, so the scale-down is very slow. I think the pending nodes should be removed directly at that point.
