Kueue multipod #3543
base: master
…ave their workload resources cleaned up
Hi folks, Kueue maintainer here. I would love to learn more about SkyPilot and how we can best align our projects in the future. If anyone interested in the topic has some time, I would welcome you to present at a Kubernetes WG Batch meeting, so we can learn about this system and the challenges you currently face with Kueue, if any. You can find details about the meeting here: https://github.com/kubernetes/community/tree/master/wg-batch
Btw, we also maintain the K8s job-controller and jobset, which are the recommended way of creating groups of pods. It's a better pattern than plain pods.
Hey @alculquicondor - thanks for your interest! I've added SkyPilot to the future agenda for wg-batch meetings - let me know if 6th June works.
We've looked into jobset/job-controller, but concluded plain pods are the best way to implement the SkyPilot cluster abstraction since lifecycle management is done by the SkyPilot control plane:
That said, we're open to other ideas and suggestions!
…into kueue-multipod
This integrates Kueue for multi-pod/multi-node workloads. Previously, triggering Kueue to manage SkyPilot pods by including
kueue.x-k8s.io/queue-name: user-queue
under resource labels caused Kueue to recognize each pod as a distinct workload. This PR adds the additional required Kueue labels/annotations whenever kueue.x-k8s.io/queue-name is included as part of a Kubernetes SkyPilot task, so that a multi-node job registers as a single workload under Kueue. It also performs sanity checks to ensure that the user-specified queues and priorities exist in the Kubernetes cluster.
Tested (run the relevant ones):
bash format.sh
sky-kueue-multinode.yaml
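For readers unfamiliar with how Kueue groups plain pods, here is a minimal sketch of the labeling step the description refers to. It is not the code from this PR (the diff is not shown here). It assumes Kueue's plain-pod group convention of a `kueue.x-k8s.io/pod-group-name` label plus a `kueue.x-k8s.io/pod-group-total-count` annotation. The function name `add_pod_group_metadata` and its parameters are hypothetical.

```python
import copy

QUEUE_LABEL = "kueue.x-k8s.io/queue-name"
# Assumed from Kueue's plain-pod group convention, not from this PR's diff.
POD_GROUP_LABEL = "kueue.x-k8s.io/pod-group-name"
POD_GROUP_COUNT_ANNOTATION = "kueue.x-k8s.io/pod-group-total-count"


def add_pod_group_metadata(pod_metadata, group_name, num_pods):
    """Return a copy of pod_metadata marked as one member of a Kueue pod group.

    Hypothetical helper: pods without the queue-name label are returned
    unchanged (apart from being copied), since Kueue does not manage them.
    """
    metadata = copy.deepcopy(pod_metadata)
    labels = metadata.get("labels") or {}
    if QUEUE_LABEL not in labels:
        return metadata
    # Every pod of the multi-node job shares the same group name ...
    labels[POD_GROUP_LABEL] = group_name
    metadata["labels"] = labels
    # ... and declares how many pods the group contains in total.
    annotations = metadata.get("annotations") or {}
    annotations[POD_GROUP_COUNT_ANNOTATION] = str(num_pods)
    metadata["annotations"] = annotations
    return metadata


# Usage: tag both pods of a two-node job with the same group.
pods = [
    {"name": f"mycluster-{i}", "labels": {QUEUE_LABEL: "user-queue"}}
    for i in range(2)
]
tagged = [add_pod_group_metadata(p, "mycluster", len(pods)) for p in pods]
```

With both pods carrying the same group name and a total count of 2, Kueue is expected to admit them together as one workload rather than as two separate ones. The sanity checks mentioned in the description (verifying that the queue and priority exist) would need a call to the cluster API and are left out of this sketch.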