Running jobs in a Slurm cluster #16

Open
ThomasA opened this issue Mar 17, 2023 · 6 comments

ThomasA commented Mar 17, 2023

The feature
It would be interesting if Runhouse could also interface with an existing Slurm cluster.

Motivation
I am part of a team managing a Slurm (GPU) cluster. At the same time, I have users who are interested in running large language models via Runhouse (https://langchain.readthedocs.io/en/latest/modules/llms/integrations/self_hosted_examples.html). It would be excellent if I could bridge this gap between supply and demand with Runhouse, but from what I have read in the documentation so far, Runhouse does not yet come with an interface to Slurm.

What the ideal solution looks like
I am completely new to Runhouse, so this may not be the ideal solution model, but I imagine this could be supported as a bring-your-own (BYO) cluster, with a little extra interaction between Runhouse and Slurm to request the necessary resources (maybe from the Cluster factory method) as one or more Slurm jobs (probably through the Slurm REST API). Once the jobs are running, Runhouse could contact the allocated nodes as a BYO cluster, roughly as in the sketch below.
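
To make the idea concrete, here is only a rough sketch, not an existing Runhouse-Slurm integration: it assumes the machine running the script can call sbatch/squeue and can SSH into the compute nodes, it uses the sbatch CLI instead of the Slurm REST API for brevity, and the partition name, SSH credentials, and the exact rh.cluster() keyword arguments are placeholders that would need to match the real cluster and the actual Runhouse API.

```python
import subprocess
import time

import runhouse as rh

# 1. Submit a placeholder Slurm job that holds a GPU node open for Runhouse.
#    Partition, GRES, and time limit are placeholders for whatever the cluster uses.
submit = subprocess.run(
    ["sbatch", "--parsable", "--partition=gpu", "--gres=gpu:1",
     "--time=02:00:00", "--wrap=sleep infinity"],
    capture_output=True, text=True, check=True,
)
job_id = submit.stdout.strip().split(";")[0]  # --parsable prints "jobid[;cluster]"

# 2. Poll until the job starts and read the allocated node's hostname.
node = ""
while not node:
    node = subprocess.run(
        ["squeue", "-j", job_id, "-h", "-o", "%N"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    if not node:
        time.sleep(5)

# 3. Hand the node to Runhouse as a bring-your-own (static) cluster.
#    The keyword arguments below are assumptions about the BYO-cluster API;
#    the node must be reachable over SSH from wherever this script runs.
gpu = rh.cluster(
    name="slurm-node",
    ips=[node],
    ssh_creds={"ssh_user": "myuser", "ssh_private_key": "~/.ssh/id_rsa"},
)
gpu.run(["nvidia-smi"])  # sanity check that Runhouse can reach the node
```

When the work is done, the allocation could be released again by running scancel on the submitted job id.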

@dongreenberg
Contributor

This is very interesting and we've actually been digging into it for a few weeks. It seems doable pending a few questions about how the cluster is set up. Would you be willing to jump on a quick call just to answer a few questions and talk through our approach?

@ThomasA
Author

ThomasA commented Mar 22, 2023

I would like to help out and can probably also help test it on our cluster. I can be available for a call at US-friendly times tomorrow and maybe Friday. Could you email me to coordinate?

@andre15silva

I am also interested in getting Runhouse to interface with a Slurm cluster.

Has there been any progress recently on this issue?

@jlewitt1
Collaborator

Hey Andre, we're still in the POC stage - we'd be happy to speak with you to hear more about your requirements and how that integration could work for your setup. Just sent you an email to coordinate.

@eugene-prout

Hello,

I have a similar setup to those above and would like to try out Runhouse. Are there any updates on this issue? I have experience interfacing with Slurm clusters, so I would be happy to contribute if that would help get this past the POC stage.

@jlewitt1
Collaborator

Hi Eugene, thanks for reaching out! We'd love to support Slurm; it's on our roadmap along with other compute providers (e.g. k8s), and we hope to get Slurm support into the next release or the one after. In the meantime, we'd be happy to hear your thoughts and would welcome a possible contribution! Sent you an email to discuss further.
