How should I enable vllm distributed inference on multiple single GPU nodes? #1452

Open · Jie2GG opened this issue May 8, 2024 · 0 comments
Labels: gpu, question (Further information is requested)
Jie2GG commented May 8, 2024

I have multiple Xinference worker nodes deployed in a server cluster. When I run Qwen1.5-14B-Chat, I want to enable distributed inference across those nodes so the model can make better use of the GPUs spread over the cluster (as a Ray cluster would), but Xinference does not seem to let me do that.


How should I configure Xinference so that the vLLM backend can use GPUs on multiple nodes for distributed inference?
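For context, what I am hoping to achieve is roughly what plain vLLM already supports on a Ray cluster: start a Ray head on one node, join the single-GPU workers to it, and shard the model with tensor parallelism across the cluster. A rough sketch of that setup is below; the model name, port, and GPU count are just examples, and the exact way to force vLLM onto the Ray backend can differ between vLLM versions:

```python
# Sketch of multi-node tensor parallelism with plain vLLM over a Ray cluster.
#
# On the head node:
#   ray start --head --port=6379
# On every other single-GPU node:
#   ray start --address=<head-node-ip>:6379
#
# Then launch the engine once, on the head node:

from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen1.5-14B-Chat",
    # Total number of GPUs across the Ray cluster; with single-GPU nodes
    # this equals the number of nodes participating in inference.
    tensor_parallel_size=2,
)

outputs = llm.generate(["Hello, who are you?"], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```

In other words, I would like the vLLM engine that Xinference starts for a model to be able to use GPUs on more than one worker node, not only the GPUs local to the worker that hosts it.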
