How should I enable vllm distributed inference on multiple single GPU nodes? #1452

Open · Jie2GG opened this issue May 8, 2024 · 0 comments
Labels: gpu, question (Further information is requested)
Jie2GG commented May 8, 2024

I have multiple Xinference worker nodes deployed in a server cluster. When I run Qwen1.5-14B-Chat, I want to enable distributed inference across those nodes so the model can make better use of the GPUs spread over the cluster (as a Ray cluster would), but Xinference does not seem to let me do that.


How should I configure Xinference so that the vLLM backend can use GPUs on multiple nodes for distributed inference?
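For context, what I am hoping to achieve is roughly what plain vLLM already supports on a Ray cluster: start a Ray head on one node, join the single-GPU workers to it, and shard the model with tensor parallelism across the cluster. A rough sketch of that setup is below; the model name, port, and GPU count are just examples, and the exact way to force vLLM onto the Ray backend can differ between vLLM versions:

```python
# Sketch of multi-node tensor parallelism with plain vLLM over a Ray cluster.
#
# On the head node:
#   ray start --head --port=6379
# On every other single-GPU node:
#   ray start --address=<head-node-ip>:6379
#
# Then launch the engine once, on the head node:

from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen1.5-14B-Chat",
    # Total number of GPUs across the Ray cluster; with single-GPU nodes
    # this equals the number of nodes participating in inference.
    tensor_parallel_size=2,
)

outputs = llm.generate(["Hello, who are you?"], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```

In other words, I would like the vLLM engine that Xinference starts for a model to be able to use GPUs on more than one worker node, not only the GPUs local to the worker that hosts it.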
