I have multiple Xinference worker nodes deployed in a server cluster. When I run Qwen1.5-14B-Chat, I want to enable distributed inference across nodes so I can make better use of the GPUs spread over the cluster (similar to a Ray cluster), but Xinference doesn't seem to allow that.
How should I configure Xinference so that the vLLM backend can use GPUs on multiple nodes for distributed inference?
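For context, my deployment looks roughly like the sketch below. The hostnames, port, and GPU count are placeholders, and the exact flags may differ by Xinference version:

```shell
# Hedged sketch of a multi-node Xinference deployment.
# "supervisor-host" and "worker-host" are placeholder hostnames.

# On the supervisor node:
xinference-supervisor -H supervisor-host --port 9997

# On each worker node (each node has its own GPUs):
xinference-worker -e "http://supervisor-host:9997" -H worker-host

# Launch the model; --n-gpu only selects GPUs on a single worker,
# so the vLLM engine stays confined to one node.
xinference launch --model-name qwen1.5-chat \
  --size-in-billions 14 --model-format pytorch --n-gpu 4
```

With this setup, each worker registers with the supervisor, but the launched model is scheduled onto one worker rather than sharded across them.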