
Can vLLM serve clients using multiple model instances? #239

Answered by zhuohan123
aoyulong asked this question in Q&A

Right now, vLLM is a serving engine for a single model. To serve clients with multiple instances, you can start several vLLM server replicas and put a custom load balancer (e.g., an nginx load balancer) in front of them. Also feel free to check out FastChat and other multi-model frontends (e.g., aviary). vLLM can act as a model worker for these libraries to support multi-replica serving.
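For illustration only, here is a minimal sketch of the round-robin idea behind such a load balancer, written as a small client-side helper in Python. It is not part of vLLM and not how nginx is configured. The class name `RoundRobinBalancer` and the two `localhost` addresses are assumptions made up for this example, standing in for replicas you would have started yourself.

```python
import itertools
import threading


class RoundRobinBalancer:
    """Hand out replica base URLs in rotation, one per request."""

    def __init__(self, replica_urls):
        urls = list(replica_urls)
        if not urls:
            raise ValueError("at least one replica URL is required")
        self._cycle = itertools.cycle(urls)
        # The lock keeps the rotation consistent when several threads ask at once.
        self._lock = threading.Lock()

    def next_replica(self):
        """Return the base URL of the replica that should take the next request."""
        with self._lock:
            return next(self._cycle)


# Hypothetical addresses: two server replicas started separately,
# each listening on its own port.
balancer = RoundRobinBalancer([
    "http://localhost:8000",
    "http://localhost:8001",
])

# Four consecutive requests alternate between the two replicas.
picks = [balancer.next_replica() for _ in range(4)]
```

A real deployment would more likely leave this to nginx or a similar proxy, which also handles health checks and failover that this sketch ignores.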

Answer selected by zhuohan123

This discussion was converted from issue #181 on June 25, 2023 16:43.