Add a cookbook on how to send batch requests with vLLM #811

Open
rlouf opened this issue Apr 12, 2024 · 1 comment
Labels
documentation (Linked to documentation and examples) · help wanted · vLLM (Things involving vLLM support)

Comments

@rlouf
Member

rlouf commented Apr 12, 2024

Some users may need to send batch requests with several prompt/schema pairs. It is possible to do this with the vLLM server integration using aiohttp, and we should document this.
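
A rough sketch of what such a cookbook could show, assuming the server exposes a /generate endpoint that accepts a JSON body with prompt and schema fields (the endpoint path and payload shape are assumptions here, not confirmed in this thread):

```python
import asyncio

import aiohttp

# Assumed endpoint: a vLLM/Outlines server exposing a /generate route that
# accepts a JSON body with "prompt" and "schema" keys.
URL = "http://127.0.0.1:8000/generate"


async def generate(session: aiohttp.ClientSession, prompt: str, schema: dict):
    # POST one prompt/schema pair and return the decoded JSON response.
    async with session.post(URL, json={"prompt": prompt, "schema": schema}) as response:
        return await response.json()


async def generate_batch(pairs):
    # Fire all requests concurrently; the server batches them on its side.
    async with aiohttp.ClientSession() as session:
        tasks = [generate(session, prompt, schema) for prompt, schema in pairs]
        return await asyncio.gather(*tasks)


pairs = [
    ("Describe a character.", {"type": "object", "properties": {"name": {"type": "string"}}}),
    ("Describe a city.", {"type": "object", "properties": {"country": {"type": "string"}}}),
]
results = asyncio.run(generate_batch(pairs))
print(results)
```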

rlouf added the documentation, help wanted, and vLLM labels on Apr 12, 2024
@rbgo404

rbgo404 commented Apr 15, 2024

Two ways of serving models with vLLM:

  1. Online mode: this uses the OpenAI-compatible client and exposes APIs for text generation. You can also make async requests for token streaming. IMO it does dynamic batching, so you cannot explicitly set the batch size.
  2. Offline mode: this uses the llm.generate function, to which you can pass a batch of prompts (see the sketch below).

For your use case you can use vLLM in offline mode without aiohttp.
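
For reference, a minimal offline-mode sketch; the model name and sampling parameters are placeholders, not from this thread:

```python
from vllm import LLM, SamplingParams

# Offline mode: llm.generate takes a list of prompts and batches them
# internally, so no client-side request batching is needed.
llm = LLM(model="mistralai/Mistral-7B-v0.1")
sampling_params = SamplingParams(temperature=0.8, max_tokens=128)

prompts = [
    "Write a one-sentence summary of the French Revolution.",
    "Write a one-sentence summary of the Industrial Revolution.",
]

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```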
