Add a cookbook on how to send batch requests with vLLM #811

Open
rlouf opened this issue Apr 12, 2024 · 1 comment
Labels
documentation (Linked to documentation and examples) · help wanted · vLLM (Things involving vLLM support)

Comments

@rlouf
Member

rlouf commented Apr 12, 2024

Some users may need to send batch requests with several prompt/schema pairs. It is possible to do this with the vLLM server integration using aiohttp, and we should document this.
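
A rough sketch of what such a cookbook could show, assuming the server exposes a /generate endpoint that accepts a JSON body with prompt and schema fields (the endpoint path and payload shape are assumptions here, not confirmed in this thread):

```python
import asyncio

import aiohttp

# Assumed endpoint: a vLLM/Outlines server exposing a /generate route that
# accepts a JSON body with "prompt" and "schema" keys.
URL = "http://127.0.0.1:8000/generate"


async def generate(session: aiohttp.ClientSession, prompt: str, schema: dict):
    # POST one prompt/schema pair and return the decoded JSON response.
    async with session.post(URL, json={"prompt": prompt, "schema": schema}) as response:
        return await response.json()


async def generate_batch(pairs):
    # Fire all requests concurrently; the server batches them on its side.
    async with aiohttp.ClientSession() as session:
        tasks = [generate(session, prompt, schema) for prompt, schema in pairs]
        return await asyncio.gather(*tasks)


pairs = [
    ("Describe a character.", {"type": "object", "properties": {"name": {"type": "string"}}}),
    ("Describe a city.", {"type": "object", "properties": {"country": {"type": "string"}}}),
]
results = asyncio.run(generate_batch(pairs))
print(results)
```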

rlouf added the documentation, help wanted, and vLLM labels on Apr 12, 2024
@rbgo404

rbgo404 commented Apr 15, 2024

Two ways of serving models with vLLM:

  1. Online mode: this uses the OpenAI-compatible client and exposes APIs for text generation. You can also make async requests for token streaming. IMO it does dynamic batching, so you cannot explicitly set the batch size.
  2. Offline mode: this uses the llm.generate function, to which you can pass a batch of prompts (see the sketch below).

For your use case you can use vLLM in offline mode without aiohttp.
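
For reference, a minimal offline-mode sketch; the model name and sampling parameters are placeholders, not from this thread:

```python
from vllm import LLM, SamplingParams

# Offline mode: llm.generate takes a list of prompts and batches them
# internally, so no client-side request batching is needed.
llm = LLM(model="mistralai/Mistral-7B-v0.1")
sampling_params = SamplingParams(temperature=0.8, max_tokens=128)

prompts = [
    "Write a one-sentence summary of the French Revolution.",
    "Write a one-sentence summary of the Industrial Revolution.",
]

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```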
