Some users may need to send batch requests with several prompt/schema pairs. It is possible to do this with the vLLM server integration using aiohttp, and we should document this.
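A minimal sketch of what that could look like, assuming the server exposes a `/generate` route that accepts a JSON body with `prompt` and `schema` fields (the endpoint path and payload shape here are assumptions for illustration, not a documented API):

```python
import asyncio
import json

import aiohttp

# Hypothetical endpoint -- adjust to match the actual vLLM server integration.
URL = "http://localhost:8000/generate"

PAIRS = [
    {
        "prompt": "Describe a cat as JSON.",
        "schema": {"type": "object", "properties": {"name": {"type": "string"}}},
    },
    {
        "prompt": "Describe a dog as JSON.",
        "schema": {"type": "object", "properties": {"breed": {"type": "string"}}},
    },
]


async def generate(session: aiohttp.ClientSession, pair: dict) -> dict:
    # One POST per prompt/schema pair; concurrent requests let the
    # server batch them internally.
    async with session.post(URL, json=pair) as resp:
        resp.raise_for_status()
        return await resp.json()


async def main() -> None:
    async with aiohttp.ClientSession() as session:
        results = await asyncio.gather(*(generate(session, p) for p in PAIRS))
    for result in results:
        print(json.dumps(result, indent=2))


asyncio.run(main())
```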
Online mode: this uses the OpenAI client and exposes the APIs for text generation. You can also make async requests for token streaming. IMO it does dynamic batching, so you cannot explicitly set the batch size. A rough sketch of this path is below.
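For reference, a sketch of the online path against vLLM's OpenAI-compatible server, assuming it is already running at `http://localhost:8000/v1` and serving the placeholder model named below:

```python
import asyncio

from openai import AsyncOpenAI

# Assumes a vLLM OpenAI-compatible server started with e.g.
#   vllm serve mistralai/Mistral-7B-Instruct-v0.2
client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")


async def main() -> None:
    # stream=True yields tokens as the server generates them
    stream = await client.chat.completions.create(
        model="mistralai/Mistral-7B-Instruct-v0.2",  # placeholder model name
        messages=[{"role": "user", "content": "Write a haiku about GPUs."}],
        stream=True,
    )
    async for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
    print()


asyncio.run(main())
```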
Offline mode: this uses the `llm.generate` function, where you can pass a batch of prompts.
For your use case, you can use vLLM in offline mode without aiohttp.
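A minimal sketch of the offline path (the model name is just a placeholder):

```python
from vllm import LLM, SamplingParams

# Offline mode: load the model in-process and pass the whole batch
# of prompts to llm.generate in a single call.
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")  # placeholder model
params = SamplingParams(temperature=0.8, max_tokens=128)

prompts = [
    "Describe a cat as JSON.",
    "Describe a dog as JSON.",
]

# vLLM schedules the batch internally; no explicit batch size is needed.
outputs = llm.generate(prompts, params)
for out in outputs:
    print(out.outputs[0].text)
```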