I searched existing ideas and did not find a similar one
I added a very descriptive title
I've clearly described the feature request and motivation for it
Feature request
In the OpenAI API, the `chat.completions.create` function has a parameter called `n` which controls the number of generations returned for a given prompt. Because the output is non-deterministic, there are many applications in which you'd like to generate and compare multiple responses to the same input. Here is a basic example of how someone might use this parameter with the openai SDK (no LangChain):
```python
from openai import OpenAI

client = OpenAI()
completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Here is my example prompt",
        }
    ],
    model="gpt-4o-mini",  # any chat model
    n=5,
)
```
Note that this returns 5 different responses to the prompt in `completion.choices`.
LangChain doesn't natively support this. The workaround is to use the `batch` method on a ChatModel and repeat the same prompt multiple times. But the OpenAI models don't override the default `batch` implementation in the LangChain Runnable, so separate calls are made to the OpenAI API, which is unnecessary. Here is how one would do this in LangChain:
```python
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI

llm = ChatOpenAI()
messages = [[HumanMessage("Here is my example prompt")]] * 5
completion = llm.batch(messages)  # makes 5 separate API calls under the hood
```
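To make the cost of the workaround concrete, here is a minimal, hypothetical sketch (plain Python, no LangChain or network calls; `CountingClient` is invented for illustration) of why the batch fallback is wasteful: batching N copies of the prompt through repeated single calls issues N requests, while a native `n` parameter issues one.

```python
# Hypothetical sketch: a stand-in client that counts requests, illustrating
# that batching identical prompts makes N round trips while n=5 makes one.
class CountingClient:
    def __init__(self):
        self.requests = 0

    def create(self, prompt, n=1):
        self.requests += 1  # one network round trip per call
        return [f"response {i} to {prompt!r}" for i in range(n)]


client = CountingClient()

# Workaround: batch of identical prompts = repeated single calls
# (what the default Runnable batch implementation amounts to).
batch_results = [client.create("Here is my example prompt")[0] for _ in range(5)]
batch_requests = client.requests  # 5 round trips

client.requests = 0
# Native n: one call returns all 5 generations.
native_results = client.create("Here is my example prompt", n=5)
native_requests = client.requests  # 1 round trip

print(batch_requests, native_requests)  # 5 1
```

The 20% latency overhead reported below comes from exactly this difference: each extra round trip pays connection and request-overhead costs that a single `n=5` call avoids.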
Motivation
This is a common need, which is why the OpenAI API supports it natively. It is used in research, in metric computation (for example, via self-consistency), etc.
I profiled the examples I gave in the request description: doing the batch calls with LangChain increases latency by 20%!
Proposal (If applicable)
Probably the best way to tackle this would be to add an `n` parameter to the `invoke` method for Runnables. By default, the method would check whether n > 1; if not, it would do exactly what it does now. If n > 1, it would call a separate method. In the default Runnable implementation, that method would use the batching workaround described above, while specific models, like the OpenAI ones, would override it to use the `n` feature of the OpenAI API.
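The dispatch described above can be sketched in plain Python. This is only an illustration of the proposal, not actual LangChain code: the class names and the `_invoke_n` hook are invented. The base class falls back to the batching workaround, and a provider-specific subclass overrides the hook to fetch all n generations in a single call.

```python
# Hypothetical sketch of the proposal; class and method names are invented
# for illustration and are not real LangChain APIs.
class Runnable:
    def invoke(self, input, n=1):
        if n == 1:
            return self._call(input)      # unchanged single-response path
        return self._invoke_n(input, n)   # new multi-response hook

    def _invoke_n(self, input, n):
        # Default fallback: the batching workaround (n separate calls).
        return [self._call(input) for _ in range(n)]

    def _call(self, input):
        raise NotImplementedError


class FakeOpenAIChatModel(Runnable):
    """Stands in for an OpenAI chat model; a real override would forward n
    to chat.completions.create instead of looping."""

    def __init__(self):
        self.api_calls = 0

    def _call(self, input):
        self.api_calls += 1
        return f"response to {input!r}"

    def _invoke_n(self, input, n):
        # One API request that asks for n generations, mirroring openai's n=5.
        self.api_calls += 1
        return [f"response {i} to {input!r}" for i in range(n)]


model = FakeOpenAIChatModel()
out = model.invoke("Here is my example prompt", n=5)
print(len(out), model.api_calls)  # 5 1
```

Callers that don't pass `n` are unaffected, which keeps the change backward compatible, and providers without a native multi-generation feature automatically get the batching fallback.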