I searched existing ideas and did not find a similar one
I added a very descriptive title
I've clearly described the feature request and motivation for it
Feature request
In the OpenAI API, the `chat.completions.create` function has a parameter called `n` which controls the number of generations returned for a given prompt. Because the output is non-deterministic, there are many applications in which you'd like to generate and compare multiple responses to the same input. Here is a basic example of how someone might use this parameter with the openai SDK (no LangChain):
```python
from openai import OpenAI

client = OpenAI()
completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Here is my example prompt",
        }
    ],
    model="gpt-4o-mini",  # any chat model
    n=5,
)
```
Note that this returns 5 different responses to the prompt in `completion.choices`.
LangChain doesn't natively support this. The workaround is to use the `batch` method on a ChatModel and repeat the same prompt multiple times. But the OpenAI models don't override the default `batch` implementation in the LangChain Runnable, so separate calls are made to the OpenAI API, which is unnecessary. Here is how one would do this in LangChain:
```python
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI

llm = ChatOpenAI()
messages = [[HumanMessage("Here is my example prompt")]] * 5
completion = llm.batch(messages)  # makes 5 separate API calls under the hood
```
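To make the cost of the workaround concrete, here is a minimal, hypothetical sketch (plain Python, no LangChain or network calls; `CountingClient` is invented for illustration) of why the batch fallback is wasteful: batching N copies of the prompt through repeated single calls issues N requests, while a native `n` parameter issues one.

```python
# Hypothetical sketch: a stand-in client that counts requests, illustrating
# that batching identical prompts makes N round trips while n=5 makes one.
class CountingClient:
    def __init__(self):
        self.requests = 0

    def create(self, prompt, n=1):
        self.requests += 1  # one network round trip per call
        return [f"response {i} to {prompt!r}" for i in range(n)]


client = CountingClient()

# Workaround: batch of identical prompts = repeated single calls
# (what the default Runnable batch implementation amounts to).
batch_results = [client.create("Here is my example prompt")[0] for _ in range(5)]
batch_requests = client.requests  # 5 round trips

client.requests = 0
# Native n: one call returns all 5 generations.
native_results = client.create("Here is my example prompt", n=5)
native_requests = client.requests  # 1 round trip

print(batch_requests, native_requests)  # 5 1
```

The 20% latency overhead reported below comes from exactly this difference: each extra round trip pays connection and request-overhead costs that a single `n=5` call avoids.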
Motivation
This is a common need, which is why the OpenAI API supports it natively. It is used in research, in metric computation (for example, via self-consistency), etc.
I profiled the examples I gave in the request description: doing the batch calls with LangChain increases latency by 20%!
Proposal (If applicable)
Probably the best way to tackle this would be to add an `n` parameter to the `invoke` method for Runnables. By default, the method would check whether n > 1; if not, it would do exactly what it does now. If n > 1, it would call a separate method. In the default Runnable implementation, that method would use the batching workaround described above, while specific models, like the OpenAI ones, would override it to use the `n` feature of the OpenAI API.
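The dispatch described above can be sketched in plain Python. This is only an illustration of the proposal, not actual LangChain code: the class names and the `_invoke_n` hook are invented. The base class falls back to the batching workaround, and a provider-specific subclass overrides the hook to fetch all n generations in a single call.

```python
# Hypothetical sketch of the proposal; class and method names are invented
# for illustration and are not real LangChain APIs.
class Runnable:
    def invoke(self, input, n=1):
        if n == 1:
            return self._call(input)      # unchanged single-response path
        return self._invoke_n(input, n)   # new multi-response hook

    def _invoke_n(self, input, n):
        # Default fallback: the batching workaround (n separate calls).
        return [self._call(input) for _ in range(n)]

    def _call(self, input):
        raise NotImplementedError


class FakeOpenAIChatModel(Runnable):
    """Stands in for an OpenAI chat model; a real override would forward n
    to chat.completions.create instead of looping."""

    def __init__(self):
        self.api_calls = 0

    def _call(self, input):
        self.api_calls += 1
        return f"response to {input!r}"

    def _invoke_n(self, input, n):
        # One API request that asks for n generations, mirroring openai's n=5.
        self.api_calls += 1
        return [f"response {i} to {input!r}" for i in range(n)]


model = FakeOpenAIChatModel()
out = model.invoke("Here is my example prompt", n=5)
print(len(out), model.api_calls)  # 5 1
```

Callers that don't pass `n` are unaffected, which keeps the change backward compatible, and providers without a native multi-generation feature automatically get the batching fallback.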