openai[patch]: ChatOpenAI.batch function #5016
base: main
Conversation
Thanks for looking into this! I think we would need to add this to the `ChatOpenAI.batch` function. Also - is it necessary? You are calling `n` as an option at invocation time.
@jacoblee93 I updated the PR to add `n` as an option at invocation time.
```ts
if (promptValueStrings.every((p) => p === promptValueStrings[0])) {
  const result = await this.generatePrompt(
    [promptValues[0]],
    { ...options, n: inputs.length } as CallOptions,
```
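The grouping idea in the snippet above can be sketched in isolation. Assuming inputs have already been rendered to prompt strings, a hypothetical helper (illustrative names, not the PR's actual code) could collapse identical prompts into a single call with `n` set to the batch size:

```typescript
// Sketch of the dedup-and-fan-out idea behind the diff above.
// collapseIdenticalPrompts is a hypothetical helper, not LangChain API.
interface BatchPlan {
  prompt: string; // the single prompt to send
  n: number;      // how many completions to request for it
}

function collapseIdenticalPrompts(prompts: string[]): BatchPlan[] {
  // Count occurrences while preserving first-seen order.
  const counts = new Map<string, number>();
  for (const p of prompts) {
    counts.set(p, (counts.get(p) ?? 0) + 1);
  }
  return [...counts.entries()].map(([prompt, n]) => ({ prompt, n }));
}
```

When every prompt in the batch is identical, this collapses to a single plan with `n` equal to the batch size, which mirrors the `{ ...options, n: inputs.length }` call in the diff.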
We should probably upper bound this - I can handle it!
Does this have the same output as just sending `n` requests? Or will it pick the top `n` candidates?
Hey so chatted with the Python folks - this would change the tracing behavior for folks, and they have some concerns about overall behavior changing since it's a black box on OpenAI's end.
Could we table it for now? Sorry for the thrash - you can always wrap a `.generate()` call in a lambda.
> Does this have the same output as just sending `n` requests? Or will it pick the top `n` candidates?
Yes, this makes OpenAI create `n` independent results for the same prompt. `best_of` would return top candidates based on log probs: https://platform.openai.com/docs/api-reference/chat/create
> Hey so chatted with the Python folks - this would change the tracing behavior for folks, and they have some concerns about overall behavior changing since it's a black box on OpenAI's end. Could we table it for now? Sorry for the thrash - you can always wrap a `.generate()` call in a lambda.
Ok. FWIW here are my 2c:

- I don't really get the point about "concerns about overall behavior". The samples are generated independently, with the benefit of only paying for input tokens once.
- Pricing-wise the difference is huge, especially for use cases with lots of input and limited output. For us, we have lots of input tokens and not so many output tokens (relatively speaking), so not using `n` would not be great - IMO the tracing behavior is changed for the better, at least in terms of how this is visualized in LS.
- The goal with adding this to ChatOpenAI.batch (rather than hackily accomplishing the same thing with generate) is to avoid having lots of different logic for how to do requests depending on which model provider is used. Basically I've abstracted out the model in my runnables so that they're given `model: BaseChatModel`, which lets me easily configure which model to use from one place.

If this still isn't a change that makes sense on your end, I'll just apply a patch locally for now.
OpenAI supporting `n` completions is a very high value feature, because input tokens are priced only once. If you make `n` separate requests you eat the input token costs `n` times. This is an amazing aspect of the OpenAI pricing model, which many other providers (for example Anthropic) don't support. I believe making it easy for users to benefit from this, even if they don't know about it, is a great value add LangChain can provide.
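The pricing argument can be made concrete with a small calculation (the per-token prices below are placeholders, not current OpenAI rates):

```typescript
// Placeholder per-token prices for illustration; real rates vary by model.
const INPUT_PRICE = 0.01 / 1000;  // $ per input token (assumed)
const OUTPUT_PRICE = 0.03 / 1000; // $ per output token (assumed)

// Cost of n separate requests: input tokens are billed n times.
function costSeparate(inputTokens: number, outputTokens: number, n: number): number {
  return n * (inputTokens * INPUT_PRICE + outputTokens * OUTPUT_PRICE);
}

// Cost of one request with the `n` option: input tokens billed once.
function costWithN(inputTokens: number, outputTokens: number, n: number): number {
  return inputTokens * INPUT_PRICE + n * outputTokens * OUTPUT_PRICE;
}
```

With a large prompt and short outputs (say 10,000 input tokens, 100 output tokens, n = 5), the separate-requests cost is dominated by the repeated input tokens, which is exactly the "lots of input, limited output" case described above.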
OpenAI supports the `best_of` option, which has interplay with `n`:

> Generates best_of completions server-side and returns the "best" (the one with the highest log probability per token). Results cannot be streamed.
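That selection rule can also be emulated client-side over `n` returned candidates: pick the one with the highest mean token log probability. This is a sketch with hypothetical types; the server-side `best_of` behavior may differ in details:

```typescript
// One sampled completion plus its per-token log probabilities
// (shape assumed for illustration).
interface Candidate {
  text: string;
  tokenLogprobs: number[];
}

// Pick the candidate with the highest average per-token logprob,
// approximating what best_of does server-side.
function pickBest(candidates: Candidate[]): Candidate {
  let best = candidates[0];
  let bestScore = -Infinity;
  for (const c of candidates) {
    const score =
      c.tokenLogprobs.reduce((a, b) => a + b, 0) / c.tokenLogprobs.length;
    if (score > bestScore) {
      best = c;
      bestScore = score;
    }
  }
  return best;
}
```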
Users can also do this themselves now that chat completions return `logprobs`. It's a common pattern in my workflows to increase temperature for higher generation variance and then use the `logprobs`, or simply do self-consistency voting (https://arxiv.org/abs/2203.11171). The OpenAI pricing model has great synergy with these techniques, since you only pay extra for the generations.
I would almost argue that this feature of the API enables quality-improving techniques where they would otherwise be cost prohibitive, and I think leaning in and making these as easy to use as possible is of immense value.
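The self-consistency pattern mentioned above reduces to a majority vote over the final answers extracted from `n` sampled completions. A toy sketch (answer extraction is assumed to happen upstream; this is not the paper's full method):

```typescript
// Majority vote over final answers parsed from n completions.
// Ties break toward the first-seen answer (Map preserves insertion order).
function selfConsistencyVote(answers: string[]): string {
  const counts = new Map<string, number>();
  for (const a of answers) counts.set(a, (counts.get(a) ?? 0) + 1);
  let best = answers[0];
  let bestCount = 0;
  for (const [answer, count] of counts) {
    if (count > bestCount) {
      best = answer;
      bestCount = count;
    }
  }
  return best;
}
```

Since all `n` samples share one prompt, the extra votes cost only output tokens under OpenAI's pricing.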
I'll figure it out on our end and get this merged. Thanks for weighing in!
This PR groups together API calls for prompts that are the same, so that `n` runs are combined into a single request.