openai[patch]: ChatOpenAI.batch function #5016
base: main
Conversation
Thanks for looking into this! I think we would need to add this to the `ChatOpenAI.batch` function. Also - is it necessary? You are calling `n` as an option at invocation time.
@jacoblee93 I updated the PR to add `n` as an option at invocation time.
```ts
if (promptValueStrings.every((p) => p === promptValueStrings[0])) {
  const result = await this.generatePrompt(
    [promptValues[0]],
    { ...options, n: inputs.length } as CallOptions,
```
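The grouping idea in the snippet above can be sketched in isolation. Assuming inputs have already been rendered to prompt strings, a hypothetical helper (illustrative names, not the PR's actual code) could collapse identical prompts into a single call with `n` set to the batch size:

```typescript
// Sketch of the dedup-and-fan-out idea behind the diff above.
// collapseIdenticalPrompts is a hypothetical helper, not LangChain API.
interface BatchPlan {
  prompt: string; // the single prompt to send
  n: number;      // how many completions to request for it
}

function collapseIdenticalPrompts(prompts: string[]): BatchPlan[] {
  // Count occurrences while preserving first-seen order.
  const counts = new Map<string, number>();
  for (const p of prompts) {
    counts.set(p, (counts.get(p) ?? 0) + 1);
  }
  return [...counts.entries()].map(([prompt, n]) => ({ prompt, n }));
}
```

When every prompt in the batch is identical, this collapses to a single plan with `n` equal to the batch size, which mirrors the `{ ...options, n: inputs.length }` call in the diff.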
We should probably upper bound this - I can handle it!
Does this have the same output as just sending `n` requests? Or will it pick the top `n` candidates?
Hey so chatted with the Python folks - this would change the tracing behavior for folks, and they have some concerns about overall behavior changing since it's a black box on OpenAI's end.
Could we table it for now? Sorry for the thrash - you can always wrap a `.generate()` call in a lambda.
> Does this have the same output as just sending `n` requests? Or will it pick the top `n` candidates?
Yes, this makes OpenAI create `n` independent results for the same prompt. `best_of` would return top candidates based on log probs: https://platform.openai.com/docs/api-reference/chat/create
> Hey so chatted with the Python folks - this would change the tracing behavior for folks, and they have some concerns about overall behavior changing since it's a black box on OpenAI's end. Could we table it for now? Sorry for the thrash - you can always wrap a `.generate()` call in a lambda.
Ok. FWIW here are my 2c:

- I don't really get the point about "concerns about overall behavior". The samples are generated independently, with the benefit of only paying for input tokens once.
- Pricing-wise the difference is huge, especially for use cases with lots of input and limited output. For us, we have lots of input tokens and not so many output tokens (relatively speaking), so not using `n` would not be great - IMO the tracing behavior is changed for the better, at least in terms of how this is visualized in LS.
- The goal with adding this to ChatOpenAI.batch (rather than hackily accomplishing the same thing with generate) is to avoid having lots of different logic for how to do requests depending on which model provider is used. Basically I've abstracted out the model in my runnables so that they're given `model: BaseChatModel`, which lets me easily configure which model to use from one place.

If this still isn't a change that makes sense on your end, I'll just apply a patch locally for now.
OpenAI supporting `n` completions is a very high value feature, because input tokens are priced only once. If you make `n` separate requests you eat the input token costs `n` times. This is an amazing aspect of the OpenAI pricing model, which many other providers (for example Anthropic) don't support. I believe making it easy for users to benefit from this, even if they don't know about it, is a great value add LangChain can provide.
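The pricing argument can be made concrete with a small calculation (the per-token prices below are placeholders, not current OpenAI rates):

```typescript
// Placeholder per-token prices for illustration; real rates vary by model.
const INPUT_PRICE = 0.01 / 1000;  // $ per input token (assumed)
const OUTPUT_PRICE = 0.03 / 1000; // $ per output token (assumed)

// Cost of n separate requests: input tokens are billed n times.
function costSeparate(inputTokens: number, outputTokens: number, n: number): number {
  return n * (inputTokens * INPUT_PRICE + outputTokens * OUTPUT_PRICE);
}

// Cost of one request with the `n` option: input tokens billed once.
function costWithN(inputTokens: number, outputTokens: number, n: number): number {
  return inputTokens * INPUT_PRICE + n * outputTokens * OUTPUT_PRICE;
}
```

With a large prompt and short outputs (say 10,000 input tokens, 100 output tokens, n = 5), the separate-requests cost is dominated by the repeated input tokens, which is exactly the "lots of input, limited output" case described above.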
OpenAI supports the `best_of` option, which has interplay with `n`:

> Generates best_of completions server-side and returns the "best" (the one with the highest log probability per token). Results cannot be streamed.
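That selection rule can also be emulated client-side over `n` returned candidates: pick the one with the highest mean token log probability. This is a sketch with hypothetical types; the server-side `best_of` behavior may differ in details:

```typescript
// One sampled completion plus its per-token log probabilities
// (shape assumed for illustration).
interface Candidate {
  text: string;
  tokenLogprobs: number[];
}

// Pick the candidate with the highest average per-token logprob,
// approximating what best_of does server-side.
function pickBest(candidates: Candidate[]): Candidate {
  let best = candidates[0];
  let bestScore = -Infinity;
  for (const c of candidates) {
    const score =
      c.tokenLogprobs.reduce((a, b) => a + b, 0) / c.tokenLogprobs.length;
    if (score > bestScore) {
      best = c;
      bestScore = score;
    }
  }
  return best;
}
```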
Users can also do this themselves now that chat completions return `logprobs`. It's a common pattern in my workflows to increase temperature for higher generation variance and then use the `logprobs`, or simply do self-consistency voting (https://arxiv.org/abs/2203.11171). The OpenAI pricing model has great synergy with these techniques, since you only pay extra for the generations.
I would almost argue that this feature of the API enables quality-improving techniques where they would otherwise be cost prohibitive, and I think leaning in and making these as easy to use as possible is of immense value.
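The self-consistency pattern mentioned above reduces to a majority vote over the final answers extracted from `n` sampled completions. A toy sketch (answer extraction is assumed to happen upstream; this is not the paper's full method):

```typescript
// Majority vote over final answers parsed from n completions.
// Ties break toward the first-seen answer (Map preserves insertion order).
function selfConsistencyVote(answers: string[]): string {
  const counts = new Map<string, number>();
  for (const a of answers) counts.set(a, (counts.get(a) ?? 0) + 1);
  let best = answers[0];
  let bestCount = 0;
  for (const [answer, count] of counts) {
    if (count > bestCount) {
      best = answer;
      bestCount = count;
    }
  }
  return best;
}
```

Since all `n` samples share one prompt, the extra votes cost only output tokens under OpenAI's pricing.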
I'll figure it out on our end and get this merged. Thanks for weighing in!
This PR groups together API calls for prompts that are the same, so that `n` runs are combined into a single request.