
openai[patch]: ChatOpenAI.batch function #5016

Open
wants to merge 5 commits into base: main

Conversation

davidfant
Contributor

@davidfant davidfant commented Apr 8, 2024

This PR groups together API calls for prompts that are identical, so that:

  1. Fewer tokens are used (the shared prompt is only sent, and billed, once)
  2. LangSmith shows the batch nicely as one run rather than n runs
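For concreteness, here is a usage sketch of what the change enables; the model name and prompt are illustrative and not taken from the PR:

```typescript
import { ChatOpenAI } from "@langchain/openai";

const model = new ChatOpenAI({ modelName: "gpt-3.5-turbo", temperature: 1 });

// Five identical inputs: with this PR, ChatOpenAI.batch can issue a single
// API request with n: 5 instead of five separate requests, so the shared
// prompt tokens are only sent (and billed) once and LangSmith records one run.
const prompt = "Suggest a name for a coffee shop on the moon.";
const results = await model.batch(Array(5).fill(prompt));

// results contains five independently sampled AI messages.
console.log(results.map((message) => message.content));
```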


vercel bot commented Apr 8, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name                  Status   Updated (UTC)
langchainjs-api-refs  ✅ Ready  Apr 11, 2024 4:55pm
langchainjs-docs      ✅ Ready  Apr 11, 2024 4:55pm

@dosubot dosubot bot added the size:XS (This PR changes 0-9 lines, ignoring generated files) and auto:improvement (Medium size change to existing code to handle new use-cases) labels Apr 8, 2024
@jacoblee93
Collaborator

Thanks for looking into this!

I think we'd need to add this to the OpenAICallOptions type as well. I'm broadly OK with that, but want to CC @baskaryan for standardization.

Also, is it necessary? You're calling .generate() with this, I assume? Would using .batch() fit your use case instead?

@jacoblee93 jacoblee93 changed the title ChatOpenAI: allow providing n as option at invocation time openai[patch]: ChatOpenAI: allow providing n as option at invocation time Apr 9, 2024
@jacoblee93 jacoblee93 added the question (Further information is requested) and close (PRs that need one or two touch-ups to be ready) labels Apr 9, 2024
@dosubot dosubot bot added the size:M (This PR changes 30-99 lines, ignoring generated files) label and removed the size:XS (This PR changes 0-9 lines, ignoring generated files) label Apr 10, 2024
@davidfant davidfant changed the title openai[patch]: ChatOpenAI: allow providing n as option at invocation time openai[patch]: ChatOpenAI.batch function Apr 10, 2024
@davidfant
Contributor Author

@jacoblee93 I updated the PR to add ChatOpenAI.batch instead. Let me know what you think.
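For readers without the diff open, here is roughly where the quoted lines below sit. Everything outside those four lines is my reconstruction, and toPromptValue is a hypothetical coercion helper, so treat this as a sketch of the idea rather than the actual patch:

```typescript
// Sketch: if every input in the batch renders to the same prompt, make one
// generatePrompt call with n = inputs.length and fan the generations back out
// to per-input results; otherwise fall back to the default batch behavior.
async batch(inputs: BaseLanguageModelInput[], options?: CallOptions) {
  const promptValues = inputs.map((input) => toPromptValue(input)); // hypothetical helper
  const promptValueStrings = promptValues.map((p) => p.toString());

  if (promptValueStrings.every((p) => p === promptValueStrings[0])) {
    const result = await this.generatePrompt(
      [promptValues[0]],
      { ...options, n: inputs.length } as CallOptions
    );
    // One generation list with n entries, one entry per original input.
    return result.generations[0].map((generation) => generation.message);
  }
  return super.batch(inputs, options);
}
```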

if (promptValueStrings.every((p) => p === promptValueStrings[0])) {
  const result = await this.generatePrompt(
    [promptValues[0]],
    { ...options, n: inputs.length } as CallOptions,
Collaborator

We should probably upper bound this - I can handle it!

Collaborator

Does this have the same output as just sending n requests? Or will it pick the top n candidates?

Collaborator

Hey, so I chatted with the Python folks - this would change the tracing behavior for users, and they have some concerns about overall behavior changing, since it's a black box on OpenAI's end.

Could we table it for now? Sorry for the thrash - you can always wrap a .generate() call in a lambda.
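A minimal sketch of that workaround, assuming n is set on the model (ChatOpenAI already accepts n as a constructor field); the prompt and model name are illustrative:

```typescript
import { ChatOpenAI } from "@langchain/openai";
import { RunnableLambda } from "@langchain/core/runnables";
import { HumanMessage } from "@langchain/core/messages";

// One traced run that produces n completions for a single prompt by wrapping
// a generate() call in a RunnableLambda.
const model = new ChatOpenAI({ modelName: "gpt-3.5-turbo", temperature: 1, n: 5 });

const nCompletions = RunnableLambda.from(async (prompt: string) => {
  const result = await model.generate([[new HumanMessage(prompt)]]);
  // generations[0] holds the n generations for the single prompt.
  return result.generations[0].map((generation) => generation.text);
});

const candidates = await nCompletions.invoke("Suggest a name for a coffee shop on the moon.");
```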

Contributor Author

@davidfant davidfant commented Apr 12, 2024

> Does this have the same output as just sending n requests? Or will it pick the top n candidates?

Yes, this makes OpenAI create n independent results for the same prompt. best_of would instead return the top candidates based on log probs:
https://platform.openai.com/docs/api-reference/chat/create
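At the raw API level (using the official openai Node SDK directly, outside LangChain, just to illustrate):

```typescript
import OpenAI from "openai";

const openai = new OpenAI();

// One request, four independently sampled choices; the prompt tokens are
// counted (and billed) once in usage.prompt_tokens.
const completion = await openai.chat.completions.create({
  model: "gpt-3.5-turbo",
  messages: [{ role: "user", content: "Suggest a name for a coffee shop on the moon." }],
  n: 4,
});

console.log(completion.choices.map((choice) => choice.message.content));
console.log(completion.usage?.prompt_tokens, completion.usage?.completion_tokens);
```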

> Hey, so I chatted with the Python folks - this would change the tracing behavior for users, and they have some concerns about overall behavior changing, since it's a black box on OpenAI's end.
>
> Could we table it for now? Sorry for the thrash - you can always wrap a .generate() call in a lambda.

Ok. FWIW here are my 2c:

  • I don't really get the "concerns about overall behavior" point. The samples are generated independently; the only difference is paying for the input tokens once.
  • Pricing-wise the difference is huge, especially for use cases with lots of input and limited output. We have many input tokens and relatively few output tokens, so not using n would be a significant cost hit.
  • IMO the tracing behavior changes for the better, at least in terms of how this is visualized in LangSmith.
  • The goal of adding this to ChatOpenAI.batch (rather than hackily accomplishing the same thing with generate) is to avoid having lots of provider-specific logic for how to make requests. I've abstracted the model out of my runnables so that they're given model: BaseChatModel, which lets me configure which model to use from one place (see the sketch below).
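Illustrative sketch of that setup (the function name and shape are mine, not from the PR):

```typescript
import { BaseChatModel } from "@langchain/core/language_models/chat_models";

// The runnable only depends on BaseChatModel, so which provider is used is a
// config decision, and .batch() is the single entry point for "give me k
// samples of this prompt". With this PR, ChatOpenAI collapses the k identical
// prompts into one request with n = k; other providers just make k requests.
async function sampleCandidates(
  model: BaseChatModel,
  prompt: string,
  k: number
): Promise<string[]> {
  const results = await model.batch(Array.from({ length: k }, () => prompt));
  return results.map((message) => String(message.content));
}
```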

If this still isn't a change that makes sense on your end, I'll just apply a patch locally for now.

Contributor

OpenAI supporting n completions is a very high-value feature, because input tokens are priced only once. If you make n separate requests, you pay the input token costs n times. This is a great aspect of the OpenAI pricing model that many other providers (for example Anthropic) don't support. I believe making it easy for users to benefit from this, even if they don't know about it, is a great value-add LangChain can provide.

OpenAI supports the best_of option, which has interplay with n.

> Generates best_of completions server-side and returns the "best" (the one with the highest log probability per token). Results cannot be streamed.

Users can also do this themselves now that chat completions return logprobs. It's a common pattern in my workflows to increase temperature for higher generation variance and then use the logprobs, or simply do self-consistency voting (https://arxiv.org/abs/2203.11171). The OpenAI pricing model has great synergy with these techniques, since you only pay extra for the generated tokens.

I would almost argue that this feature of the API enables quality-improving techniques where they would otherwise be cost-prohibitive, and I think leaning in and making them as easy to use as possible is of immense value (see the sketch below).
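A rough sketch of the self-consistency pattern described above; extractAnswer is a stand-in parser and the model name is illustrative:

```typescript
import { ChatOpenAI } from "@langchain/openai";
import { HumanMessage } from "@langchain/core/messages";

// Sample n answers in one request (input tokens billed once), then take a
// majority vote over the parsed answers.
const model = new ChatOpenAI({ modelName: "gpt-3.5-turbo", temperature: 1, n: 7 });

// Stand-in parser: assume the final line of each completion is the answer.
const extractAnswer = (text: string): string => text.trim().split("\n").pop() ?? text;

async function selfConsistentAnswer(question: string): Promise<string> {
  const result = await model.generate([[new HumanMessage(question)]]);
  const answers = result.generations[0].map((generation) => extractAnswer(generation.text));

  // Majority vote across the n independently sampled answers.
  const counts = new Map<string, number>();
  for (const answer of answers) {
    counts.set(answer, (counts.get(answer) ?? 0) + 1);
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1])[0][0];
}
```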

Collaborator

I'll figure it out on our end and get this merged. Thanks for weighing in!

@jacoblee93 jacoblee93 added the hold (On hold) label Apr 11, 2024