
OpenAI-compatible API #20

Open · apcameron opened this issue Jul 18, 2023 · 22 comments

@apcameron

Is it possible to provide an API that mimics the functionality of the OpenAI API?

@Vectorrent

With LLaMA 2 released, it even expects the whole "system, user, assistant" format now...

https://github.com/facebookresearch/llama/blob/6c7fe276574e78057f917549435a2554000a876d/llama/generation.py#L213

@borzunov
Member

Hi @LuciferianInk,

The format is not obligatory but does improve the quality of the model. We'll try moving to the official format to achieve that.

@Eclipse-Station

I support that. Quite a few tools use the OpenAI library, and even non-OpenAI backends (like Oobabooga's text-generation-webui) offer an OpenAI API extension. This would make transitioning from OpenAI or other compatible third-party backends a breeze.

@borzunov
Member

borzunov commented Aug 7, 2023

@apcameron @Eclipse-Station I agree that this feature would be useful. We'll try to find time to implement it - and pull requests are always welcome!

@krrishdholakia

Hey @borzunov @Eclipse-Station, I'm confused - why do you need to mimic the OpenAI I/O format in a local model?

I don't think I saw this repo using OpenAI at all, so what's the advantage?

@borzunov
Member

borzunov commented Aug 15, 2023

Hi @krrishdholakia,

This repo doesn't use OpenAI API in any sense, but using a similar interface would help with interoperability with existing software.

E.g., one could take an existing chatbot/text generation UI supporting OpenAI API, then replace the API URL to make it work via the Petals swarm without any code changes.
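To illustrate the idea - a minimal sketch, assuming a hypothetical OpenAI-compatible Petals endpoint (the URL below is a placeholder; no such endpoint exists at the time of this thread):

import openai

# The application keeps its existing OpenAI calls; only the base URL changes.
openai.api_base = "https://petals.example.com/v1"  # placeholder, not a real endpoint

response = openai.ChatCompletion.create(
    model="petals-team/StableBeluga2",
    messages=[{"role": "user", "content": "Hello!"}],
)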

@krrishdholakia

krrishdholakia commented Aug 15, 2023

Oh! I think we might be able to help - https://github.com/BerriAI/litellm.

Just created an issue to track this. Hope to get it done today.

def translate_function(model, messages, max_tokens, **kwargs):
    # Join the chat messages into a single prompt string and rename
    # max_tokens to the max_new_tokens field that Petals expects.
    prompt = " ".join(message["content"] for message in messages)
    return {"model": model, "prompt": prompt, "max_new_tokens": max_tokens}

openai.api_base = litellm.translate_api_call(custom_api_base, translate_function)

@ishaan-jaff

@apcameron @borzunov We added the ability to call petals.dev using LiteLLM with OpenAI/ChatGPT-style input/output. Check out this example notebook:
https://github.com/BerriAI/litellm/blob/main/cookbook/liteLLM_Petals.ipynb
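For reference, a minimal sketch of the kind of call the notebook demonstrates, using LiteLLM's standard completion interface (the model name is the one used later in this thread; exact arguments may differ from the notebook):

import litellm

# The "petals/" prefix routes the request to LiteLLM's Petals provider,
# while the call itself uses the OpenAI-style messages format.
response = litellm.completion(
    model="petals/petals-team/StableBeluga2",
    messages=[{"role": "user", "content": "Write a haiku about distributed inference."}],
    max_tokens=100,
)
print(response)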

@borzunov
Member

borzunov commented Aug 16, 2023

@krrishdholakia @ishaan-jaff Thanks for making the integration!

I think @apcameron and @Eclipse-Station want an HTTP API in the OpenAI-compatible format (= one URL) that internally translates API calls to the Petals HTTP/WebSocket API or directly to the Petals swarm. Can litellm help with that?

In any case, we appreciate your work on making Petals available for litellm users!

@apcameron
Author

Yes, we do not want to have to change the code of the applications we are using. The OpenAI API we are looking for needs to be transparent to the caller.

@krrishdholakia

@apcameron are you just using OpenAI + Petals?

If you proxy OpenAI, then you also need to deal with all the other OpenAI requests (e.g., embeddings). But it seems like you just want to map the completion endpoint - correct?

In that case, wouldn't you want to basically remap openai.ChatCompletion.create?
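A rough sketch of that remapping idea - an illustrative monkey-patch built on LiteLLM, not an official API of either project (the model name is just the one used later in this thread):

import openai
import litellm

def _petals_chat_completion(*args, **kwargs):
    # Route openai.ChatCompletion.create calls through LiteLLM instead,
    # forcing a Petals-hosted model regardless of what the app requested.
    kwargs["model"] = "petals/petals-team/StableBeluga2"
    return litellm.completion(**kwargs)

# Existing application code calling openai.ChatCompletion.create(...)
# now transparently talks to Petals.
openai.ChatCompletion.create = _petals_chat_completion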

@apcameron
Author

apcameron commented Aug 16, 2023

Yes, I think the completion endpoint would be a good start, along with the option to select the model.

@borzunov changed the title from "openai API?" to "OpenAI-compatible API" Aug 25, 2023
@jontstaz

Any updates on this so far? It would be great to be able to use Petals as a drop-in replacement for anything using OpenAI's API.

@borzunov
Member

Hi @apcameron @Eclipse-Station @jontstaz,

Can you share a few examples of apps where an OpenAI-compatible API for Petals would be helpful? We hired a part-time dev who may work on this - it would be great to know some apps we can test it with.

@krrishdholakia, it seems like most people requesting this feature can't change the app code (e.g., to remap openai.ChatCompletion.create to LiteLLM). I'll double-check that once people share relevant examples of apps using OpenAI-compatible API endpoints.

@krrishdholakia

krrishdholakia commented Sep 27, 2023

Hey @borzunov @jontstaz @apcameron

we actually put out a solution for this - https://docs.litellm.ai/docs/proxy_server

It's a 1-click local proxy that spins up a local server to map OpenAI completion calls to any LiteLLM-supported API (Petals, Hugging Face TGI, TogetherAI, etc.).

Here's the CLI command:

litellm --model petals/petals-team/StableBeluga2

It'll spin up an OpenAI-compatible proxy server on port 8000.

Just set the OpenAI API base to this and it'll start making Petals calls:

openai.api_base = "http://localhost:8000"
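Expanding on that last line - a small self-contained example of how unchanged OpenAI-client code might use the proxy (this assumes the pre-1.0 openai Python SDK that was current at the time; the API key is an arbitrary placeholder since the local proxy doesn't validate it):

import openai

openai.api_base = "http://localhost:8000"  # the local LiteLLM proxy started above
openai.api_key = "not-needed"              # placeholder; the proxy ignores it

# The application code itself is unchanged - the proxy translates the
# request into a Petals call.
response = openai.ChatCompletion.create(
    model="petals/petals-team/StableBeluga2",
    messages=[{"role": "user", "content": "Hello from Petals!"}],
)
print(response)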

@krrishdholakia

@borzunov let me know if that covers the use case. If not, I'm happy to iterate and land on something that works for the community!

@apcameron
Author

Here are some examples of apps where it would be nice to have an OpenAI-compatible API pointing at Petals instead:
https://github.com/paul-gauthier/aider
https://github.com/OpenBMB/ChatDev
https://github.com/AntonOsika/gpt-engineer
https://github.com/microsoft/autogen

@apcameron
Author

(Quoting @krrishdholakia's LiteLLM proxy instructions above.)

Thanks, I will try this out in the next few days when I get some time.

@krrishdholakia

Added a tutorial for using the 1-click deploy with aider - https://docs.litellm.ai/docs/proxy_server#tutorial---using-with-aider

You can do this with Petals as well by running this instead of the Hugging Face command:

litellm --model petals/petals-team/StableBeluga2

@jontstaz

(Quoting @krrishdholakia's LiteLLM proxy instructions above.)

Perfect! This is exactly what I was looking for. Thanks.

Also, FYI, there's a new model that apparently performs better than CodeLlama and all other previous code-focused models: https://huggingface.co/TheBloke/Phind-CodeLlama-34B-v2-GPTQ

Would be cool to see it on Petals. I'm planning on getting a couple of 4090s relatively soon and then would be able to contribute to Petals with some code-focused models.

@softmix

softmix commented Mar 29, 2024

#50 and #51 would solve this in a proper way
