
Enable streaming option in the OpenAI API server #480

Open · wants to merge 3 commits into main
Conversation

@adk9 adk9 commented May 16, 2024

Now that token streaming support has merged (#397), we can enable streaming response in the OpenAI RESTful API endpoint.

This PR adds a `stream=True` option to the chat completions endpoint of the OpenAI-compatible server.

Running the Server

python -m mii.entrypoints.openai_api_server \
    --model "mistralai/Mistral-7B-Instruct-v0.1" \
    --port 3000 \
    --host 0.0.0.0

Client

from openai import OpenAI

client = OpenAI(
    base_url="http://ip:port/v1",
    api_key="test",
)

completion = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.1",  # must match the model the server was launched with
    messages=[
        {
            "role": "user",
            "content": "Tell me a joke.",
        },
    ],
    max_tokens=1024,
    stream=True
)

for chunk in completion:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
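With `stream=True`, the response arrives as OpenAI-style server-sent events: each chunk is a `data: {json}` line, and the stream ends with the sentinel `data: [DONE]`. A minimal sketch of that framing in plain Python (the sample lines below are hand-written for illustration, not captured server output):

```python
import json

def iter_content_deltas(sse_lines):
    """Yield text deltas from OpenAI-style SSE lines.

    Each streamed chunk is a line of the form `data: {json}`;
    the stream terminates with `data: [DONE]`.
    """
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators / keep-alives
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if delta.get("content") is not None:
            yield delta["content"]

# Hand-written sample chunks mimicking the wire format:
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "Why did"}}]}',
    'data: {"choices": [{"delta": {"content": " the GPU cross the road?"}}]}',
    "data: [DONE]",
]
print("".join(iter_content_deltas(sample)))
```

The first chunk carries only the `role` field, so the parser skips it, mirroring the `delta.content is not None` check in the client loop above.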

requirements.txt (pydantic pinned to 1.x):

 Flask-RESTful
 grpcio
 grpcio-tools
 Pillow
-pydantic
+pydantic==1.*
Contributor:

FYI @adk9 - we're trying to get the Pydantic update merged, hopefully very soon. So I was curious on what issues you ran into that you needed to pin this?

Author @adk9:

> FYI @adk9 - we're trying to get the Pydantic update merged, hopefully very soon.

Oh great, thanks for the pointer! I was just starting to look into this myself.

> So I was curious on what issues you ran into that you needed to pin this?

I see a 400 Bad Request error.

Error code: 400 - {'object': 'error', 'message': "[{'type': 'missing', 'loc': ('query', 'request'), 'msg': 'Field required', 'input': None}]", 'code': 40001}

I think the issue is that FastAPI does not fully support `pydantic.v1` models while pydantic v2 is installed (tiangolo/fastapi#10360); the request model is then misread as a required query parameter, which produces the 400 above. The workaround is to pin either fastapi or pydantic to an older version, and I chose the latter.
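As a hedged sketch (the helper below is mine, not part of this PR), one way to surface the mismatch early is to check the installed pydantic major version at server startup:

```python
from importlib import metadata

def pydantic_major():
    """Return the installed pydantic major version, or None if not installed."""
    try:
        return int(metadata.version("pydantic").split(".")[0])
    except metadata.PackageNotFoundError:
        return None

# With the pin from this PR (pydantic==1.*) this returns 1; a value of 2
# means the fastapi#10360 interaction described above can still bite.
print(pydantic_major())
```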

Successfully merging this pull request may close these issues.

Is openai compatible server still working?