
Add precaution against running v1 endpoints on OpenAI models #3694

Merged
3 commits merged into kserve:master on May 27, 2024

Conversation

@grandbora
Contributor

grandbora commented May 16, 2024

What this PR does

Adds error handling for V1 endpoint requests made against OpenAI models.

KServe allows users to send V1 endpoint requests to OpenAI models, which leads to a crash on the server side. We should fail gracefully on these requests.

Before this PR, KServe returned a 500 error to the user; after this PR, it returns a 400 error.
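
Conceptually, the fix amounts to a guard in the V1 request path. A minimal sketch of the idea (the names `infer_v1` and `InvalidInput` here are illustrative placeholders, not the exact KServe internals):

```python
class OpenAIModel:
    """Stand-in for a generative (OpenAI-protocol) model."""


class InvalidInput(Exception):
    """Illustrative error type that the server would map to HTTP 400."""


def infer_v1(model, model_name: str, payload: dict) -> dict:
    # Reject v1 predict requests for models that only implement the OpenAI
    # protocol, instead of letting the call fail later with a TypeError (500).
    if isinstance(model, OpenAIModel):
        raise InvalidInput(
            f"Model {model_name} is of type OpenAIModel. "
            "It does not support infer method."
        )
    # Predictive models handle the payload directly.
    return model(payload)
```

The curl sessions below show the observable effect: the same request that previously produced a 500 now gets a 400 with a descriptive error body.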

Logs

500 error before the PR:

curl -v \
    -H "Content-Type: application/json" \
    $ENDPOINT \
    -d \
    '{...}'
*   Trying 0.0.0.0:8080...
* Connected to 0.0.0.0 (127.0.0.1) port 8080
> POST /v1/models/local-test-completion:predict HTTP/1.1
> Host: 0.0.0.0:8080
> User-Agent: curl/8.4.0
> Accept: */*
> Content-Type: application/json
> Content-Length: 329
> 
< HTTP/1.1 500 Internal Server Error
< date: Thu, 16 May 2024 19:37:54 GMT
< server: uvicorn
< content-length: 71
< content-type: application/json
< 
* Connection #0 to host 0.0.0.0 left intact
{"error":"TypeError : 'CompletionsTransformer' object is not callable"}%   

400 error after the PR:

curl -v \
    -H "Content-Type: application/json" \
    $ENDPOINT \
    -d \
    '{...}'
*   Trying 0.0.0.0:8080...
* Connected to 0.0.0.0 (127.0.0.1) port 8080
> POST /v1/models/local-test-completion:predict HTTP/1.1
> Host: 0.0.0.0:8080
> User-Agent: curl/8.4.0
> Accept: */*
> Content-Type: application/json
> Content-Length: 329
> 
< HTTP/1.1 400 Bad Request
< date: Thu, 16 May 2024 19:35:56 GMT
< server: uvicorn
< content-length: 97
< content-type: application/json
< 
* Connection #0 to host 0.0.0.0 left intact
{"error":"Model local-test-completion is of type OpenAIModel. It does not support infer method."}%                                                                                                                                                                                         

Future Work

PS1. Ideally the server should return a 404.
PS2. The reverse case is still not handled gracefully: sending an OpenAI request to a non-OpenAI model results in a 500.
Happy to address both of the above in follow-up PRs.

Type of changes

  • Bug fix (non-breaking change which fixes an issue)

Re-running failed tests

  • /rerun-all - rerun all failed workflows.
  • /rerun-workflow <workflow name> - rerun a specific failed workflow. Only one workflow name can be specified. Multiple /rerun-workflow commands are allowed per comment.

@grandbora
Contributor Author

cc @cmaddalozzo @yuzisun

@yuzisun
Member

yuzisun commented May 19, 2024

We could choose not to register v1 endpoints for the model server, but there are edge cases where people create both a KServeModel and an OpenAIModel and need both the predictive and generative endpoints, so changing to 400 is ok. Let's also fix the other case, calling a KServeModel via the OpenAI endpoints, to be consistent.

@grandbora
Contributor Author

I will carve out some time to handle the opposite case. In the spirit of small incremental improvements, I'd suggest we merge this unless there is any other feedback.

@grandbora
Contributor Author

grandbora commented May 20, 2024

Hi @yuzisun, I removed the check from explain, though I have these reservations:

  • If we want to allow a user to use the /v1/models/{model_name}:explain path on an OpenAI model, they will need to implement the explain logic under the __call__ method. The OpenAI model doesn't come with any of this.

  • By leaving this code path unchecked, any user can crash the server with an HTTP request. This is not good practice and can lead to other issues.

If we want to support this, I think the model should come with a stub that returns a "not implemented" error by default (a rough sketch follows below).
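
A minimal sketch of what such a default stub could look like (illustrative only; the signature is simplified and not the exact KServe interface):

```python
class OpenAIModel:
    """Stand-in for a generative (OpenAI-protocol) model."""

    async def explain(self, payload: dict) -> dict:
        # Default stub: OpenAI-protocol models ship no explain logic, so fail
        # with an explicit "not implemented" error instead of crashing the server.
        raise NotImplementedError(
            f"{type(self).__name__} does not support the explain method."
        )
```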

@grandbora
Contributor Author

/rerun-all

@yuzisun
Member

yuzisun commented May 21, 2024

> Hi @yuzisun, I removed the check from explain, though I have these reservations:
>
>   • If we want to allow a user to use the /v1/models/{model_name}:explain path on an OpenAI model, they will need to implement the explain logic under the __call__ method. The OpenAI model doesn't come with any of this.
>   • By leaving this code path unchecked, any user can crash the server with an HTTP request. This is not good practice and can lead to other issues.
>
> If we want to support this, I think the model should come with a stub that returns a "not implemented" error by default.

OK, let's log an issue and address it in a separate PR.

@grandbora
Contributor Author

Added a warning log to the explain method (rough sketch below). @yuzisun, this is back to you.
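
For readers following along, the shape of the change is roughly this (an illustrative sketch, not the exact diff; `explain_v1` and the log message are placeholders):

```python
import logging

logger = logging.getLogger("kserve")


class OpenAIModel:
    """Stand-in for a generative (OpenAI-protocol) model."""

    def __init__(self, name: str):
        self.name = name


def explain_v1(model, payload: dict):
    # Warn when a v1 :explain request is routed to an OpenAI-protocol model so
    # the mismatch is visible in the server logs; the call still proceeds and
    # will only succeed if the model implements __call__.
    if isinstance(model, OpenAIModel):
        logger.warning(
            "Model %s is of type OpenAIModel and does not provide explain logic.",
            model.name,
        )
    return model(payload)
```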

@grandbora
Contributor Author

Bumping this. @yuzisun, whenever you get a chance, please take a look.

@yuzisun
Member

yuzisun commented May 27, 2024

/lgtm
/approve


oss-prow-bot commented May 27, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: grandbora, yuzisun

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@yuzisun yuzisun merged commit 04c41c2 into kserve:master May 27, 2024
57 of 58 checks passed