
Add precaution against running v1 endpoints on OpenAI models #3694

Merged
3 commits merged into kserve:master on May 27, 2024

Conversation

@grandbora
Contributor

grandbora commented May 16, 2024

What this PR does

Adds error handling for V1 endpoint requests made against OpenAI models.

KServe allows users to send V1 endpoint requests to OpenAI models, which leads to a crash on the server side. We should fail gracefully on these requests.

Before this PR, KServe returned a 500 error to the user; after this PR, it returns a 400 error.
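
Conceptually, the fix amounts to a guard in the V1 request path. A minimal sketch of the idea (the names `infer_v1` and `InvalidInput` here are illustrative placeholders, not the exact KServe internals):

```python
class OpenAIModel:
    """Stand-in for a generative (OpenAI-protocol) model."""


class InvalidInput(Exception):
    """Illustrative error type that the server would map to HTTP 400."""


def infer_v1(model, model_name: str, payload: dict) -> dict:
    # Reject v1 predict requests for models that only implement the OpenAI
    # protocol, instead of letting the call fail later with a TypeError (500).
    if isinstance(model, OpenAIModel):
        raise InvalidInput(
            f"Model {model_name} is of type OpenAIModel. "
            "It does not support infer method."
        )
    # Predictive models handle the payload directly.
    return model(payload)
```

The curl sessions below show the observable effect: the same request that previously produced a 500 now gets a 400 with a descriptive error body.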

Logs

500 error before the PR:

curl -v \
    -H "Content-Type: application/json" \
    $ENDPOINT \
    -d \
    '{...}'
*   Trying 0.0.0.0:8080...
* Connected to 0.0.0.0 (127.0.0.1) port 8080
> POST /v1/models/local-test-completion:predict HTTP/1.1
> Host: 0.0.0.0:8080
> User-Agent: curl/8.4.0
> Accept: */*
> Content-Type: application/json
> Content-Length: 329
> 
< HTTP/1.1 500 Internal Server Error
< date: Thu, 16 May 2024 19:37:54 GMT
< server: uvicorn
< content-length: 71
< content-type: application/json
< 
* Connection #0 to host 0.0.0.0 left intact
{"error":"TypeError : 'CompletionsTransformer' object is not callable"}%   

400 error after the PR:

curl -v \
    -H "Content-Type: application/json" \
    $ENDPOINT \
    -d \
    '{...}'
*   Trying 0.0.0.0:8080...
* Connected to 0.0.0.0 (127.0.0.1) port 8080
> POST /v1/models/local-test-completion:predict HTTP/1.1
> Host: 0.0.0.0:8080
> User-Agent: curl/8.4.0
> Accept: */*
> Content-Type: application/json
> Content-Length: 329
> 
< HTTP/1.1 400 Bad Request
< date: Thu, 16 May 2024 19:35:56 GMT
< server: uvicorn
< content-length: 97
< content-type: application/json
< 
* Connection #0 to host 0.0.0.0 left intact
{"error":"Model local-test-completion is of type OpenAIModel. It does not support infer method."}%                                                                                                                                                                                         

Future Work

PS1. Ideally the server should return a 404.
PS2. The reverse case is still not handled gracefully: sending an OpenAI request to a non-OpenAI model results in a 500.
Happy to address both of the above in follow-up PRs.

Type of changes

  • Bug fix (non-breaking change which fixes an issue)

Re-running failed tests

  • /rerun-all - rerun all failed workflows.
  • /rerun-workflow <workflow name> - rerun a specific failed workflow. Only one workflow name can be specified. Multiple /rerun-workflow commands are allowed per comment.

@grandbora
Contributor Author

cc @cmaddalozzo @yuzisun

@yuzisun
Member

yuzisun commented May 19, 2024

We could choose not to register v1 endpoints for the model server, but there are edge cases where people create both a KServeModel and an OpenAIModel and need both the predictive and generative endpoints, so changing to 400 is ok. Let's also fix the other case, calling a KServeModel via the OpenAI endpoints, to be consistent.

@grandbora
Contributor Author

I will carve out some time to handle the opposite case. In the spirit of small incremental improvements, I'd suggest we merge this unless there is any other feedback.

@grandbora
Contributor Author

grandbora commented May 20, 2024

Hi @yuzisun, I removed the check from explain, though I have these reservations:

  • If we want to allow a user to use the /v1/models/{model_name}:explain path on an OpenAI model, they will need to implement the explain logic under the __call__ method. The OpenAI model doesn't come with any of this.

  • By leaving this code path unchecked, any user can crash the server with an HTTP request. This is not good practice and can lead to other issues.

If we want to support this, I think the model should come with a stub that returns a "not implemented" error by default (a rough sketch follows below).
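
A minimal sketch of what such a default stub could look like (illustrative only; the signature is simplified and not the exact KServe interface):

```python
class OpenAIModel:
    """Stand-in for a generative (OpenAI-protocol) model."""

    async def explain(self, payload: dict) -> dict:
        # Default stub: OpenAI-protocol models ship no explain logic, so fail
        # with an explicit "not implemented" error instead of crashing the server.
        raise NotImplementedError(
            f"{type(self).__name__} does not support the explain method."
        )
```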

@grandbora
Contributor Author

/rerun-all

@yuzisun
Member

yuzisun commented May 21, 2024

> Hi @yuzisun, I removed the check from explain, though I have these reservations:
>
>   • If we want to allow a user to use the /v1/models/{model_name}:explain path on an OpenAI model, they will need to implement the explain logic under the __call__ method. The OpenAI model doesn't come with any of this.
>   • By leaving this code path unchecked, any user can crash the server with an HTTP request. This is not good practice and can lead to other issues.
>
> If we want to support this, I think the model should come with a stub that returns a "not implemented" error by default.

OK, let's log an issue and address it in a separate PR.

@grandbora
Contributor Author

Added a warning log to the explain method (rough sketch below). @yuzisun, this is back to you.
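
For readers following along, the shape of the change is roughly this (an illustrative sketch, not the exact diff; `explain_v1` and the log message are placeholders):

```python
import logging

logger = logging.getLogger("kserve")


class OpenAIModel:
    """Stand-in for a generative (OpenAI-protocol) model."""

    def __init__(self, name: str):
        self.name = name


def explain_v1(model, payload: dict):
    # Warn when a v1 :explain request is routed to an OpenAI-protocol model so
    # the mismatch is visible in the server logs; the call still proceeds and
    # will only succeed if the model implements __call__.
    if isinstance(model, OpenAIModel):
        logger.warning(
            "Model %s is of type OpenAIModel and does not provide explain logic.",
            model.name,
        )
    return model(payload)
```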

@grandbora
Contributor Author

Bumping this. @yuzisun, whenever you get a chance, please take a look.

@yuzisun
Member

yuzisun commented May 27, 2024

/lgtm
/approve


oss-prow-bot commented May 27, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: grandbora, yuzisun

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@yuzisun yuzisun merged commit 04c41c2 into kserve:master May 27, 2024
57 of 58 checks passed