Add precaution against running v1 endpoints on openai models #3694
Conversation
We can choose to not register v1 endpoints for the model server, but there could be edge cases that people create both
I will carve out some time to handle the opposite case. In the spirit of small incremental improvements, I'd suggest we merge this unless there is any other feedback.
Signed-off-by: grandbora <[email protected]>
Signed-off-by: grandbora <[email protected]>
Hi @yuzisun, I removed the check from explain. Though I have these reservations:
If we want to support this, I think the model should come with a stub that returns a
/rerun-all
ok, let's log an issue and address it in a separate PR
Signed-off-by: grandbora <[email protected]>
Added a warning log to
Bumping this. @yuzisun whenever you get a chance, please take a look.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: grandbora, yuzisun The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
What this PR does
Adds error handling for executing V1 endpoints on OpenAI models.
KServe allows users to send V1 endpoint requests to OpenAI models. This leads to a crash on the server side. We should fail gracefully on these requests.
Before the PR, KServe returned a 500 error to the user. After the PR, KServe returns a 400 error.

Logs
500 err before the PR:
400 err after the PR:
Future Work
PS1. Ideally the server should return a 404.
PS2. The reverse case is still not handled gracefully: sending an OpenAI request to a non-OpenAI model produces a 500.
Happy to address both of the above in follow-up PRs.
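The precaution described above can be sketched in a few lines of Python. This is a hypothetical illustration, not KServe's actual code: the `OpenAIModel`, `InvalidInput`, and `v1_predict` names are assumptions standing in for the real server-side types, and the idea is simply to detect the protocol mismatch up front and map it to a 400 instead of letting an unhandled exception surface as a 500.

```python
class OpenAIModel:
    """Stand-in for a model that only implements OpenAI-style endpoints."""


class InvalidInput(Exception):
    """Maps to an HTTP 400 response in this sketch."""


def v1_predict(model, payload):
    # Precaution: V1 inference is undefined for OpenAI-only models,
    # so reject the request with a client error instead of crashing.
    if isinstance(model, OpenAIModel):
        raise InvalidInput(
            f"Model {type(model).__name__} does not support the v1 protocol"
        )
    # Trivial stand-in for real V1 inference.
    return {"predictions": payload["instances"]}


try:
    v1_predict(OpenAIModel(), {"instances": [1, 2, 3]})
except InvalidInput as err:
    print(f"400: {err}")
    # prints: 400: Model OpenAIModel does not support the v1 protocol
```

The same shape of check could guard the explain endpoint, and the reverse guard (OpenAI requests against non-OpenAI models, PS2 above) would be symmetric.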
Type of changes
Re-running failed tests
/rerun-all - rerun all failed workflows.
/rerun-workflow <workflow name> - rerun a specific failed workflow. Only one workflow name can be specified. Multiple /rerun-workflow commands are allowed per comment.