Is onnxruntime-genai supported? #7182
Hey all, I have a quick question: is onnxruntime-genai (https://onnxruntime.ai/docs/genai/api/python.html) supported in Triton Inference Server's ONNX Runtime backend? I couldn't find any relevant sources in the documentation. Thanks!

Comments

@jackylu0124 Support for onnxruntime-genai is currently a work in progress: the Python bindings should work within the python backend, but we haven't had a chance to test that ourselves yet. That said, we are actively investigating support. Can you share more about your use case and the timeline you need support by?

Hi @nnshah1, thank you very much for your fast reply! By "the python bindings should work within the python backend", do you mean that I can do things like …? My use case is mainly serving LLM models, some of which are ONNX models that depend on …. Also, a follow-up question about serving LLMs: what would be the best backend for serving with token streaming, outside of the TensorRT-LLM backend? Thanks!
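The suggestion above — running onnxruntime-genai's Python bindings inside Triton's python backend — might look something like the following untested sketch. The tensor names (`PROMPT`/`TEXT`), the model path under the repository, and the `max_length` value are placeholder assumptions, and the genai calls (`og.Model`, `og.Tokenizer`, `og.GeneratorParams`, `og.Generator`) follow the onnxruntime-genai Python docs and may have changed across releases.

```python
# model.py -- hypothetical Triton python-backend wrapper around
# onnxruntime-genai. Untested sketch, per the discussion above.

class TritonPythonModel:
    def initialize(self, args):
        # Imports are deferred so the file can be inspected without the
        # runtime libraries installed; in a deployed model.py they would
        # normally sit at module level.
        import onnxruntime_genai as og
        self.og = og
        # Placeholder path: point at a genai-exported model directory.
        self.model = og.Model(args["model_repository"] + "/1/genai_model")
        self.tokenizer = og.Tokenizer(self.model)

    def execute(self, requests):
        import numpy as np
        import triton_python_backend_utils as pb_utils

        responses = []
        for request in requests:
            prompt = pb_utils.get_input_tensor_by_name(
                request, "PROMPT").as_numpy()[0].decode("utf-8")

            # Greedy generation loop following the onnxruntime-genai
            # Python examples of the time.
            params = self.og.GeneratorParams(self.model)
            params.set_search_options(max_length=256)
            params.input_ids = self.tokenizer.encode(prompt)
            generator = self.og.Generator(self.model, params)
            tokens = []
            while not generator.is_done():
                generator.compute_logits()
                generator.generate_next_token()
                tokens.append(generator.get_next_tokens()[0])
            text = self.tokenizer.decode(tokens)

            out = pb_utils.Tensor(
                "TEXT", np.array([text.encode("utf-8")], dtype=np.object_))
            responses.append(
                pb_utils.InferenceResponse(output_tensors=[out]))
        return responses
```

On the streaming follow-up: outside of TensorRT-LLM, the usual route is the python backend's decoupled mode (`model_transaction_policy { decoupled: true }` in the model config), where `execute` sends one `InferenceResponse` per token through `request.get_response_sender()` instead of returning a single batched response.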