-
Can you explain what decoupled mode means? Also, feel free to write the question in Chinese.
-
Sorry, I'm not a native English speaker, so please forgive my poor English.
I want to use Triton as our model inference server, with vLLM as the backend. However, the vLLM backend for Triton only supports decoupled mode. Is it possible to implement non-decoupled mode myself? Is there anything I should be aware of?