Calling index search inside Triton python backend #7220

Open
riyaj8888 opened this issue May 15, 2024 · 1 comment

@riyaj8888

Right now we have two models: one for embedding and one for reranking.
For each new query we generate an embedding and run an index search with it on a separate server.
After retrieving the top-k documents from the index search, we pass these (query, top-k docs) to another Triton endpoint for reranking.

My question is: can we integrate all of these components together?
We can create an ensemble of embedding + reranking, but how can we add indexing, or search over the index, to this pipeline?

Thanks 🙏

@oandreeva-nv
Contributor

Let me outline the process as I understand it; feel free to correct me.

For this task, you can potentially either rebuild the index from the documents so it can be reused, or deserialize it from the external service.
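As a rough illustration of the "build once, then reuse" idea, here is a minimal sketch. It assumes the index is just a matrix of normalized document embeddings cached with NumPy; `load_or_build_index`, `build_from_documents`, and the file name are hypothetical, and a real index (for example one built with cuVS) would use that library's own serialization instead.

```python
import os
import tempfile

import numpy as np


def load_or_build_index(path, build_fn):
    """Return the cached index at `path`, or build it once and cache it."""
    if os.path.exists(path):
        # Deserialize the previously built index.
        return np.load(path)
    index = build_fn()
    np.save(path, index)
    return index


def build_from_documents():
    # Hypothetical stand-in: a real pipeline would run the embedding
    # model over the document corpus here. Random vectors keep the
    # sketch self-contained.
    rng = np.random.default_rng(0)
    emb = rng.standard_normal((100, 8)).astype(np.float32)
    # Normalize rows so inner product equals cosine similarity.
    return emb / np.linalg.norm(emb, axis=1, keepdims=True)


index_path = os.path.join(tempfile.mkdtemp(), "doc_index.npy")
first = load_or_build_index(index_path, build_from_documents)   # builds and saves
second = load_or_build_index(index_path, build_from_documents)  # loads from disk
```

In a Triton Python model, a call like this would most naturally live in `initialize`, so the index is loaded once per model instance rather than on every request.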

After that, one option is to write a Python model that uses the cuVS library. cuVS has APIs to build an index; please check its docs to see whether it fits your needs. The library also provides a variety of vector search algorithms to choose from and lets you specify k for top-k.
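To make the search step concrete, here is a minimal sketch of an exact top-k inner-product search in plain NumPy. This is a CPU stand-in for illustration only, not the cuVS API; `top_k_search` is a hypothetical helper, and it assumes `k` is no larger than the number of indexed documents.

```python
import numpy as np


def top_k_search(index, queries, k):
    """Exact inner-product search.

    index:   [num_docs, dim] document embeddings
    queries: [num_queries, dim] query embeddings
    Returns (scores, ids), each of shape [num_queries, k], best match first.
    """
    scores = queries @ index.T
    # argpartition finds the k best candidates without a full sort.
    cand = np.argpartition(-scores, k - 1, axis=1)[:, :k]
    cand_scores = np.take_along_axis(scores, cand, axis=1)
    # Sort only those k candidates by descending score.
    order = np.argsort(-cand_scores, axis=1)
    ids = np.take_along_axis(cand, order, axis=1)
    return np.take_along_axis(scores, ids, axis=1), ids


# Toy index: four orthogonal unit vectors.
toy_index = np.eye(4, dtype=np.float32)
toy_queries = np.array([[0.9, 0.1, 0.0, 0.0]], dtype=np.float32)
toy_scores, toy_ids = top_k_search(toy_index, toy_queries, k=2)
```

Inside the Python backend, a function like this would be called from the model's `execute` method on the incoming query embedding, with the index itself held as model state.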

Then, the last step for this model is to combine the initial request with the retrieved top-k embeddings and prepare a response, which will be passed to the next stage of your ensemble.
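A minimal sketch of that last step, under the assumption that the reranker takes (query, document text) pairs as string inputs. `build_rerank_inputs` and the toy document store are hypothetical; if your reranker consumes embeddings instead, the same pairing applies with float arrays.

```python
import numpy as np


def build_rerank_inputs(query, doc_ids, doc_texts):
    """Pair the query with each retrieved document.

    Returns two object arrays of UTF-8 bytes, one row per (query, doc) pair,
    which is the NumPy layout commonly used for string tensors in Triton.
    """
    docs = [doc_texts[i] for i in doc_ids]
    query_col = np.array([query.encode("utf-8")] * len(docs), dtype=np.object_)
    doc_col = np.array([d.encode("utf-8") for d in docs], dtype=np.object_)
    return query_col, doc_col


# Toy document store and retrieved ids (best match first).
corpus = {0: "intro to triton", 1: "vector search basics", 2: "reranking models"}
retrieved_ids = [2, 0]
q_col, d_col = build_rerank_inputs("how to rerank", retrieved_ids, corpus)
```

The two arrays would then be wrapped as output tensors of the Python model, so the ensemble can map them onto the reranker's inputs.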

Let me know how this sounds to you; happy to discuss further.
