Streaming Support for Nvidia's Triton Integration #13135

Rohith-2 · 2024-04-27T17:02:45Z

Description

Implemented streaming capabilities for the Nvidia Triton integration in llama-index.
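For readers unfamiliar with the pattern, here is a minimal sketch of what streaming a completion generally looks like: a generator yields one chunk per token, each carrying the text accumulated so far plus the newly generated piece. This is not the code from this PR. `StreamChunk`, `stream_complete`, and `fake_triton_tokens` are hypothetical names written for illustration, and the token source is a hard-coded stand-in for a real Triton streaming response.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class StreamChunk:
    """Hypothetical stand-in for a streamed completion response."""

    text: str   # everything generated so far
    delta: str  # only the newly generated piece


def stream_complete(token_source: Iterable[str]) -> Iterator[StreamChunk]:
    """Yield one chunk per incoming token, tracking the accumulated text."""
    text = ""
    for token in token_source:
        text += token
        yield StreamChunk(text=text, delta=token)


def fake_triton_tokens() -> Iterator[str]:
    """Hypothetical token source; a real one would read from the Triton server."""
    yield from ["Stream", "ing ", "works"]


if __name__ == "__main__":
    # Print each piece as soon as it arrives instead of waiting for the full text.
    for chunk in stream_complete(fake_triton_tokens()):
        print(chunk.delta, end="", flush=True)
    print()
```

Because the function is a generator, the caller can display partial output immediately, and the final chunk's `text` field holds the complete generation.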

Fixes # (issue)

New Package?

Did I fill in the tool.llamahub section in the pyproject.toml and provide a detailed README.md for my new integration or package?

Yes
No

Version Bump?

Did I bump the version in the pyproject.toml file of the package I am updating? (Except for the llama-index-core package)

Yes
No

Type of Change

Please delete options that are not relevant.

New feature (non-breaking change which adds functionality)

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

Added new unit/integration tests
Added new notebook (that tests end-to-end)
I stared at the code and made sure it makes sense

Suggested Checklist:

I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have added Google Colab support for the newly added notebooks.
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
I ran make format; make lint to appease the lint gods

review-notebook-app · 2024-04-27T17:02:50Z

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

logan-markewich

Cool! This looks good to me, thanks.

Rohith-2 added 5 commits April 27, 2024 21:44

Update base.py

c312329

Implemented Streaming capabilities for completion and chatting

Update nvidia_triton.ipynb

a32f9c2

Added examples for streaming the generation from LLM's complete functionality.

Update nvidia_triton.ipynb

7b84550

Updated example to utilise the streaming capability of Triton with complete and chat functions.

Update nvidia_triton.ipynb

4d89dfd

Updated example to utilise the streaming capability of Triton with complete and chat functions.

Update pyproject.toml

93bdef8

dosubot bot added the size:M label (This PR changes 30-99 lines, ignoring generated files.) Apr 27, 2024

Rohith-2 changed the title ~~Patch 3~~ Streaming Support for Nvidia's Triton Integration Apr 27, 2024

Added Tests

2d2891a

logan-markewich approved these changes Apr 28, 2024

dosubot bot added the lgtm label (This PR has been approved by a maintainer) Apr 28, 2024

Rohith-2 added 2 commits April 28, 2024 09:21

Update base.py

a0f62ee

Reformatted with lint

Merge branch 'run-llama:main' into patch-3

4f8fabc

logan-markewich merged commit 93cb095 into run-llama:main Apr 30, 2024
8 checks passed

JuHyung-Son pushed a commit to UpstageAI/llama_index that referenced this pull request May 1, 2024

Streaming Support for Nvidia's Triton Integration (run-llama#13135)

c8d6809
