Releases: NVIDIA/GenerativeAIExamples
v0.6.0
This release adds the ability to switch between API Catalog models and on-prem models using NIM-LLM, and adds documentation on how to build a RAG application from scratch. It also releases a containerized end-to-end RAG evaluation application integrated with the RAG chain-server APIs.
Added
- Ability to switch between API Catalog models and on-prem models using NIM-LLM.
- New `/health` API endpoint that provides a health check for the chain server.
- Containerized evaluation application for RAG pipeline accuracy measurement.
- Observability support for LangChain-based examples.
- New Notebooks
- Added a notebook for chatting with NVIDIA financial data.
- Added a notebook showcasing LangGraph agent handling.
- A simple RAG example template showcasing how to build an example from scratch.
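The new `/health` endpoint added in this release can be probed with a short client. The sketch below uses only the Python standard library and pairs the probe with a minimal stand-in server; the JSON body and response schema are assumptions for illustration, not the chain server's actual payload.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    """Minimal stand-in for the chain server's /health endpoint (illustrative only)."""
    def do_GET(self):
        if self.path == "/health":
            body = json.dumps({"message": "Service is up."}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):  # silence per-request logging
        pass

def check_health(base_url: str) -> bool:
    """Return True when the server answers 200 on /health."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
        return resp.status == 200

if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    print(check_health(f"http://127.0.0.1:{port}"))  # True
    server.shutdown()
```

Against a real deployment, only `check_health` is needed, pointed at the chain server's host and port.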
Changed
- Renamed example `csv_rag` to `structured_data_rag`.
- Model engine name updates:
  - The `nv-ai-foundation` and `nv-api-catalog` LLM engines are renamed to `nvidia-ai-endpoints`.
  - The `nv-ai-foundation` embedding engine is renamed to `nvidia-ai-endpoints`.
- Embedding model updates:
  - The `developer_rag` example now uses the UAE-Large-V1 embedding model.
  - API Catalog examples now use `ai-embed-qa-4` instead of `nvolveqa_40k` as the embedding model.
- Ingested data now persists across multiple sessions.
- Updated `langchain-nvidia-ai-endpoints` to version 0.0.11, enabling support for models such as Llama 3.
- File-extension-based validation that raises an error for unsupported file types.
- The default output token length in the UI has been increased from 250 to 1024 for more comprehensive responses.
- Stricter chain-server API validation to enhance API security.
- Updated versions of `llama-index` and `pymilvus`.
- Updated the pgvector container to `pgvector/pgvector:pg16`.
- LLM model updates:
  - The multiturn chatbot now uses the `ai-mixtral-8x7b-instruct` model for response generation.
  - Structured data RAG now uses `ai-llama3-70b` for response and code generation.
v0.5.0
This release adds new dedicated RAG examples showcasing state-of-the-art use cases, switches to the latest NVIDIA API Catalog endpoints, and refactors the API interface of the chain server. It also improves the developer experience by adding GitHub Pages-based documentation and streamlining the example deployment flow with dedicated compose files.
Added
- GitHub Pages-based documentation.
- New examples showcasing
- Support for delete and list APIs in chain-server component
- Streamlined RAG example deployment
- Dedicated new Docker Compose files for every example.
- Dedicated docker compose files for launching vector DB solutions.
- New configurations to control the top-k and confidence score of the retrieval pipeline.
- Added a notebook which covers how to train SLMs with various techniques using NeMo Framework.
- Added more experimental examples showcasing new use cases.
- New dedicated notebook showcasing a RAG pipeline using web pages.
Changed
- Switched from NVIDIA AI Foundation to NVIDIA API Catalog endpoints for accessing cloud-hosted LLM models.
- Refactored the API schema of the chain-server component to support runtime setting of LLM parameters such as temperature, max tokens, and chat history.
- Renamed the `llm-playground` service in compose files to `rag-playground`.
- Switched base containers for all components from PyTorch to Ubuntu, and optimized both container build time and container size.
- Deprecated YAML-based configuration to avoid confusion; all configurations are now environment-variable based.
- Removed the requirement to hardcode `NVIDIA_API_KEY` in the `compose.env` file.
- Upgraded all Python dependencies for the chain-server and rag-playground services.
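The refactored chain-server schema allows LLM parameters to be set per request rather than at deployment time. The payload builder below is a minimal sketch of that idea; the field names (`messages`, `temperature`, `max_tokens`, `use_knowledge_base`) are illustrative assumptions drawn from these notes, not the server's exact schema.

```python
import json

def build_generate_request(prompt, history=None, temperature=0.7,
                           max_tokens=1024, use_knowledge_base=True):
    """Build a JSON request body with runtime-tunable LLM parameters.

    Field names are hypothetical; consult the chain-server API docs for
    the real schema.
    """
    messages = list(history or []) + [{"role": "user", "content": prompt}]
    return json.dumps({
        "messages": messages,
        "temperature": temperature,        # tunable per request
        "max_tokens": max_tokens,          # no longer fixed at deploy time
        "use_knowledge_base": use_knowledge_base,
    })

print(build_generate_request("What is RAG?", temperature=0.2))
```

The same body would then be POSTed to the chain server's generate endpoint by any HTTP client.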
Fixed
- Fixed a bug causing a hallucinated answer when the retriever fails to return any documents.
- Fixed some accuracy issues for all the examples.
v0.4.0
This release adds new dedicated notebooks showcasing usage of cloud-based NVIDIA AI Foundation models, upgrades the Milvus container version to enable GPU-accelerated vector search, and adds support for the FAISS vector database. Detailed changes are listed below:
Added
- New dedicated notebooks showcasing usage of cloud-based NVIDIA AI Foundation models using LangChain connectors, as well as local model deployment using Hugging Face.
- Upgraded the Milvus container version to enable GPU-accelerated vector search.
- Added support to interact with models behind NeMo Inference Microservices using the new model engines `nemo-embed` and `nemo-infer`.
- Added support to provide an example-specific collection name for vector databases using an environment variable named `COLLECTION_NAME`.
- Added `faiss` as a generic vector database solution behind `utils.py`.
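The `COLLECTION_NAME` variable above can be consumed with a one-line environment lookup; the sketch below shows the pattern. The variable name comes from these notes, while the fallback default used here is an illustrative assumption.

```python
import os

def get_collection_name(default: str = "developer_rag") -> str:
    """Return the vector DB collection name from the COLLECTION_NAME
    environment variable, falling back to a default.

    The default value here is hypothetical, not the repo's actual default.
    """
    return os.environ.get("COLLECTION_NAME", default)

os.environ["COLLECTION_NAME"] = "canonical_rag"
print(get_collection_name())  # canonical_rag
```

Setting the variable in the deployment environment (e.g. in a compose file) then scopes each example's ingested documents to its own collection.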
Changed
- Upgraded and changed base containers for all components to PyTorch `23.12-py3`.
- Added a LangChain-specific vector database connector in `utils.py`.
- Changed speech support to use a single channel for Riva ASR and TTS.
- Changed the `get_llm` utility in `utils.py` to return a LangChain wrapper instead of LlamaIndex wrappers.
Fixed
- Fixed a bug causing empty ratings in the evaluation notebook.
- Fixed the document search implementation of the query decomposition example.
v0.3.0
This release adds support for the pgvector vector database, speech-in/speech-out support using Riva, and RAG observability tooling. It also adds a dedicated example for a RAG pipeline using only NVIDIA AI Foundation models, and one example demonstrating query decomposition. Detailed changes are listed below:
Added
- New dedicated example showcasing NVIDIA AI Playground-based models using LangChain connectors.
- New example demonstrating query decomposition.
- Support for using pgvector as a vector database in the canonical developer RAG example.
- Support for a speech-in/speech-out interface in the sample frontend leveraging Riva Skills.
- New tool showcasing RAG observability support.
- Support for on-prem deployment of TRT-LLM-based Nemotron models.
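The idea behind the query decomposition example above can be sketched in a few lines: a compound question is split into sub-questions, each is retrieved against separately, and the partial answers are collected. The real example uses an LLM to perform the split; the naive "and"-split and the in-memory retriever below are toy stand-ins for illustration only.

```python
def decompose(question: str) -> list[str]:
    """Naive stand-in for LLM-driven query decomposition: split on 'and'."""
    parts = [p.strip(" ?") for p in question.split(" and ")]
    return [f"{p}?" for p in parts if p]

def answer(question, retrieve):
    """Retrieve each sub-question separately and collect partial answers."""
    return {q: retrieve(q) for q in decompose(question)}

# Toy in-memory "retriever" keyed by sub-question.
docs = {
    "Who founded NVIDIA?": "Jensen Huang, Chris Malachowsky, and Curtis Priem.",
    "when was it founded?": "1993.",
}
print(answer("Who founded NVIDIA and when was it founded?",
             lambda q: docs.get(q, "")))
```

In the actual example, each sub-question would hit the vector store and the partial answers would be synthesized into a final response by the LLM.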
Changed
- Upgraded LangChain and LlamaIndex dependencies for all containers.
- Restructured README files for improved clarity.
- Added provision to plug in multiple examples using a common base class.
- Changed the `minio` service's port from `9000` to `9010` in Docker-based deployments.
- Moved the `evaluation` directory from the top level to under `tools` and created a dedicated compose file.
- Added an experimental directory for plugging in experimental features.
- Modified notebooks to use TRT-LLM and NVIDIA AI Foundation-based connectors from LangChain.
- Changed the `ai-playground` model engine name to `nv-ai-foundation` in configurations.
v0.2.0
This release builds on the feedback received and brings many improvements, bug fixes, and new features. It is the first release to include support for NVIDIA AI Foundation models and for quantized LLM models. Detailed changes are listed below:
What's Added
- Support for using NVIDIA AI Foundation LLM models
- Support for using NVIDIA AI Foundation embedding models
- Support for deploying and using quantized LLM models
- Support for evaluating RAG pipeline
What's Changed
- Restructured the repository to allow better open-source contributions
- Upgraded dependencies for the chain-server container
- Upgraded the NeMo Inference Framework container version; no separate sign-up is needed for access now.
- Main README now provides more details.
- Documentation improvements.
- Better error handling and reporting mechanism for corner cases.
- Renamed the `triton-inference-server` container and service to `llm-inference-server`
v0.1.0
Bump postcss and next (#4). Bumps [postcss](https://github.com/postcss/postcss) from 8.4.14 to 8.4.31 ([release notes](https://github.com/postcss/postcss/releases), [changelog](https://github.com/postcss/postcss/blob/main/CHANGELOG.md), [commits](https://github.com/postcss/postcss/compare/8.4.14...8.4.31)) and updates the ancestor dependency [next](https://github.com/vercel/next.js) from 13.4.12 to 13.5.6 ([release notes](https://github.com/vercel/next.js/releases), [commits](https://github.com/vercel/next.js/compare/v13.4.12...v13.5.6)). These dependencies need to be updated together. Signed-off-by: dependabot[bot].