Skip to content

This code example shows how to make a chatbot for semantic search over documents using Streamlit, LangChain, and various vector databases. The chatbot lets users ask questions and get answers from a document collection. The code is in Python and can be customized for different scenarios and data.

easonlai/chatbot_with_pdf_streamlit

Repository files navigation

Chatbot with PDF for Semantic Search over Documents (Build with Streamlit, LangChain, Pinecone/Chroma/Azure Cognitive Search)

This repository contains a code example for how to build an interactive chatbot for semantic search over documents. The chatbot allows users to ask natural language questions and get relevant answers from a collection of documents. The chatbot uses Streamlit for web and chatbot interface, LangChain, and leverages various types of vector databases, such as Pinecone, Chroma, and Azure Cognitive Search’s Vector Search, to perform efficient and accurate similarity search. The code is written in Python and can be easily modified to suit different use cases and data sources.

Please also check out my story in Medium (Streamlit and Vector Databases: A Guide to Creating Interactive Web Apps for Semantic Search over Documents) for more detail sharing.

  • preprocess_pinecone.ipynb <-- Example of using Embedding Model from Azure OpenAI Service to embed the content from the document and save it into Pinecone vector database.
  • preprocess_chroma.ipynb <-- Example of using Embedding Model from Azure OpenAI Service to embed the content from the document and save it into Chroma vector database.
  • preprocess_acs.ipynb <-- Example of using Embedding Model from Azure OpenAI Service to embed the content from the document and save it into Azure Cognitive Search vector database.
  • consume_pinecone.ipynb <-- Example of using LangChain question-answering module to perform similarity search from the Pinecone vector database and use the GPT-3.5 (text-davinci-003) to summarize the result.
  • consume_chroma.ipynb <-- Example of using LangChain question-answering module to perform similarity search from the Chroma vector database and use the GPT-3.5 (text-davinci-003) to summarize the result.
  • consume_acs.ipynb <-- Example of using LangChain question-answering module to perform similarity search from the Azure Cognitive Search vector database and use the GPT-3.5 (text-davinci-003) to summarize the result.
  • app_pinecone.py <-- Example of using Streamlit, LangChain, and Pinecone vector database to build an interactive chatbot to facilitate the semantic search over documents. It uses the GPT-3.5-Turbo model from Azure OpenAI Service for result summarization and chat.
  • app_chroma.py <-- Example of using Streamlit, LangChain, and Chroma vector database to build an interactive chatbot to facilitate the semantic search over documents. It uses the GPT-3.5-Turbo model from Azure OpenAI Service for result summarization and chat.
  • app_acs.py <-- Example of using Streamlit, LangChain, and Azure Cognitive Search vector database to build an interactive chatbot to facilitate the semantic search over documents. It uses the GPT-3.5-Turbo model from Azure OpenAI Service for result summarization and chat.

To run this Streamlit web app

streamlit run app_pinecone.py

High-level architecture and flow of this Semantic Search over Documents demo alt text

Enjoy!

About

This code example shows how to make a chatbot for semantic search over documents using Streamlit, LangChain, and various vector databases. The chatbot lets users ask questions and get answers from a document collection. The code is in Python and can be customized for different scenarios and data.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published