tien02/low-budget-generalist-ai


Low-Cost Generalist AI for Medical Assistance

A low-budget generalist AI that lets users ask questions in the medical domain; users can also provide images instead of text. By carefully selecting models, our AI agent runs effectively with just 8 GB of RAM on a GCP Compute Engine instance.

LLaMa-2-7B-GGUF is the heart of the assistant. This language model offers helpful responses when users ask about illnesses, symptoms, and related topics. llama-cpp-python provides a friendly interface for deploying the 4-bit quantized model, which uses only ~3.8 GB of RAM.
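A minimal sketch of streaming an answer from the quantized checkpoint with llama-cpp-python. The prompt template, context size, and `stream_answer` helper name are illustrative assumptions, not the repo's exact code:

```python
def build_prompt(question: str) -> str:
    """Wrap a user question in a simple instruction template (assumed format)."""
    return f"[INST] You are a medical assistant. {question} [/INST]"

def stream_answer(question: str, model_path: str = "ckpt/llama-2-7b.Q4_K_S.gguf"):
    """Load the 4-bit GGUF checkpoint (~3.8 GB of RAM) and yield tokens as
    they are generated."""
    from llama_cpp import Llama  # pip install llama-cpp-python
    llm = Llama(model_path=model_path, n_ctx=2048)
    for chunk in llm(build_prompt(question), max_tokens=256, stream=True):
        yield chunk["choices"][0]["text"]
```

With `stream=True`, the `Llama` call returns an iterator of partial completions, which is what makes token-by-token delivery to the UI possible.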

Despite their strength, LLMs are "blind" to visual data. A retrieval-based model, BiomedCLIP-PubMedBERT, helps the LLM process visual input and uses roughly 1 GB of RAM.
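The retrieval step boils down to ranking candidate text descriptions by cosine similarity to the image embedding, then handing the best match to the LLM as text. The helper below sketches that ranking in plain Python; the actual encoder loading (shown as a comment, since it downloads weights) would use open_clip:

```python
import math

def rank_labels(image_emb, label_embs):
    """Rank candidate text descriptions by cosine similarity to an image
    embedding. With BiomedCLIP the embeddings would come from its image and
    text encoders; here they are plain lists of floats."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
        return dot / norm
    scores = [cosine(image_emb, emb) for emb in label_embs]
    return sorted(range(len(label_embs)), key=lambda i: scores[i], reverse=True)

# Loading the real encoders (commented out because it downloads weights):
# from open_clip import create_model_from_pretrained
# model, preprocess = create_model_from_pretrained(
#     "hf-hub:microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224")
```

The top-ranked description is then prepended to the user's question, so the "blind" LLM receives the image's content as text.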

An interactive chatbot user interface (UI) is built with Gradio, enabling simple interaction with users. FastAPI connects the UI to the aforementioned models, using a StreamingResponse to deliver tokens sequentially, ChatGPT-style.
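The streaming part can be sketched as an async generator wrapped in FastAPI's `StreamingResponse`. The generator below is runnable as-is; the endpoint name and `run_llm` helper in the commented wiring are assumptions:

```python
import asyncio

async def token_stream(tokens):
    """Yield tokens one at a time, as the LLM produces them."""
    for tok in tokens:
        yield tok
        await asyncio.sleep(0)  # hand control back to the event loop

# FastAPI wiring (endpoint name and run_llm are assumptions):
# from fastapi import FastAPI
# from fastapi.responses import StreamingResponse
# app = FastAPI()
#
# @app.post("/generate")
# async def generate(prompt: str):
#     return StreamingResponse(token_stream(run_llm(prompt)),
#                              media_type="text/plain")
```

Because the response body is a generator, the client (here, the Gradio UI) can render each token as soon as it arrives instead of waiting for the full completion.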

demo

Setup

  1. Install dependencies

pip install -r requirements.txt

  2. Download LLaMa-2 GGUF as the language model

Create a ckpt folder to store checkpoints

mkdir ckpt

Download the model using huggingface-cli from the Hugging Face Hub. More LLaMa-2 GGUF variants can be found here

huggingface-cli download TheBloke/Llama-2-7B-GGUF llama-2-7b.Q4_K_S.gguf --local-dir ./ckpt --local-dir-use-symlinks False

Deploy the service

  1. Run the LLM API, which receives and responds to text
python3 llm.py
  2. Run the CLIP API, which accepts an image as input. This model supports the "blind" LLM.
python3 clip.py
  3. Run the Gradio web UI
gradio agent.py
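Inside the agent, a turn must be sent to the right backend: queries with an attached image go through the CLIP service first, while pure-text queries go straight to the LLM service. A minimal routing sketch; the ports and route names are assumptions, not the repo's actual configuration:

```python
from typing import Optional

LLM_API = "http://localhost:8000/generate"   # assumed port/route for llm.py
CLIP_API = "http://localhost:8001/classify"  # assumed port/route for clip.py

def route_request(text: str, image_path: Optional[str] = None) -> str:
    """Pick which backend service should handle a user turn: image queries
    go to the CLIP service first, pure text goes straight to the LLM."""
    return CLIP_API if image_path is not None else LLM_API
```

Keeping the two models behind separate processes means neither checkpoint has to share an address space with the web UI, which matters on an 8 GB machine.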
