A low-budget generalist AI assistant for the medical domain: users can ask questions in text or provide images instead. By carefully selecting models, the agent runs effectively with just 8 GB of RAM on GCP's Compute Engine.
LLaMa-2-7B-GGUF is the heart of the assistant. This language model offers helpful responses when users ask about illnesses, symptoms, and related topics. llama-cpp-python provides a friendly interface for deploying the 4-bit quantized model, which uses only ~3.8 GB of RAM.
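Loading the quantized checkpoint with llama-cpp-python can be sketched as follows. The prompt-template helper and system message are illustrative choices (not taken from this repo), and the inference call is guarded so the sketch only runs when the checkpoint from the download step below is present.

```python
import os

MODEL_PATH = "./ckpt/llama-2-7b.Q4_K_S.gguf"

def build_prompt(question: str) -> str:
    # LLaMa-2 instruction format; the system message is an illustrative choice.
    system = "You are a helpful medical assistant."
    return f"[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{question} [/INST]"

if os.path.exists(MODEL_PATH):
    # Imported lazily so the sketch can be read/run without the library or model.
    from llama_cpp import Llama
    llm = Llama(model_path=MODEL_PATH, n_ctx=2048)
    out = llm(build_prompt("What are common symptoms of anemia?"), max_tokens=256)
    print(out["choices"][0]["text"])
```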
Despite their strength, LLMs are "blind" to visual data. A retrieval-based model, BiomedCLIP-PubMedBERT, helps the LLM process images and uses roughly ~1 GB of RAM.
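The retrieval idea can be illustrated with a toy sketch: BiomedCLIP embeds the image and a set of candidate text labels into the same space, and the best-matching label (by cosine similarity) can then be handed to the LLM as a textual description. The embeddings below are random placeholders, not real BiomedCLIP outputs.

```python
import numpy as np

def best_label(image_emb: np.ndarray, text_embs: np.ndarray, labels: list[str]) -> str:
    # Cosine similarity between the image embedding and each label embedding.
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = text_embs @ image_emb
    return labels[int(np.argmax(sims))]

# Placeholder embeddings standing in for BiomedCLIP outputs.
labels = ["chest X-ray", "brain MRI", "skin lesion"]
rng = np.random.default_rng(0)
print(best_label(rng.normal(size=512), rng.normal(size=(3, 512)), labels))
```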
An interactive chatbot user interface (UI) is built with Gradio, which enables simple interaction with users. To connect the UI with the aforementioned models, FastAPI is used with StreamingResponse to generate tokens sequentially, like ChatGPT.
- Install dependencies
pip install -r requirements.txt
- Download LLaMa2 GGUF as Language Model
Create a ckpt folder to store checkpoints
mkdir ckpt
Download the model with huggingface-cli from the Hugging Face Hub. More LLaMa2-GGUF variants can be found here
huggingface-cli download TheBloke/Llama-2-7B-GGUF llama-2-7b.Q4_K_S.gguf --local-dir ./ckpt --local-dir-use-symlinks False
- Run the LLM API, which can receive and respond to text
python3 llm.py
- Run the CLIP API, which can receive an image as input. This model supports the "blind" LLM.
python3 clip.py
- Run the Gradio web UI
gradio agent.py