
Large Language Models (LLMs) & Prompt Engineering with Hugging Face, Databricks and MLflow


Contents

  • The repo covers multiple use cases related to Prompt Engineering and Large Language Models (LLMs).
  • Exploration & Exploitation: the repo contains notebooks for experimenting with different Prompt Engineering techniques, and showcases LLM deployment using Databricks Model Serving with GPU support.
  • The repo also ships with a demo frontend application built with Gradio (a minimal sketch of such an app follows this list).
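
For illustration, here is a minimal sketch of what such a Gradio frontend can look like. The endpoint URL, endpoint name and token are placeholders, and `{"inputs": [...]}` is only one of the JSON payload formats Databricks Model Serving accepts; the actual app lives in the notebooks/frontend notebook.

```python
import gradio as gr
import requests

# Placeholders -- substitute your own workspace host, endpoint name and token.
ENDPOINT_URL = "https://<workspace-host>/serving-endpoints/<endpoint-name>/invocations"
DATABRICKS_TOKEN = "<personal-access-token>"

def generate(prompt: str) -> str:
    # Databricks Model Serving expects a bearer token and a JSON payload.
    response = requests.post(
        ENDPOINT_URL,
        headers={"Authorization": f"Bearer {DATABRICKS_TOKEN}"},
        json={"inputs": [prompt]},
    )
    response.raise_for_status()
    return response.json()["predictions"][0]

# A single text-in/text-out interface is enough for a demo.
demo = gr.Interface(fn=generate, inputs="text", outputs="text")
demo.launch()
```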

As of 29/08/2023, you will find the following examples in the notebooks folder:

🙋🏻‍♂️ customer_service

| Artifact | Description |
|---|---|
| hf_mlflow_crash_course | 🤓 Provides a basic example of using Hugging Face to train an intent classification model with distilbert-qa. Also showcases foundational MLflow concepts such as experiment tracking, artifact logging and model registration. |
| primer | 🎬 Mostly conceptual notebook. Explains Prompt Engineering and foundational sampling concepts such as Top-K sampling, Top-p sampling and Temperature (see the sketch after this table). |
| basic_prompt_evaluation | 🧪 Demonstrates basic Prompt Engineering with lightweight LLMs. Also showcases MLflow's newest LLM features, such as mlflow.evaluate(). |
| few_shot_learning | 💉 Explores Few-Shot Learning with an instruction-tuned LLM (mpt-7b-instruct). |
| active_prompting | 🏃🏻‍♂️ Explores active prompting techniques, and demonstrates how to leverage vLLM to achieve 7x-10x inference latency improvements. |
| llama2_mlflow_logging_inference | 🚀 Shows how to log, register and deploy a LLaMA V2 model with MLflow. |
| mpt_mlflow_logging_inference | 🚀 Shows how to log, register and deploy an MPT-Instruct model with MLflow. Unlike the LLaMA V2 example, model weights are loaded directly into the model serving endpoint when the endpoint is initialized, without uploading the artifacts to the MLflow Model Registry. |
| frontend | 🎨 End-to-end example of a frontend demo app built with Gradio that connects to one of the Model Serving endpoints deployed in the previous notebooks. |
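
To make the sampling knobs from the primer and the few-shot pattern concrete, here is a minimal sketch using the Hugging Face pipeline API. It uses gpt2 purely to stay small and runnable; the notebooks themselves use larger instruction-tuned models such as mpt-7b-instruct, and the prompt below is an invented example, not one from the repo.

```python
from transformers import pipeline

# gpt2 keeps this sketch runnable on modest hardware; the notebooks use
# larger instruction-tuned models.
generator = pipeline("text-generation", model="gpt2")

# A few-shot prompt: two labelled examples, then the query to complete.
prompt = (
    "Classify the sentiment of each customer message.\n"
    "Message: The agent solved my issue in minutes. Sentiment: positive\n"
    "Message: I was on hold for an hour. Sentiment: negative\n"
    "Message: The refund arrived the next day. Sentiment:"
)

# temperature, top_k and top_p are the sampling concepts covered in the primer.
output = generator(
    prompt,
    max_new_tokens=5,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
)
print(output[0]["generated_text"])
```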

Getting Started

To start using this repo on Databricks, there are a few prerequisites:

  1. Create a GPU cluster with at least the Databricks Machine Learning Runtime 13.2 GPU and an NVIDIA T4 GPU (an A10 or A100 is required for the steps involving vLLM).
  2. (only if using Databricks MLR < 13.2) Install the additional CUDA dependencies.
  3. (only if using MPT models) Install the following Python packages on your cluster:

     ```
     accelerate==0.21.0
     einops==0.6.1
     flash-attn==v1.0.5
     ninja
     tokenizers==0.13.3
     transformers==4.30.2
     xformers==0.0.20
     ```

  4. Once all dependencies have finished installing and your cluster has started successfully, you should be good to go. A quick sanity check is sketched below.
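
As a quick sanity check (a minimal sketch, not a cell from the repo), you can confirm from a notebook that PyTorch sees the cluster's GPU before running the heavier notebooks:

```python
# Run from a Databricks notebook cell: confirms PyTorch can see the
# cluster's GPU before loading any LLM.
import torch

assert torch.cuda.is_available(), "No GPU visible -- check the cluster configuration"
print(torch.cuda.get_device_name(0))  # e.g. 'Tesla T4' on the minimal setup
```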

Roadmap

  • 🎨 Frontend Web App Using Gradio
  • 🚀 Model Deployment and Real Time Inference
  • 🔎 Retrieval Augmented Generation (RAG)
  • 🛣️ MLflow AI Gateway

Credits & Reference