
# FastChat API Load Testing

This repo is for load testing the FastChat OpenAI-compatible REST API. Before giving users access to the API, we should have some understanding of the load that it can handle.

The Python script `fastchat-api-load-testing.py` is executed via a Kubernetes indexed job. This lets the program divide its input `chat_prompts` among several pods that run in parallel to simulate the desired load.
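As a rough sketch of how an indexed job can partition work, each pod can read the `JOB_COMPLETION_INDEX` environment variable that Kubernetes sets on indexed-job pods and take its own slice of the prompt list. The function and slicing scheme below are illustrative assumptions, not the repo's actual implementation:

```python
import os

def prompts_for_pod(chat_prompts, index, total_pods):
    """Return the contiguous slice of prompts assigned to pod `index`."""
    per_pod = -(-len(chat_prompts) // total_pods)  # ceiling division
    return chat_prompts[index * per_pod : (index + 1) * per_pod]

if __name__ == "__main__":
    # Kubernetes sets JOB_COMPLETION_INDEX on each pod of an indexed job.
    index = int(os.environ.get("JOB_COMPLETION_INDEX", "0"))
    prompts = [f"prompt {i}" for i in range(10)]
    print(prompts_for_pod(prompts, index, total_pods=4))
```

With 10 prompts and 4 pods, pods 0 through 2 each get 3 prompts and pod 3 gets the remaining one.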

## Setup

Create a file `env.yaml` based on the provided `env-template.yaml`, replacing the value of `api_key` with your FastChat API key.
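For example, `env.yaml` might look like the following (only the `api_key` field is confirmed by this README; check `env-template.yaml` for any other keys):

```yaml
# env.yaml — do not commit this file
api_key: "your-fastchat-api-key"
```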

Then create a secret in your namespace from the `env.yaml` file like so:

```shell
kubectl -n [your-namespace] create secret generic fastchat-api --from-file=env=env.yaml
```

## Usage

Begin by cloning this repo to your computer.

You can then run the indexed job in your namespace from the root directory of this repo:

```shell
kubectl -n [your-namespace] apply -f manifest.yaml
```
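For reference, a Kubernetes indexed job manifest generally follows the shape below. This is a hedged sketch only; the names, image, pod counts, and secret wiring are placeholders, and the repo's actual `manifest.yaml` is authoritative:

```yaml
# Illustrative indexed Job spec — see the repo's manifest.yaml for the real one.
apiVersion: batch/v1
kind: Job
metadata:
  name: fastchat-load-test        # placeholder name
spec:
  completionMode: Indexed         # gives each pod a JOB_COMPLETION_INDEX
  completions: 4                  # total pods, each handling a slice of prompts
  parallelism: 4                  # run all pods at once to simulate load
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: load-test
          image: python:3.11      # placeholder image
          command: ["python", "fastchat-api-load-testing.py"]
          envFrom: []             # hypothetical; the real manifest consumes the fastchat-api secret
```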

## Monitoring

The FastChat deployment's usage can be monitored via the Nautilus Grafana dashboard for GPU usage by namespace, filtering by the `sdsu-llm` namespace. The FastChat API server can also be monitored through its logs:

```shell
kubectl -n sdsu-llm logs -f [pod] -c fastchat-api
```
