This repo is for load testing the FastChat OpenAI-compatible REST API. Before giving users access to the API, we should have some understanding of the load it can handle.
The Python script fastchat-api-load-testing.py
is executed via a Kubernetes indexed job,
which divides the input chat_prompts
among several pods running in parallel to simulate the desired load.
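In an indexed job, Kubernetes injects the pod's index via the JOB_COMPLETION_INDEX environment variable, which each pod can use to pick its own slice of the prompts. A minimal sketch of that sharding logic (the prompt list and the TOTAL_PODS variable here are illustrative assumptions, not names from the actual script):

```python
import os

def shard(prompts, index, total):
    """Return the slice of prompts assigned to pod `index` of `total` pods."""
    return [p for i, p in enumerate(prompts) if i % total == index]

# Kubernetes sets JOB_COMPLETION_INDEX in each pod of an indexed job.
index = int(os.environ.get("JOB_COMPLETION_INDEX", "0"))
# TOTAL_PODS is a hypothetical variable standing in for the job's completion count.
total = int(os.environ.get("TOTAL_PODS", "4"))

prompts = [f"prompt-{i}" for i in range(10)]
my_prompts = shard(prompts, index, total)
```

With 4 pods, pod 0 would handle prompts 0, 4, and 8; pod 1 would handle 1, 5, and 9; and so on.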
Create a file env.yaml
based on the provided env-template.yaml,
replacing the value of api_key
with your FastChat API key.
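The resulting file might look something like the following sketch (only the api_key field is named in this README; check env-template.yaml for the full set of keys):

```yaml
# env.yaml -- illustrative; copy env-template.yaml and edit rather than
# writing this from scratch.
api_key: "your-fastchat-api-key"
```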
Then create a secret in your namespace from the env.yaml
file like so:
kubectl -n [your-namespace] create secret generic fastchat-api --from-file=env=env.yaml
If you have not already, clone this repo to your computer.
You can then run the indexed job in your namespace from the root directory of this repo:
kubectl -n [your-namespace] apply -f manifest.yaml
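Once the job's pods are running, each one issues OpenAI-style chat completion requests against the FastChat endpoint. A minimal sketch of building one such request with the standard library (the base URL, environment variable names, and model name are assumptions; the request is constructed here but not sent):

```python
import json
import os
from urllib import request

# Assumed names -- point these at your actual FastChat deployment.
API_BASE = os.environ.get("FASTCHAT_API_BASE", "https://fastchat.example.org/v1")
API_KEY = os.environ.get("FASTCHAT_API_KEY", "changeme")

def build_chat_request(prompt, model="vicuna-13b"):
    """Build (but do not send) an OpenAI-compatible chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return request.Request(
        f"{API_BASE}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
```

Sending the request (e.g. with `urllib.request.urlopen`) and timing the response is the core of what a load-test iteration measures.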
GPU usage by the FastChat deployment can be monitored via the Nautilus Grafana dashboard for GPU usage by namespace, filtering by the namespace "sdsu-llm". The FastChat API server can also be monitored via its logs:
kubectl -n sdsu-llm logs -f [pod] -c fastchat-api