A GPT-3.5 & GPT-4 Workload Trace to Optimize LLM Serving Systems

This repository contains public releases of a real-world trace dataset of LLM serving workloads for the benefit of the research and academic community.

The LLM service that produced this trace is powered by Microsoft Azure.

There are currently two files in /data:

BurstGPT.csv contains the full 2-month trace, including failed requests, for which Response tokens is 0. 1429.7k lines in total.
BurstGPT_without_fails.csv contains the same 2-month trace with the failed requests removed. 1404.3k lines in total.
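The relationship between the two files can be sketched as a simple filter: a request counts as failed when its Response tokens value is 0. The snippet below is a minimal sketch, not the authors' tooling. It assumes the CSV has a header row whose column names match the Schema section, and the sample rows are made up for illustration rather than taken from the trace.

```python
import csv
import io


def drop_failed_requests(rows):
    """Keep only rows whose response token count is non-zero."""
    return [row for row in rows if int(row["Response tokens"]) != 0]


# Made-up sample in the assumed column layout (not real trace data).
SAMPLE = """Timestamp,Model,Request tokens,Response tokens,Total tokens,Log Type
5,ChatGPT,472,18,490,Conversation log
45,GPT-4,1297,0,1297,API log
118,ChatGPT,26,230,256,Conversation log
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
ok_rows = drop_failed_requests(rows)
print(len(rows), len(ok_rows))  # 3 2
```

To apply the same filter to the real file, replace the `io.StringIO(SAMPLE)` source with a file opened via `open("data/BurstGPT.csv", newline="")`, assuming the repository layout described above.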

Usage

You may scale the requests per second (RPS) in the trace to match your evaluation setup.
You may also model the patterns in the trace as described in our paper and scale the parameters of those models.
If you have specific needs, we are happy to help you explore and make full use of the trace. Please send any issues or questions to the mailing list.
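One simple way to scale the RPS is to rescale the Timestamp column: dividing every timestamp by a factor multiplies the average request rate by that factor while keeping the relative shape of the bursts. This is only a sketch of one possible approach, not necessarily the method used in the paper. The helper names `scale_rps` and `average_rps` are hypothetical, and the timestamps are made up.

```python
def scale_rps(timestamps, factor):
    """Compress (factor > 1) or stretch (factor < 1) arrival times.

    Dividing each timestamp by `factor` multiplies the average
    request rate by `factor`.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    return [t / factor for t in timestamps]


def average_rps(timestamps):
    """Average requests per second over the span of the trace."""
    span = max(timestamps) - min(timestamps)
    return len(timestamps) / span if span > 0 else float("inf")


arrivals = [0.0, 10.0, 20.0, 40.0]  # made-up timestamps, in seconds
scaled = scale_rps(arrivals, 2.0)
print(scaled)  # [0.0, 5.0, 10.0, 20.0]
print(average_rps(arrivals), average_rps(scaled))  # 0.1 0.2
```

Rescaling timestamps changes the load without changing the token lengths of individual requests, so the per-request cost distribution stays as recorded.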

Future Plans

We will continue to extend the time range of the trace and add the end time of each request.
We will update the conversation log, including the question IDs, timestamps, etc., of each conversation, so that researchers can optimize conversation services.
We will open-source the full benchmark suite for LLM inference soon.

Paper

If you use the trace in your research, please cite our paper:

@misc{wang2024efficient,
      title={Towards Efficient and Reliable LLM Serving: A Real-World Workload Study}, 
      author={Yuxin Wang and Yuhan Chen and Zeyu Li and Zhenheng Tang and Rui Guo and Xin Wang and Qiang Wang and Amelie Chi Zhou and Xiaowen Chu},
      year={2024},
      eprint={2401.17644},
      archivePrefix={arXiv},
      primaryClass={cs.DC}
}

Main characteristics

Duration: 61 consecutive days in 2 consecutive months.
Dataset size: 1.4M lines, ~50MB.

Schema

Timestamp: request submission time, in seconds elapsed since 0:00:00 on the first day of the trace.
Model: the model that was called, ChatGPT or GPT-4.
Request tokens: number of tokens in the request.
Response tokens: number of tokens in the response.
Total tokens: request tokens plus response tokens.
Log Type: how the user called the model, either in conversation mode (Conversation log) or through the API (API log).
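The schema above can be mirrored as a typed record, which also lets you check the stated invariant that Total tokens equals Request tokens plus Response tokens. This is a minimal sketch under assumptions: the `TraceRecord` class and `parse_record` helper are hypothetical names, the dictionary keys assume CSV headers identical to the field names above, and the sample row is made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceRecord:
    timestamp: float      # seconds since 0:00:00 on the first day
    model: str            # e.g. "ChatGPT" or "GPT-4"
    request_tokens: int
    response_tokens: int
    total_tokens: int
    log_type: str         # "Conversation log" or "API log"


def parse_record(row):
    """Convert one CSV row (a dict of strings) into a TraceRecord."""
    record = TraceRecord(
        timestamp=float(row["Timestamp"]),
        model=row["Model"],
        request_tokens=int(row["Request tokens"]),
        response_tokens=int(row["Response tokens"]),
        total_tokens=int(row["Total tokens"]),
        log_type=row["Log Type"],
    )
    # Enforce the invariant stated in the schema.
    if record.total_tokens != record.request_tokens + record.response_tokens:
        raise ValueError(f"inconsistent token counts in row: {row}")
    return record


# Made-up row in the assumed layout (not real trace data).
sample_row = {
    "Timestamp": "5",
    "Model": "ChatGPT",
    "Request tokens": "472",
    "Response tokens": "18",
    "Total tokens": "490",
    "Log Type": "Conversation log",
}
record = parse_record(sample_row)
print(record.total_tokens)  # 490
```

Rows produced by `csv.DictReader` have the same dict-of-strings shape as `sample_row`, so they can be passed to `parse_record` directly if the header assumption holds.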

Data Overview

Figure 1: Weekly Periodicity in BurstGPT.

Figure 2: Daily Periodicity in BurstGPT.

Figure 3: Average Daily Request and Response Throughput in BurstGPT.

Figure 4: Statistics of Request and Response Tokens in BurstGPT.
