Profile-Guided Optimization (PGO) results #185

zamazan4ik · 2023-07-10T02:09:12Z

Hi!

I am researching the benefits of Profile-Guided Optimization (PGO) on different software (results are here). I also optimized drill with PGO (via cargo-pgo) and want to share my results.

Test environment

Fedora 38
Linux kernel 6.3.7
AMD Ryzen 9 5900X
48 GiB RAM
SSD Samsung 980 Pro 2 TiB
Rustc: 1.70.0
drill version: the latest main commit at the time of writing (dfd5548c8d4269d5fa8b73e81d616572e9a9d445)

Benchmark

As a benchmark, I used the server from example/server and ran drill with drill --benchmark benchmark.yml --stats (the only change to benchmark.yml was the iteration count, increased to 10000). I compared Drill built in Release mode against Drill built in Release mode with PGO. The same load was used as the profiling workload to collect the profile.

Results

First, I want to highlight that the methodology is not ideal: the CPU core is not saturated, so I measured drill's "average" CPU load on one core with htop and checked it by eye during every run (yes, some scripting over top could be used here, but right now I am quite lazy :). Lower average CPU usage is better. This method could be improved, but it should be good enough for a quick check. All measurements were done on the same hardware and software, with the same "quiet" background load, multiple times and in different orders; they are quite stable, at least on my machine.
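The "scripting" alternative to watching htop could look roughly like the sketch below. It is not what was used for the numbers in this issue; it is a minimal Linux-only illustration that samples utime + stime from /proc/<pid>/stat and averages the per-interval CPU usage relative to one core. The function names, the sample count, and the interval are all assumptions made for the example.

```python
import os
import time


def parse_cpu_ticks(stat_line: str) -> int:
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split on the last ')'.
    fields = stat_line.rsplit(")", 1)[1].split()
    # fields[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(fields[11]) + int(fields[12])


def cpu_percent(ticks_before: int, ticks_after: int,
                interval_s: float, ticks_per_s: int) -> float:
    """CPU time consumed over the interval, as a percentage of one core."""
    return 100.0 * (ticks_after - ticks_before) / (ticks_per_s * interval_s)


def average_cpu_percent(pid: int, samples: int = 30,
                        interval_s: float = 1.0) -> float:
    """Sample a running process and return its mean CPU usage (Linux only)."""
    ticks_per_s = os.sysconf("SC_CLK_TCK")
    path = f"/proc/{pid}/stat"

    with open(path) as f:
        prev_ticks = parse_cpu_ticks(f.read())
    prev_time = time.monotonic()

    readings = []
    for _ in range(samples):
        time.sleep(interval_s)
        with open(path) as f:
            cur_ticks = parse_cpu_ticks(f.read())
        now = time.monotonic()
        readings.append(cpu_percent(prev_ticks, cur_ticks,
                                    now - prev_time, ticks_per_s))
        prev_ticks, prev_time = cur_ticks, now

    return sum(readings) / len(readings)
```

Calling average_cpu_percent(pid) with the PID of a running drill process would return a single averaged figure per run, which is easier to compare across builds than a value read off the screen. The process must stay alive for the whole sampling window, otherwise reading the stat file fails.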

Below are the results for the "Release", "Release + PGO", and "Instrumentation" modes (Instrumentation is included only for reference, so you can estimate how much slower Drill is when instrumented):

Release: average CPU load is ~9.0 - 9.7% (less frequently 10.3%)
Release + PGO: average CPU load is ~7.8 - 8.4%
Instrumentation: average CPU load is ~15.5%
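To put the three figures on a common scale, the sketch below compares them using the midpoint of each reported range. Taking midpoints is an assumption made here for illustration (the occasional 10.3% Release reading is ignored), so the percentages are a rough estimate, not a measured result.

```python
def midpoint(low: float, high: float) -> float:
    """Midpoint of a reported CPU-load range, in percent of one core."""
    return (low + high) / 2


def relative_change(baseline: float, value: float) -> float:
    """Change of `value` relative to `baseline`, in percent."""
    return (value - baseline) / baseline * 100


# Figures taken from the list above.
release = midpoint(9.0, 9.7)
release_pgo = midpoint(7.8, 8.4)
instrumented = 15.5

pgo_change = relative_change(release, release_pgo)
instr_change = relative_change(release, instrumented)

print(f"Release + PGO vs Release:   {pgo_change:+.1f}% CPU load")
print(f"Instrumentation vs Release: {instr_change:+.1f}% CPU load")
```

By this estimate the PGO build uses about 13% less CPU than the plain Release build for the same load, while the instrumented build uses about 66% more.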

At least in this test, I see an improvement in Drill performance with PGO. If we can find a "near real-life" scenario where Drill itself, rather than the Node.js server, is the CPU bottleneck, it would be great to test that as well.

These results could matter to anyone who wants to maximize benchmark tool performance per core/CPU/machine: PGO could postpone the point at which generating the required load needs multiple machines, or allow cheaper instances to generate the same load.


zamazan4ik · 2023-07-10T19:05:15Z

Another example of optimizing a benchmark tool with PGO, this time Goose, is here.

zamazan4ik mentioned this issue Jul 10, 2023

Profile-Guided Optimization (PGO) improvements hatoo/oha#264

Closed

zamazan4ik mentioned this issue Sep 12, 2023

Evaluate Profile-Guided Optimization (PGO) and LLVM BOLT JoeDog/siege#227

Open
