Profile-Guided Optimization (PGO) results #185

Open
zamazan4ik opened this issue Jul 10, 2023 · 1 comment

Comments

@zamazan4ik

zamazan4ik commented Jul 10, 2023

Hi!

I am researching the benefits of Profile-Guided Optimization (PGO) across different software (results are here). I also optimized drill with PGO (via cargo-pgo) and want to share my results.

Test environment

  • Fedora 38
  • Linux kernel 6.3.7
  • AMD Ryzen 9 5900X
  • 48 GiB RAM
  • SSD Samsung 980 Pro 2 TiB
  • Rustc: 1.70.0
  • drill version: the latest commit on main at the time of writing (dfd5548c8d4269d5fa8b73e81d616572e9a9d445)

Benchmark

As a benchmark, I used the server from example/server and ran drill with drill --benchmark benchmark.yml --stats (the only change to benchmark.yml was the iteration count, which I increased to 10000). I compared Drill built in Release mode against Drill built in Release mode with PGO. The same load was used as the profiling workload to collect the profile.
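The build, profile, and rebuild cycle behind this comparison can be sketched as a small Python driver. This is only an illustration, not the exact procedure used here: cargo pgo build and cargo pgo optimize are cargo-pgo subcommands, while the helper names, the target triple, and the resulting binary path are assumptions, and the example server is assumed to be running already.

```python
import subprocess

# Assumption: default Linux x86_64 target. cargo-pgo builds with an explicit
# target, so binaries are expected under target/<triple>/release.
TARGET_TRIPLE = "x86_64-unknown-linux-gnu"


def pgo_steps(triple=TARGET_TRIPLE):
    """Return the commands for one PGO cycle, in order (hypothetical helper)."""
    instrumented = f"./target/{triple}/release/drill"
    return [
        # 1. Build an instrumented binary that records a profile while it runs.
        ["cargo", "pgo", "build"],
        # 2. Run the profiling load: the same benchmark that is measured later.
        [instrumented, "--benchmark", "benchmark.yml", "--stats"],
        # 3. Rebuild in release mode using the collected profile.
        ["cargo", "pgo", "optimize"],
    ]


def run_pgo_cycle(cwd="."):
    """Run each step inside the drill checkout, stopping at the first failure."""
    for cmd in pgo_steps():
        subprocess.run(cmd, cwd=cwd, check=True)
```

Calling run_pgo_cycle() from the root of a drill checkout would run the three steps in sequence. Keeping the command list separate from the runner makes it easy to print or adjust the steps before executing them.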

Results

First, I want to highlight that the methodology is not ideal. Since the CPU core is not fully loaded, I measured the "average" CPU load of drill on one core with the htop utility and checked it by eye during every run (yes, some scripting over top could be used here, but right now I am quite lazy :). The lower the average CPU usage, the better. This method could be improved, but as a quick check it should be good enough. All measurements were done on the same hardware and software, with the same "quiet" background load, multiple times and in different orders. They are quite stable, at least on my machine.
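For anyone who wants to replace the by-eye reading with a script, here is a minimal Linux-only sketch. It is not the method used for the numbers in this issue: it reads the utime and stime counters from /proc/<pid>/stat and averages them over a time window, which gives a per-core percentage comparable to what htop shows for a process. The function names and the default 10-second window are assumptions.

```python
import os
import time


def read_cpu_ticks(pid):
    """Return utime + stime (in clock ticks) from /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # Field 2 (the command name) is parenthesised and may contain spaces,
    # so split on the last ')' before splitting the remaining fields.
    fields = data.rsplit(")", 1)[1].split()
    # fields[0] is field 3 (state), so fields 14 (utime) and 15 (stime)
    # sit at indexes 11 and 12.
    return int(fields[11]) + int(fields[12])


def cpu_percent(delta_ticks, elapsed_s, ticks_per_s):
    """Convert consumed clock ticks into a percentage of one core."""
    return 100.0 * delta_ticks / (ticks_per_s * elapsed_s)


def average_cpu_percent(pid, duration_s=10.0):
    """Average CPU usage of a running process over duration_s seconds."""
    ticks_per_s = os.sysconf("SC_CLK_TCK")
    start_ticks, start = read_cpu_ticks(pid), time.monotonic()
    time.sleep(duration_s)
    end_ticks, end = read_cpu_ticks(pid), time.monotonic()
    return cpu_percent(end_ticks - start_ticks, end - start, ticks_per_s)
```

With drill running, average_cpu_percent(pid) would be called with drill's process ID (for example, as reported by pgrep). Because the counters cover all threads of the process, the value can exceed 100% for a multi-threaded run.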

Here are the results for the "Release", "Release + PGO", and "Instrumentation" modes (Instrumentation is included just for the record, so you can estimate how slow Drill is in Instrumentation mode):

  • Release: average CPU load is ~9.0 - 9.7% (less frequently 10.3%)
  • Release + PGO: average CPU load is ~7.8 - 8.4%
  • Instrumentation: average CPU load is ~15.5%
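To put the ranges above into a single relative figure, here is a small arithmetic sketch. Taking the midpoint of each range is an assumption made purely for illustration, since the measurements were read by eye, and the helper names are hypothetical.

```python
def midpoint(low, high):
    """Midpoint of a reported range (an illustrative simplification)."""
    return (low + high) / 2.0


def relative_reduction(baseline, optimized):
    """Percentage by which `optimized` is lower than `baseline`."""
    return 100.0 * (baseline - optimized) / baseline


# Ranges as reported in the list above, in percent of one core.
release = midpoint(9.0, 9.7)
release_pgo = midpoint(7.8, 8.4)
reduction = relative_reduction(release, release_pgo)

print(f"Release midpoint:       {release:.2f}%")
print(f"Release + PGO midpoint: {release_pgo:.2f}%")
print(f"Relative CPU reduction: {reduction:.1f}%")
```

With these midpoints, the reduction in average CPU load works out to roughly 13%. Comparing the range endpoints instead would give a noticeably wider spread, so this should be read as a rough estimate only.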

At least in this test, I see an improvement in Drill's performance with PGO. If we can find a "near real-life" scenario where Drill itself is the CPU bottleneck instead of the NodeJS server, it would be great to test that as well.

These results could be important for people who want to maximize benchmark tool performance per core/CPU/machine. Better per-core performance could postpone the point at which a benchmark needs multiple machines to generate the required stress load, and/or allow cheaper instances to generate the same load.

@zamazan4ik
Author

Another example of optimizing a benchmark tool with PGO, this time from Goose, is here.
