Profile-Guided Optimization (PGO) results #185
Hi!
I am researching the benefits of Profile-Guided Optimization (PGO) on different software (results are here). I optimized `drill` with PGO too (via cargo-pgo) and want to share my results.

Test environment
`main` commit for now (`dfd5548c8d4269d5fa8b73e81d616572e9a9d445`)

Benchmark
As a benchmark, I used the server from `example/server` and `drill` with `drill --benchmark benchmark.yml --stats` (the only change to `benchmark.yml` was the iteration count, increased to 10000). I compared Drill in Release mode vs Drill in Release + PGO. The same load was used as the profiling load (to collect a profile).

Results
First, I want to highlight that the methodology is not ideal: since the CPU core is not overloaded, I measured the "average" CPU load of `drill` on one core (with the `htop` utility) and checked it by eye during every run (yeah, some scripting over `top` can be used here but right now I am quite lazy :). The lower the average CPU usage, the better. This method could be improved, but as a quick check it should be good enough. All measurements were done on the same hardware/software, with the same "quiet" background load, multiple times, in different orders, etc. They are quite stable, at least on my machine.

Here are the results for "Release", "Release with PGO", and "Instrumentation" modes (Instrumentation is included just for the record, so you can estimate how slow Drill is in Instrumentation mode):
- Release: ~9.0 - 9.7% (less frequently 10.3%)
- Release with PGO: ~7.8 - 8.4%
- Instrumentation: ~15.5%
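
To put these readings in relative terms, here is a small arithmetic sketch in Python. It only reuses the ranges reported above (ignoring the occasional 10.3% outlier); using range midpoints is my own simplification, not a statistically rigorous comparison, and the helper names are made up for illustration.

```python
# Figures copied from the results above: percent of one CPU core, read from htop.
release = (9.0, 9.7)   # Release build, typical range
pgo = (7.8, 8.4)       # Release + PGO build, typical range
instrumented = 15.5    # Instrumentation build


def midpoint(rng):
    """Middle of a (low, high) range; an assumed stand-in for the 'typical' value."""
    return sum(rng) / 2


def reduction_percent(before, after):
    """Relative drop in CPU usage when going from `before` to `after`."""
    return (before - after) / before * 100


# Midpoint-to-midpoint estimate of the PGO gain.
mid = reduction_percent(midpoint(release), midpoint(pgo))
# Pessimistic bound: best Release reading vs worst PGO reading.
worst = reduction_percent(release[0], pgo[1])
# Optimistic bound: worst Release reading vs best PGO reading.
best = reduction_percent(release[1], pgo[0])
# How much more CPU the instrumented build uses than a typical Release run.
overhead = instrumented / midpoint(release)

print(f"PGO reduction: ~{mid:.1f}% (between {worst:.1f}% and {best:.1f}%)")
print(f"Instrumentation overhead: ~{overhead:.2f}x Release")
```

With these numbers the midpoint estimate is about a 13% reduction in CPU usage, with bounds of roughly 7% and 20%, and the instrumented build uses about 1.7x the CPU of a Release build.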
At least in this test, I see an improvement in Drill performance with PGO. If we can find a way to make Drill itself the CPU bottleneck in a "near real-life" case, instead of the NodeJS server, it would be great to test that as well.

These results could be important for people who want to maximize benchmark tool performance per core/CPU/machine: PGO could postpone the moment when we need to spawn multiple machines to create the required stress load, and/or let us spawn cheaper instances to create the same load.
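
For anyone who wants to repeat the measurement without watching `htop`, below is a minimal sketch of how the reading could be scripted. It is not what I used for the numbers above. It assumes a Unix-like system (it relies on Python's `resource` module), runs the command as a child process, and divides the child's CPU time (user + system) by the wall-clock time of the run. The function name `average_cpu_percent` is hypothetical. Note that this gives a whole-run average rather than the instantaneous samples `htop` shows, so the values are not directly comparable with the ones above.

```python
import resource
import subprocess
import sys
import time


def average_cpu_percent(cmd):
    """Run `cmd` to completion and return its average CPU usage.

    The result is CPU time (user + system) divided by wall-clock time,
    in percent of one core. It can exceed 100 for multi-threaded programs.
    """
    # RUSAGE_CHILDREN accumulates over all finished children, so take a
    # snapshot before the run and subtract it afterwards.
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    subprocess.run(cmd, check=True)  # waits for the child to exit
    wall = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)

    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return cpu / wall * 100


if __name__ == "__main__" and len(sys.argv) > 1:
    # The command to measure is taken from the script's own arguments.
    print(f"{average_cpu_percent(sys.argv[1:]):.1f}% of one core")
```

Saved under some name such as `avg_cpu.py` (an arbitrary choice), it could be run as `python avg_cpu.py drill --benchmark benchmark.yml --stats`, once for the Release binary and once for the PGO binary.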