Hi!
Recently I evaluated optimizations like Profile-Guided Optimization (PGO) and Post-Link Optimization (PLO) on multiple projects. The results are available here. According to the tests, these optimizations improve performance in many cases for many applications, so I think trying to enable them for this project could be a good idea. I already ran some benchmarks and want to share my results here. Hopefully, they will be helpful.
Test environment
Fedora 39
Linux kernel 6.8.4
AMD Ryzen 9 5900X
48 GiB RAM
SSD Samsung 980 Pro 2 TiB
Compiler: rustc 1.76
nucleo version: the latest from the master branch at the time of testing, commit a82a24999b899e588a73da830d3a6957f0fbea2b
Turbo Boost disabled (for more consistent results across runs)
Benchmark
For benchmarking, I use this workload. The Release build is done with cargo build --release, the PGO-instrumented build with cargo pgo build, and the PGO-optimized build with cargo pgo optimize build. cargo-pgo is used for all PGO-related routines. The training scenario is running the whole program once.
All tests are run on the same machine, multiple times (with the same results), and with the same background "noise" (as far as I can guarantee, of course), so the results are reproducible at least on my machine. taskset -c 0 is used for better stability across runs (to reduce the influence of the OS scheduler).
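The build and measurement sequence described above can be sketched as a small shell script. This is only a sketch under assumptions: the run helper, the DRY_RUN switch, the function names, and the binary path argument are hypothetical and introduced for illustration; they are not part of cargo-pgo or nucleo. By default the script only prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch only: `run`, `DRY_RUN`, and the function names are illustrative assumptions.
# DRY_RUN=1 (the default) prints each command; DRY_RUN=0 actually executes it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# The three builds from the report.
# $1 is a placeholder path to the instrumented binary produced by `cargo pgo build`.
pgo_workflow() {
    run cargo build --release       # baseline Release build
    run cargo pgo build             # PGO-instrumented build
    run "$1"                        # training: run the whole program once
    run cargo pgo optimize build    # PGO-optimized build using the collected profiles
}

# A measurement run pinned to CPU core 0 to reduce OS scheduler influence.
pinned() {
    run taskset -c 0 "$@"
}
```

With DRY_RUN=0, calling pgo_workflow with the real path of the instrumented binary would perform the builds, and pinned could wrap each benchmark run of the Release and PGO-optimized binaries.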
Results
At least according to the simple benchmark above, PGO measurably improves the library's performance.
Further steps
I can suggest the following action points:
Perform more PGO benchmarks on the project. If they show improvements, add a note to the documentation (the README file?) about the possible performance gains from PGO.
Test more advanced approaches like Post-Link Optimization (PLO) with tools like LLVM BOLT.
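For the PLO item, a BOLT run could follow the same pattern as the PGO one. The sketch below assumes the cargo pgo bolt build and cargo pgo bolt optimize subcommands described in the cargo-pgo README; treat them as an assumption and check cargo pgo --help before relying on them. The run helper, DRY_RUN, the function name, and the binary path are hypothetical names used only for illustration, and nothing here has been benchmarked on nucleo.

```shell
#!/bin/sh
# Sketch only: `run`, `DRY_RUN`, and `bolt_workflow` are illustrative assumptions.
# DRY_RUN=1 (the default) prints each command; DRY_RUN=0 actually executes it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 is a placeholder path to the BOLT-instrumented binary.
bolt_workflow() {
    run cargo pgo bolt build       # BOLT-instrumented build (assumed subcommand)
    run "$1"                       # run the workload once to gather BOLT profiles
    run cargo pgo bolt optimize    # BOLT-optimized build (assumed subcommand)
}
```

Keeping the dry-run default makes it easy to review the command sequence first, since BOLT also needs the LLVM BOLT tools installed on the machine.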
Here are some examples of how PGO is integrated into other projects:
I have some examples of how PGO information looks in the documentation:
Please do not treat this discussion as a bug report or something like that. It's just a benchmark report with possible improvement ideas for the project.