Evaluate Profile-Guided Optimization (PGO) usage #1294

zamazan4ik · 2024-04-09T15:37:24Z

Hi!

I recently evaluated Profile-Guided Optimization (PGO) and Post-Link Optimization (PLO) on multiple projects. The results are available here. According to those tests, these optimizations improve performance in many cases for many applications. Since the project cares about performance (according to the README file), I think trying to enable them could be a good idea. I already did some preliminary benchmarks and want to share the results here. Hopefully, they will be helpful.

Test environment

Fedora 39
Linux kernel 6.8.4
AMD Ryzen 9 5900X
48 GiB RAM
SSD Samsung 980 Pro 2 TiB
Compiler: rustc 1.76
martin version: the latest main branch at the time of writing, commit 59040a8686fbaa0390d6f0e8903774dab75fc320
Turbo Boost disabled (for more consistent results across runs)

Benchmark

For benchmarking, I use the built-in benchmarks. The Release run is done with cargo bench, the PGO instrumentation run with cargo pgo bench, and the PGO optimization run with cargo pgo optimize bench. cargo-pgo is used for all PGO-related routines. The same benchmark serves as the PGO training workload.
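The three runs described above can be sketched as a small script. This is a minimal sketch, not the exact procedure used for the report: the two setup commands (installing cargo-pgo and the llvm-tools-preview component) are assumptions about a fresh environment, and DRY_RUN and the helper functions are hypothetical names introduced only for this example. By default the script just prints the commands.

```shell
#!/bin/sh
# Sketch of the Release / PGO-instrumented / PGO-optimized benchmark runs.
# DRY_RUN is a hypothetical variable for this sketch: commands are only
# printed by default; set DRY_RUN=0 to actually execute them.
set -eu
DRY_RUN="${DRY_RUN:-1}"

# Print a command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

pgo_bench_steps() {
    # Assumed one-time setup: cargo-pgo and the LLVM tools it relies on.
    run cargo install cargo-pgo
    run rustup component add llvm-tools-preview
    # 1. Baseline: plain Release benchmarks.
    run cargo bench
    # 2. Instrumented build; running the benchmarks collects the profiles.
    run cargo pgo bench
    # 3. Rebuild using the collected profiles and benchmark again.
    run cargo pgo optimize bench
}

pgo_bench_steps
```

Running the script from the repository root with DRY_RUN=0 would execute the steps in order. Because the benchmark itself is the training workload here, steps 2 and 3 use the same bench targets.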

All tests were done on the same machine, repeated multiple times (with the same results), and with the same background "noise" (as far as I can guarantee, of course). The results are reproducible, at least on my machine.

Results

Here are the results:

Release: https://gist.github.com/zamazan4ik/e138b62657eafea280fc25730136094a
PGO-optimized compared to Release: https://gist.github.com/zamazan4ik/7593915078f93b865c5d264d61fcc423
(just for reference) PGO-instrumented compared to Release: https://gist.github.com/zamazan4ik/223401aaca562ff184453a0cb5d24474

At least according to the simple benchmarks above, PGO measurably improves the library's performance. However, I understand that these benchmarks may be too synthetic.

Further steps

I can suggest the following action points:

Perform more PGO benchmarks on the project in various scenarios. If they show improvements, add a note to the documentation (README file?) about the possible performance gains from PGO.

Here are some examples of how PGO optimization is integrated into other projects:

Rustc: a CI script for the multi-stage build
GCC:
- Official docs, section "Building with profile feedback" (even AutoFDO build is supported)
- A part in a "wonderful" configure script
Clang: Docs
Python:
- CPython: README
- Pyston: README
Go: Bash script
V8: Bazel flag
ChakraCore: Scripts
Chromium: Script
Firefox: Docs
- Thunderbird has PGO support too
PHP: Makefile command and old Centminmod scripts
MySQL: CMake script
YugabyteDB: GitHub commit
FoundationDB: Script
Zstd: Makefile
Foot: Scripts
Windows Terminal: GitHub PR
Pydantic-core: GitHub PR
file.d: GitHub PR
OceanBase: CMake flag

Please do not treat this discussion as a bug report or something like that. It is just a benchmark report with possible improvement ideas for the project.
