Profile-Guided Optimization (PGO) benchmark report #3456

Open
zamazan4ik opened this issue Apr 4, 2024 · 0 comments
Labels
enhancement (New feature or request), speedup (Performance bugs, speed improvements), unrelated to 1.0 (Things that need not be done before the 1.0 version milestone)


Hi!

Recently I evaluated Profile-Guided Optimization (PGO) and Post-Link Optimization (PLO) on multiple projects. The results are available here. According to those tests, these optimizations improve performance in many cases for many applications, so I think enabling them for libjxl is worth trying. I read an article on Phoronix about Jpegli, a new JPEG encoding/decoding library, and decided to optimize it with PGO.

I already did some benchmarks and want to share my results here. Hopefully, they will be helpful.

Test environment

  • Fedora 39
  • Linux kernel 6.7.6
  • AMD Ryzen 9 5900X
  • 48 GiB RAM
  • SSD Samsung 980 Pro 2 TiB
  • Compiler - Clang 17.0.6
  • libjxl version: main branch at commit 680d0e38683b6485e39807772c579252fe91f3a4 (the latest at the time of testing)
  • Turbo Boost disabled (for more consistent results across runs)

Benchmark

I didn't find a good benchmark suite for evaluating performance gains on a large dataset, so I use these image samples instead. In all cases, a 30 MiB image is used and the library is configured with cmake -DCMAKE_BUILD_TYPE=Release -DBUILD_TESTING=OFF -DENABLE_JPEGLI_DEFAULT=ON ... For the PGO training phase, the additional flag -fprofile-generate is passed to the compiler; for the PGO optimization phase, the -fprofile-use flag is passed instead. The training phase runs the following command: cjpegli Sample-png-image-30mb.png converted.jpeg -q 90, where cjpegli is Jpegli's encoder and Sample-png-image-30mb.png is the input image.
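For readers who have not used PGO with Clang before, the cycle described above could be sketched roughly as follows. This is only an illustration, not the exact script used for the report: the helper name pgo_plan, the build directory names, the profile directory, and the tools/cjpegli location inside the build tree are all assumptions. The llvm-profdata merge step is not mentioned above, but Clang needs the raw profiles merged into a .profdata file before -fprofile-use can consume them. The function only prints the commands, so they can be reviewed before being run.

```shell
# Print (do not run) the commands for one PGO cycle with Clang.
# Assumptions: build directory names, the profile directory, and the
# tools/cjpegli path inside the build tree are illustrative only.
pgo_plan() {
    src="${1:-.}"                    # libjxl checkout
    profiles="${2:-/tmp/jpegli-pgo}" # where raw profiles are written
    common="-DCMAKE_BUILD_TYPE=Release -DBUILD_TESTING=OFF -DENABLE_JPEGLI_DEFAULT=ON"
    common="$common -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++"

    # 1. Instrumented build: binaries dump .profraw files into $profiles on exit.
    echo "cmake -S $src -B build-instr $common -DCMAKE_C_FLAGS=-fprofile-generate=$profiles -DCMAKE_CXX_FLAGS=-fprofile-generate=$profiles"
    echo "cmake --build build-instr"

    # 2. Training run: same command as in the report, using the instrumented encoder.
    echo "build-instr/tools/cjpegli Sample-png-image-30mb.png converted.jpeg -q 90"

    # 3. Merge the raw profiles into a single indexed profile.
    echo "llvm-profdata merge -o $profiles/merged.profdata $profiles"

    # 4. Optimized build that consumes the merged profile.
    echo "cmake -S $src -B build-pgo $common -DCMAKE_C_FLAGS=-fprofile-use=$profiles/merged.profdata -DCMAKE_CXX_FLAGS=-fprofile-use=$profiles/merged.profdata"
    echo "cmake --build build-pgo"
}
```

Calling pgo_plan ~/libjxl /tmp/jpegli-pgo prints six commands in order; once they look right for the local setup, they can be run one at a time.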

All tests were run on the same machine, repeated multiple times, with the same background "noise" (as far as I can guarantee, of course); the results are reproducible at least on my machine. taskset -c 0 is used to pin the process to one core, which improves stability across runs by reducing OS scheduler influence.
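A repeated, pinned measurement along these lines could look like the sketch below. It is not the harness used for the report: the bench helper and the PIN_CPU variable are hypothetical, and the timing relies on GNU date (%N) and util-linux taskset as shipped on Fedora.

```shell
# Run a command N times, pinned to one CPU, and print wall-clock time per run.
# Hypothetical helper; requires GNU date (%N) and, when pinning, taskset.
bench() {
    runs="$1"; shift
    cpu="${PIN_CPU-0}"   # CPU to pin to; set PIN_CPU= (empty) to disable pinning
    i=1
    while [ "$i" -le "$runs" ]; do
        start=$(date +%s%N)
        if [ -n "$cpu" ]; then
            taskset -c "$cpu" "$@" >/dev/null
        else
            "$@" >/dev/null
        fi
        end=$(date +%s%N)
        # Nanoseconds to milliseconds (integer division).
        echo "run $i: $(( (end - start) / 1000000 )) ms"
        i=$((i + 1))
    done
}
```

For example, bench 10 ./cjpegli Sample-png-image-30mb.png converted.jpeg -q 90 would time ten pinned encodes; running it once against the Release binary and once against the PGO binary gives two comparable series.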

Results

Here are the results:

I also tested the case where the training and actual workloads differ. Here is the PGO-optimized build compared to a regular Release build when a different sample image is used (not the one from the training phase): https://gist.github.com/zamazan4ik/4750fa6424a53e83638f4ab422f901a9

At least in the simple benchmarks above, PGO delivers better performance.

Further steps

I can suggest the following action points:

  • Perform more PGO benchmarks on libjxl. If they show improvements, add a note to the documentation about the possible performance gains from PGO.
  • Provide an easier way to build with PGO (e.g. an option in the build scripts). This can help end users and maintainers, since they will be able to optimize libjxl for their own workloads.
  • Optimize pre-built libjxl binaries with PGO (if any are provided)
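As one possible shape for such a build option, a wrapper script could map a PGO mode to the corresponding Clang flag. This is only a sketch of the idea; pgo_flags and its mode names are hypothetical and are not an existing libjxl option.

```shell
# Map a PGO mode to the Clang flag that enables it.
# Hypothetical helper: "generate" takes a profile directory,
# "use" takes a merged .profdata file, "off" adds nothing.
pgo_flags() {
    case "$1" in
        generate) echo "-fprofile-generate=$2" ;;
        use)      echo "-fprofile-use=$2" ;;
        off)      echo "" ;;
        *)        echo "unknown PGO mode: $1" >&2; return 1 ;;
    esac
}
```

A build script could then pass the result to CMake, for example cmake -S . -B build -DCMAKE_C_FLAGS="$(pgo_flags use /tmp/jpegli-pgo/merged.profdata)" with the same value for -DCMAKE_CXX_FLAGS, so that users switch modes without editing compiler flags by hand.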

Here are some examples of how PGO optimization is integrated into other projects:

I have some examples of how PGO information looks in the documentation:

Please do not treat this issue as a bug or something like that. It's just a benchmark report with a possible improvement idea for the project.

@mo271 added the enhancement, speedup, and unrelated to 1.0 labels on Apr 10, 2024