Profile-Guided Optimization (PGO) evaluation results #1384

zamazan4ik · 2023-09-12T02:38:05Z

Hi!

I have recently run a lot of Profile-Guided Optimization (PGO) benchmarks on different kinds of software; all results available so far are collected at https://github.com/zamazan4ik/awesome-pgo . In those tests, PGO usually improved performance, so I think it is worth testing PGO for fd as well. I ran some benchmarks on my local machine and want to share the results.

Test environment

Apple MacBook M1 (fully charged, AC connected)
macOS 13.4 Ventura
Rust: 1.72
Latest fd from the master branch (commit 3884f054f19603b64aef3f2898f6125e15599229)

Test workload

As a test scenario, I used https://github.com/sharkdp/fd-benchmarks/blob/master/cold-cache-simple-pattern.sh. All runs were performed on the same hardware and operating system, with the same background workload (as far as I can guarantee). The measurements were taken with hyperfine, using 3 warmup runs and a minimum of 5 runs. The PGO-optimized build was produced with cargo-pgo.
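For anyone who wants to reproduce the setup, the hyperfine settings mentioned above map onto command-line flags roughly as follows. This is only a sketch: it assembles the argument list without running anything, the helper name `hyperfine_argv` and the search root are hypothetical, and the cache-reset command that the benchmark script passes to hyperfine is not shown in this post, so it is left as an optional parameter.

```python
import shlex


def hyperfine_argv(commands, warmup=3, min_runs=5, prepare=None):
    """Build a hyperfine argument list for the given shell commands."""
    argv = ["hyperfine", "--warmup", str(warmup), "--min-runs", str(min_runs)]
    if prepare is not None:
        # --prepare is executed before every timing run; a cold-cache
        # benchmark would put its cache-reset command here (assumption:
        # the exact command is not part of this post).
        argv += ["--prepare", prepare]
    argv += list(commands)
    return argv


# Hypothetical search root; the pattern and fd flags mirror the benchmark output.
PATTERN = r".*[0-9]\.jpg$"
SEARCH_ROOT = "/path/to/search/root"
FD_ARGS = f"-HI {shlex.quote(PATTERN)} {shlex.quote(SEARCH_ROOT)}"

pgo_cmd = f"target/aarch64-apple-darwin/release/fd {FD_ARGS}"  # PGO-optimized build
release_cmd = f"target/release/fd {FD_ARGS}"                   # regular Release build

argv = hyperfine_argv([pgo_cmd, release_cmd])
print(shlex.join(argv))  # copy-pasteable shell command
```

The resulting list can be passed to `subprocess.run(argv)` once both binaries exist.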

Results

Here are the results. Benchmark 1 is the PGO-optimized binary, Benchmark 2 is the regular Release binary:

./cold-cache-simple-pattern.sh
This script will now ask for your password in order to gain root/sudo
permissions. These are required to reset the harddisk caches in between
benchmark runs.

Okay, acquired superpowers :-)

Benchmark 1: /Users/zamazan4ik/open_source/fd/target/aarch64-apple-darwin/release/fd -HI '.*[0-9]\.jpg$' '/Users/zamazan4ik'
  Time (mean ± σ):      4.696 s ±  0.055 s    [User: 2.421 s, System: 12.064 s]
  Range (min … max):    4.646 s …  4.762 s    5 runs

Benchmark 2: /Users/zamazan4ik/open_source/fd/target/release/fd -HI '.*[0-9]\.jpg$' '/Users/zamazan4ik'
  Time (mean ± σ):      4.751 s ±  0.043 s    [User: 2.632 s, System: 12.345 s]
  Range (min … max):    4.677 s …  4.784 s    5 runs

Summary
  /Users/zamazan4ik/open_source/fd/target/aarch64-apple-darwin/release/fd -HI '.*[0-9]\.jpg$' '/Users/zamazan4ik' ran
    1.01 ± 0.02 times faster than /Users/zamazan4ik/open_source/fd/target/release/fd -HI '.*[0-9]\.jpg$' '/Users/zamazan4ik'

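The wall-clock difference is small (1.01 ± 0.02 times faster, which is within measurement noise), but I want to highlight the consistent improvement in User time: 2.421 s with PGO versus 2.632 s without, about 8% lower. So PGO brings a small improvement at least in the project's own benchmark, and it could be worth integrating into the project's build scripts.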
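To put the numbers side by side, here is a small sketch that computes how much lower each mean time is for the PGO build. The values are copied from the hyperfine output above; the helper name `reduction_percent` is just for illustration, and a single set of 5 runs is of course not a rigorous statistical comparison.

```python
# Mean times in seconds, copied from the hyperfine output in this post.
pgo = {"wall": 4.696, "user": 2.421, "system": 12.064}
release = {"wall": 4.751, "user": 2.632, "system": 12.345}


def reduction_percent(baseline, optimized):
    """Relative reduction of `optimized` versus `baseline`, in percent."""
    return (baseline - optimized) / baseline * 100.0


# Per-metric reduction of the PGO build relative to the Release build.
reductions = {name: reduction_percent(release[name], pgo[name]) for name in release}

for name, value in reductions.items():
    print(f"{name:>6}: {value:.1f}% lower with PGO")
```

User time shows the largest relative change, while the wall-clock change is close to 1%.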

Possible further steps

I suggest the following:

Add a note to the fd documentation (maybe somewhere in the README file) about building with PGO, if you think it's worth it for the project. That way, users and maintainers who build their own fd binaries will be aware of PGO as an additional way to optimize the project.
Try LLVM BOLT in addition to PGO. However, I do not expect huge improvements from BOLT in this project.
