Hi!
I have recently run a lot of Profile-Guided Optimization (PGO) benchmarks on different kinds of software; all currently available results are at https://github.com/zamazan4ik/awesome-pgo. In those tests, PGO usually improves performance, so I think it is worth testing PGO for fd as well. I ran some benchmarks on my local machine and want to share the results.
Test environment
Apple MacBook M1 (full charge, AC connected)
macOS 13.4 Ventura
Rust: 1.72
Latest fd from the master branch (commit 3884f054f19603b64aef3f2898f6125e15599229 )
Test workload
As a test scenario, I used https://github.com/sharkdp/fd-benchmarks/blob/master/cold-cache-simple-pattern.sh. All runs were performed on the same hardware and operating system, with the same background workload (as far as I can guarantee). The measurements were taken with hyperfine, using 3 warmup runs and a minimum of 5 runs. The PGO-optimized binary was built with cargo-pgo.
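For anyone who wants to reproduce this, the workflow described above can be sketched roughly as follows. This is only an illustration, not the exact commands I ran: the training workload (here simply the same fd invocation as the benchmark) and the `SEARCH_ROOT` placeholder are assumptions, and the cache-reset step from the benchmark script is left out. The script only prints the commands; it does not execute them.

```python
import shlex

# Target triple and pattern are taken from the benchmark output in this post.
TARGET = "aarch64-apple-darwin"
PATTERN = r".*[0-9]\.jpg$"
# Hypothetical placeholder: replace with the directory you want to search.
SEARCH_ROOT = "/path/to/search"


def pgo_workflow(target=TARGET, pattern=PATTERN, root=SEARCH_ROOT):
    """Return the command sequence for a cargo-pgo build plus comparison."""
    pgo_fd = f"target/{target}/release/fd"   # cargo-pgo builds with an explicit target
    baseline_fd = "target/release/fd"        # plain `cargo build --release` output
    fd_args = ["-HI", pattern, root]
    return [
        # 1. Plain release build, used as the baseline.
        ["cargo", "build", "--release"],
        # 2. Instrumented build that records profiles when run.
        ["cargo", "pgo", "build"],
        # 3. Training run (assumption: same workload as the benchmark).
        [pgo_fd, *fd_args],
        # 4. Rebuild using the collected profiles.
        ["cargo", "pgo", "optimize"],
        # 5. Compare both binaries with the hyperfine settings from this post.
        ["hyperfine", "--warmup", "3", "--min-runs", "5",
         shlex.join([pgo_fd, *fd_args]),
         shlex.join([baseline_fd, *fd_args])],
    ]


if __name__ == "__main__":
    for cmd in pgo_workflow():
        print(shlex.join(cmd))
```

Printing instead of executing keeps the sketch safe to run anywhere; paste the printed lines into a shell inside an fd checkout once the placeholder path is adjusted.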
Results
Here are the results (PGO-optimized binary (first) compared to Release binary (second)):
./cold-cache-simple-pattern.sh
This script will now ask for your password in order to gain root/sudo
permissions. These are required to reset the harddisk caches in between
benchmark runs.
Okay, acquired superpowers :-)
Benchmark 1: /Users/zamazan4ik/open_source/fd/target/aarch64-apple-darwin/release/fd -HI '.*[0-9]\.jpg$' '/Users/zamazan4ik'
Time (mean ± σ): 4.696 s ± 0.055 s [User: 2.421 s, System: 12.064 s]
Range (min … max): 4.646 s … 4.762 s 5 runs
Benchmark 2: /Users/zamazan4ik/open_source/fd/target/release/fd -HI '.*[0-9]\.jpg$' '/Users/zamazan4ik'
Time (mean ± σ): 4.751 s ± 0.043 s [User: 2.632 s, System: 12.345 s]
Range (min … max): 4.677 s … 4.784 s 5 runs
Summary
/Users/zamazan4ik/open_source/fd/target/aarch64-apple-darwin/release/fd -HI '.*[0-9]\.jpg$' '/Users/zamazan4ik' ran
1.01 ± 0.02 times faster than /Users/zamazan4ik/open_source/fd/target/release/fd -HI '.*[0-9]\.jpg$' '/Users/zamazan4ik'
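Here I want to highlight the improvement in User time: 2.421 s with PGO versus 2.632 s without it, roughly 8% lower. The wall-clock difference is much smaller (1.01 ± 0.02 times faster). So PGO brings a small improvement, at least in the project's own benchmark, and could be worth integrating into the project's build scripts.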
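To put the numbers side by side, here is a small sketch that computes the relative reduction for each metric from the hyperfine output above. The values are copied from the two benchmark runs; the helper name `reduction_pct` is just for illustration.

```python
# Mean wall-clock, User, and System times (seconds) from the hyperfine output.
pgo = {"mean": 4.696, "user": 2.421, "system": 12.064}
release = {"mean": 4.751, "user": 2.632, "system": 12.345}


def reduction_pct(before, after):
    """Percentage by which `after` is lower than `before`."""
    return (before - after) / before * 100.0


reductions = {key: reduction_pct(release[key], pgo[key]) for key in pgo}

for key, value in reductions.items():
    print(f"{key}: {value:.1f}% lower with PGO")
```

With only 5 runs per binary and a wall-clock difference close to the reported standard deviation, the User time figure is the clearer signal here.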
Possible further steps
I suggest the following:
Add a note to the fd documentation (maybe somewhere in the README file) about building with PGO, if you think it's worth it for the project. That way, users and maintainers who build their own fd binaries will know about PGO as an additional way to optimize the project.
Try LLVM BOLT in addition to PGO. However, I do not expect huge improvements from BOLT in this project.