Description
Hi!
Recently I evaluated Profile-Guided Optimization (PGO) on multiple projects. The results are here. E.g. PGO results for LLVM-related tooling are here. According to these tests, PGO usually helps with compilers and compiler-like workloads (such as static analysis) - e.g. Clang gets +20% compilation speed with PGO. That's why I think optimizing grcov with PGO is worth trying.
I already did some benchmarks and want to share my results.
Test environment
- Fedora 38
- Linux kernel 6.5.6
- AMD Ryzen 9 5900X
- 48 GiB RAM
- SSD Samsung 980 Pro 2 TiB
- Compiler - Rustc 1.73
- grcov version: the latest from the master branch at the time of writing, commit 322fc39acacd75aca0ff1c0a1ec2a3e91f04011c
- Disabled Turbo boost
Benchmark
For benchmarking, I use the grcov benchmarks via cargo +nightly bench. For PGO, I use the cargo-pgo tool. The same benchmark suite was used for the PGO training phase via cargo +nightly pgo bench. I got the PGO-optimized results with cargo +nightly pgo optimize bench.
I ran into a small issue with the benchmarks (#1127) but quickly worked around it by setting precision = 2 (I found a similar parameter elsewhere in the repository).
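For anyone who wants to reproduce the measurements, the three benchmark runs described above can be sketched as a small script. This is only a sketch under a few assumptions: it must be run from the root of a grcov checkout, the nightly toolchain is installed, and cargo-pgo needs the llvm-tools-preview component (as far as I know, that is its documented prerequisite). The run helper and the DRY_RUN variable are my own illustrative names, not part of grcov or cargo-pgo. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of the cargo-pgo benchmark workflow from this issue.
# Assumption: run from the root of a grcov checkout with a nightly toolchain.
# By default the commands are only printed; set DRY_RUN=0 to execute them.
set -eu

DRY_RUN="${DRY_RUN:-1}"

# Print a command, and execute it unless dry-run mode is active.
run() {
    printf '%s\n' "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

pgo_benchmark_workflow() {
    # One-time setup (assumed prerequisites of cargo-pgo).
    run rustup component add llvm-tools-preview --toolchain nightly
    run cargo install cargo-pgo

    # 1. Baseline: regular Release benchmarks.
    run cargo +nightly bench
    # 2. PGO training: instrumented benchmarks that collect profiles.
    run cargo +nightly pgo bench
    # 3. PGO-optimized benchmarks built with the collected profiles.
    run cargo +nightly pgo optimize bench
}

pgo_benchmark_workflow
```

Running it with DRY_RUN=0 executes the steps in order; the training step has to finish before the optimize step, because the latter consumes the profiles the former produces.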
Results
I got the following results:
- Release: https://gist.github.com/zamazan4ik/b35115ae404d9583c57fce19ce5a3821
- Release + PGO optimized: https://gist.github.com/zamazan4ik/121853dcf1baa3bf486717c042b68042
- (just for reference) Release + PGO training: https://gist.github.com/zamazan4ik/bf6321e490337c6d51a61c4306fc5e07
At least according to the benchmarks provided by the grcov project, PGO improves performance.
Further steps
I can suggest the following action points:
- Perform more PGO benchmarks on grcov. If they show improvements, add a note about the possible performance gains from building grcov with PGO.
- Provide an easier way (e.g. a build option) to build grcov with PGO. This can help end users and maintainers, since they will be able to optimize grcov for their own workloads.
- Optimize the pre-built binaries with PGO
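To make the "own workloads" point more concrete, here is a rough sketch of a manual PGO build that uses only rustc's -Cprofile-generate and -Cprofile-use flags plus llvm-profdata, without cargo-pgo. The profile directory, the run helper, and the DRY_RUN variable are hypothetical names chosen for illustration, and the training step is left as a placeholder because it depends on the user's own coverage data. It also assumes an llvm-profdata that matches the LLVM version used by rustc. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of a manual PGO build of grcov for a custom workload.
# Assumptions: run from the root of a grcov checkout; llvm-profdata is on
# PATH and matches rustc's LLVM version; PROFILE_DIR is a hypothetical path.
# By default the commands are only printed; set DRY_RUN=0 to execute them.
set -eu

DRY_RUN="${DRY_RUN:-1}"
PROFILE_DIR="${PROFILE_DIR:-/tmp/grcov-pgo-data}"

# Print a command, and execute it unless dry-run mode is active.
run() {
    printf '%s\n' "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

manual_pgo_build() {
    # 1. Build an instrumented binary that writes raw profiles to PROFILE_DIR.
    run env RUSTFLAGS="-Cprofile-generate=$PROFILE_DIR" cargo build --release

    # 2. Training: placeholder, since the workload is project-specific.
    printf '%s\n' "# (run the instrumented target/release/grcov on your own coverage data here)"

    # 3. Merge the raw profiles into a single file usable by rustc.
    run llvm-profdata merge -o "$PROFILE_DIR/merged.profdata" "$PROFILE_DIR"

    # 4. Rebuild with the merged profile to get the PGO-optimized binary.
    run env RUSTFLAGS="-Cprofile-use=$PROFILE_DIR/merged.profdata" cargo build --release
}

manual_pgo_build
```

A build option or script in the repository could wrap these steps so that users only have to supply the training workload.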
Testing Post-Link Optimization techniques (like LLVM BOLT) would be interesting too (Clang and Rustc already use BOLT in addition to PGO), but I recommend starting with regular PGO.
Here are some examples of how PGO is integrated into other projects:
- Rustc: a CI script for the multi-stage build
- GCC:
- Clang: Docs
- Python:
- Go: Bash script
- V8: Bazel flag
- ChakraCore: Scripts
- Chromium: Script
- Firefox: Docs
- Thunderbird has PGO support too
- PHP: Makefile command and old Centminmod scripts
- MySQL: CMake script
- YugabyteDB: GitHub commit
- FoundationDB: Script
- Zstd: Makefile
- Foot: Scripts
- Windows Terminal: GitHub PR
- Pydantic-core: GitHub PR
- file.d: GitHub PR
- OceanBase: CMake flag