Evaluate Profile-Guided Optimization (PGO) and LLVM BOLT #2701

Hi!

Recently I ran many Profile-Guided Optimization (PGO) benchmarks on multiple projects; the results are available [here](https://github.com/zamazan4ik/awesome-pgo). That's why I think it's worth trying PGO on `bat`. I have already run some benchmarks and want to share the results here.

## Test environment

* Fedora 38
* Linux kernel 6.5.5
* AMD Ryzen 9 5900x
* 48 GiB RAM
* SSD: Samsung 980 Pro 2 TiB
* Compiler: rustc 1.73
* `bat` version: latest `master` at the time of writing, commit `fbe9b6f15fe64b4a5bde0478260dc67942731153`

## Benchmark setup

For benchmarking, I use the scenario from https://github.com/sharkdp/bat/issues/2397: `bat --color=never --decorations=always --highlight-line=100000 --pager=never -- test.py`. The same arguments and test file were used to collect the PGO profile. The release build is done with `cargo build --release`; the PGO-optimized build is done with [cargo-pgo](https://github.com/Kobzol/cargo-pgo).
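For anyone who wants to reproduce the PGO build, here is a minimal sketch of the cargo-pgo steps, written as a small Rust driver. It assumes `cargo-pgo` is installed (`cargo install cargo-pgo`) along with the `llvm-tools-preview` rustup component, an x86_64 Linux host (hence the target triple), and that the instrumented binary ends up in `target/<triple>/release/bat`, which is my understanding of where cargo-pgo puts it. By default it only prints the commands; `PGO_EXECUTE` is a hypothetical switch I made up for this sketch.

```rust
use std::process::Command;

// Assumption: x86_64 Linux host, as in the test environment above.
// cargo-pgo builds with an explicit --target, so binaries land under target/<triple>/.
const TRIPLE: &str = "x86_64-unknown-linux-gnu";

// Training workload: the same bat arguments used for the benchmark.
const BAT_ARGS: [&str; 6] = [
    "--color=never",
    "--decorations=always",
    "--highlight-line=100000",
    "--pager=never",
    "--",
    "test.py",
];

fn strs(xs: &[&str]) -> Vec<String> {
    xs.iter().map(|s| s.to_string()).collect()
}

/// The PGO workflow as (program, arguments) steps.
fn pgo_steps() -> Vec<(String, Vec<String>)> {
    vec![
        // 1. Build an instrumented bat; running it writes raw profiles.
        ("cargo".to_string(), strs(&["pgo", "build"])),
        // 2. Run the training workload with the instrumented binary.
        (format!("target/{TRIPLE}/release/bat"), strs(&BAT_ARGS)),
        // 3. Merge the collected profiles and rebuild bat using them.
        ("cargo".to_string(), strs(&["pgo", "optimize"])),
    ]
}

fn main() {
    // Dry run by default; PGO_EXECUTE=1 (hypothetical switch) actually runs the steps.
    let execute = matches!(std::env::var("PGO_EXECUTE").as_deref(), Ok("1"));
    for (program, args) in pgo_steps() {
        println!("{} {}", program, args.join(" "));
        if execute {
            let status = Command::new(&program)
                .args(&args)
                .status()
                .expect("failed to spawn step");
            assert!(status.success(), "step failed: {program}");
        }
    }
}
```

The same three steps are what a build-script integration would need; only the training workload (step 2) would change per user.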

All benchmarks were run multiple times on the same hardware/software setup, with the same background "noise" (as far as I can guarantee that).

## Results

I got the following results:
```
hyperfine --warmup 5 --min-runs 30 './bat_release --color=never --decorations=always --highlight-line=100000 --pager=never -- test.py' './bat_optimized --color=never --decorations=always --highlight-line=100000 --pager=never -- test.py'
Benchmark 1: ./bat_release --color=never --decorations=always --highlight-line=100000 --pager=never -- test.py
  Time (mean ± σ):      1.169 s ±  0.058 s    [User: 1.131 s, System: 0.034 s]
  Range (min … max):    1.139 s …  1.465 s    30 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 2: ./bat_optimized --color=never --decorations=always --highlight-line=100000 --pager=never -- test.py
  Time (mean ± σ):      1.107 s ±  0.011 s    [User: 1.069 s, System: 0.035 s]
  Range (min … max):    1.080 s …  1.135 s    30 runs

Summary
  ./bat_optimized --color=never --decorations=always --highlight-line=100000 --pager=never -- test.py ran
    1.06 ± 0.05 times faster than ./bat_release --color=never --decorations=always --highlight-line=100000 --pager=never -- test.py
```

At least in the simple benchmark above, PGO gives a measurable improvement for `bat`: the mean wall-clock time drops from 1.169 s to 1.107 s, and hyperfine reports the PGO build as 1.06 ± 0.05 times faster.
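To make the size of the effect concrete, here is the arithmetic behind hyperfine's summary line, using the means and σ values from the output above. This sketch assumes hyperfine combines the two standard deviations with first-order error propagation (relative errors added in quadrature, scaled by the ratio); it reproduces the reported "1.06 ± 0.05".

```rust
/// Relative speed of the slower command vs. the faster one, with the
/// uncertainty from first-order error propagation: the relative errors of
/// the two means are added in quadrature, then scaled by the ratio.
fn relative_speed(mean_slow: f64, sd_slow: f64, mean_fast: f64, sd_fast: f64) -> (f64, f64) {
    let ratio = mean_slow / mean_fast;
    let rel_err = ((sd_slow / mean_slow).powi(2) + (sd_fast / mean_fast).powi(2)).sqrt();
    (ratio, ratio * rel_err)
}

fn main() {
    // Mean and σ of wall-clock time in seconds, from the hyperfine output.
    let (release_mean, release_sd) = (1.169, 0.058);
    let (pgo_mean, pgo_sd) = (1.107, 0.011);

    let (ratio, err) = relative_speed(release_mean, release_sd, pgo_mean, pgo_sd);
    // prints "PGO build: 1.06 ± 0.05 times faster"
    println!("PGO build: {ratio:.2} ± {err:.2} times faster");

    // The same data expressed as a reduction in mean wall-clock time.
    let reduction = 100.0 * (1.0 - pgo_mean / release_mean);
    // prints "mean wall-clock time reduced by 5.3%"
    println!("mean wall-clock time reduced by {reduction:.1}%");
}
```

Most of the ± comes from the release build's σ of 0.058 s, which is inflated by the outliers hyperfine warned about (max 1.465 s vs. min 1.139 s); a rerun on a quieter system would probably tighten the interval.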

## Further steps

I suggest the following next steps:
* Evaluate PGO's applicability to `bat` in more scenarios.
* If PGO helps achieve better performance, add a note about it to `bat`'s documentation (probably somewhere in the README). That way, users and maintainers will be aware of another optimization opportunity for `bat`.
* Integrate PGO into the build scripts, so that users and maintainers can easily apply PGO to their own workloads.
* Optimize prebuilt binaries with PGO.

Here are some examples of how PGO is already integrated into other projects' build scripts:
* Rustc: a CI [script](https://github.com/rust-lang/rust/blob/master/src/ci/stage-build.py) for the multi-stage build
* GCC:
  - Official [docs](https://gcc.gnu.org/install/build.html), section "Building with profile feedback" (even AutoFDO build is supported)
  - A [part](https://github.com/gcc-mirror/gcc/blob/4832767db7897be6fb5cbc44f079482c90cb95a6/configure#L7818) in a "wonderful" `configure` script 
* Clang: [Docs](https://llvm.org/docs/HowToBuildWithPGO.html) 
* Python: 
  - CPython: [README](https://github.com/python/cpython#profile-guided-optimization)
  - Pyston: [README](https://github.com/pyston/pyston#building)
* Go: [Bash script](https://github.com/golang/go/blob/master/src/cmd/compile/profile.sh)
* V8: [GN flag](https://github.com/v8/v8/blob/main/BUILD.gn#L184)
* ChakraCore: [Scripts](https://github.com/chakra-core/ChakraCore/tree/master/Build/scripts/pgo)
* Chromium: [Script](https://chromium.googlesource.com/chromium/src/build/config/+/refs/heads/main/compiler/pgo/BUILD.gn)
* Firefox: [Docs](https://firefox-source-docs.mozilla.org/build/buildsystem/pgo.html)
  - Thunderbird has PGO support too
* PHP: [Makefile command](https://github.com/php/php-src/blob/master/build/Makefile.global#L138) and old Centminmod [scripts](https://github.com/centminmod/php_pgo_training_scripts)
* MySQL: [CMake script](https://github.com/mysql/mysql-server/blob/8.0/cmake/fprofile.cmake)
* YugabyteDB: [GitHub commit](https://github.com/yugabyte/yugabyte-db/commit/34cb791ed9d3d5f8ae9a9b9e9181a46485e1981d)
* FoundationDB: [Script](https://github.com/apple/foundationdb/blob/1a6114a66f3de508c0cf0a45f72f3687ba05750c/contrib/generate_profile.sh)
* Zstd: [Makefile](https://github.com/facebook/zstd/blob/dev/programs/Makefile#L232)
* [Foot](https://codeberg.org/dnkl/foot): [Scripts](https://codeberg.org/dnkl/foot/src/branch/master/pgo)
* Windows Terminal: [GitHub PR](https://github.com/microsoft/terminal/pull/10071)
* Pydantic-core: [GitHub PR](https://github.com/pydantic/pydantic-core/pull/741)

After PGO, I also suggest evaluating [LLVM BOLT](https://github.com/llvm/llvm-project/blob/main/bolt/README.md) as an additional, post-link optimization step.
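cargo-pgo can also drive BOLT on top of PGO. Below is a minimal sketch of that sequence as another small Rust driver. It assumes `llvm-bolt` and `merge-fdata` are on `PATH`, an x86_64 Linux host, the same training command as in the benchmark setup, and that the BOLT-instrumented binary is named `bat-bolt-instrumented`, which is my understanding of cargo-pgo's naming and worth double-checking. It only prints the steps unless `BOLT_EXECUTE=1` (a hypothetical switch for this sketch) is set.

```rust
use std::process::Command;

// Assumptions: x86_64 Linux host; llvm-bolt and merge-fdata available on PATH.
const TRIPLE: &str = "x86_64-unknown-linux-gnu";

// Training workload: the benchmark command from the issue.
const BAT_ARGS: [&str; 6] = [
    "--color=never",
    "--decorations=always",
    "--highlight-line=100000",
    "--pager=never",
    "--",
    "test.py",
];

fn strs(xs: &[&str]) -> Vec<String> {
    xs.iter().map(|s| s.to_string()).collect()
}

/// PGO + BOLT workflow via cargo-pgo, as (program, arguments) steps.
fn bolt_steps() -> Vec<(String, Vec<String>)> {
    let release = format!("target/{TRIPLE}/release");
    vec![
        // 1. PGO: instrumented build plus a training run.
        ("cargo".to_string(), strs(&["pgo", "build"])),
        (format!("{release}/bat"), strs(&BAT_ARGS)),
        // 2. BOLT-instrumented build that also applies the PGO profiles.
        ("cargo".to_string(), strs(&["pgo", "bolt", "build", "--with-pgo"])),
        // 3. Second training run, this time collecting BOLT profiles
        //    (binary name is an assumption about cargo-pgo's naming).
        (format!("{release}/bat-bolt-instrumented"), strs(&BAT_ARGS)),
        // 4. Final build: PGO profiles plus BOLT code-layout optimization.
        ("cargo".to_string(), strs(&["pgo", "bolt", "optimize", "--with-pgo"])),
    ]
}

fn main() {
    // Dry run by default; BOLT_EXECUTE=1 (hypothetical switch) actually runs the steps.
    let execute = matches!(std::env::var("BOLT_EXECUTE").as_deref(), Ok("1"));
    for (program, args) in bolt_steps() {
        println!("{} {}", program, args.join(" "));
        if execute {
            let status = Command::new(&program)
                .args(&args)
                .status()
                .expect("failed to spawn step");
            assert!(status.success(), "step failed: {program}");
        }
    }
}
```

Running the training workload twice (once per instrumentation) keeps the PGO and BOLT profiles consistent with each other, since both come from the same scenario.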
