Description
Hi!
As I have done many times before, I decided to test the Profile-Guided Optimization (PGO) technique to optimize the application's performance. For reference, results for other projects are available at https://github.com/zamazan4ik/awesome-pgo . Since PGO has helped many other applications, I decided to apply it to this project to see whether a performance win can be achieved here as well. Here are my benchmark results.
This information can be interesting for anyone who wants to achieve more performance with the library in their use cases.
Test environment
- Fedora 40
- Linux kernel 6.10.7
- AMD Ryzen 9 5900x
- 48 GiB RAM
- SSD Samsung 980 Pro 2 TiB
- Compiler - Rustc 1.79.0
- tex-fmt version: main branch on commit f2689ac7e2c713cfb6106220c09a44141770a638
- Disabled Turbo boost
Benchmark
For benchmarking, I use the benchmarks built into the project. For PGO optimization I use the cargo-pgo tool. For all measurements I used the same command, only with different binaries: taskset -c 0 tex_fmt tests/source/* tests/target/*. taskset -c 0 is used to reduce the OS scheduler's influence on the results. All measurements were done on the same machine, with the same background "noise" (as much as I can guarantee).
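For context, below is a rough sketch of the cargo-pgo workflow used to build the instrumented and optimized binaries. The binary path and the training workload are illustrative assumptions; cargo-pgo prints the actual paths during the build.

```sh
# One-time setup
cargo install cargo-pgo

# 1. Build a PGO-instrumented binary (the "tex_fmt_instrumented" variant)
cargo pgo build

# 2. Run the instrumented binary on a representative workload to collect profiles
#    (the path below is an assumption; cargo-pgo reports the real one)
./target/x86_64-unknown-linux-gnu/release/tex-fmt tests/source/* tests/target/*

# 3. Rebuild with the collected profiles applied (the "tex_fmt_optimized" variant)
cargo pgo optimize build
```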
Results
I got the following results in hyperfine's format:
hyperfine --warmup 25 --min-runs 100 --prepare "cp -r ../tests/* tests" "taskset -c 0 ./tex_fmt_release tests/source/* tests/target/*" "taskset -c 0 ./tex_fmt_lto tests/source/* tests/target/*" "taskset -c 0 ./tex_fmt_optimized tests/source/* tests/target/*" "taskset -c 0 ./tex_fmt_instrumented tests/source/* tests/target/*"
Benchmark 1: taskset -c 0 ./tex_fmt_release tests/source/* tests/target/*
Time (mean ± σ): 92.3 ms ± 1.2 ms [User: 72.6 ms, System: 8.5 ms]
Range (min … max): 90.6 ms … 98.6 ms 100 runs
Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs.
Benchmark 2: taskset -c 0 ./tex_fmt_lto tests/source/* tests/target/*
Time (mean ± σ): 87.3 ms ± 1.0 ms [User: 67.5 ms, System: 8.6 ms]
Range (min … max): 85.5 ms … 91.1 ms 100 runs
Benchmark 3: taskset -c 0 ./tex_fmt_optimized tests/source/* tests/target/*
Time (mean ± σ): 80.1 ms ± 0.6 ms [User: 60.2 ms, System: 9.1 ms]
Range (min … max): 78.3 ms … 81.2 ms 100 runs
Benchmark 4: taskset -c 0 ./tex_fmt_instrumented tests/source/* tests/target/*
Time (mean ± σ): 133.0 ms ± 1.6 ms [User: 110.6 ms, System: 9.8 ms]
Range (min … max): 131.0 ms … 139.4 ms 100 runs
Summary
taskset -c 0 ./tex_fmt_optimized tests/source/* tests/target/* ran
1.09 ± 0.01 times faster than taskset -c 0 ./tex_fmt_lto tests/source/* tests/target/*
1.15 ± 0.02 times faster than taskset -c 0 ./tex_fmt_release tests/source/* tests/target/*
1.66 ± 0.02 times faster than taskset -c 0 ./tex_fmt_instrumented tests/source/* tests/target/*
where (binary sizes are also listed, since they matter in some cases):
- tex_fmt_release - default Release profile, 2.6 MiB
- tex_fmt_lto - default Release profile + LTO, 2.4 MiB
- tex_fmt_optimized - default Release profile + LTO + PGO-optimized, 2.4 MiB
- tex_fmt_instrumented - default Release profile + LTO + PGO-instrumented, 4.5 MiB
According to the results, LTO and PGO measurably improve the application's performance.
Further steps
As a first easy step, I suggest enabling LTO only for Release builds, so as not to hurt the developer experience while working on the project, since LTO adds noticeable compilation time. If you think a regular Release build should not be affected by such a change either, then I suggest adding an additional release-lto profile that applies LTO on top of the regular release optimizations (a minimal sketch is shown below). Such a change makes life easier for maintainers and for anyone interested in the project who wants to build the most performant version of the application. Using ThinLTO should also help.
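Here is a minimal Cargo.toml sketch of such a profile; the profile name and the exact LTO settings are only suggestions, not existing project configuration:

```toml
[profile.release-lto]
inherits = "release"
lto = true        # or lto = "thin" for faster builds with most of the benefit
codegen-units = 1 # optional: trades compile time for a bit more optimization
```

Building the fast variant would then be `cargo build --profile release-lto`, while the plain `cargo build --release` stays as quick to compile as today.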
Also, Post-Link Optimization (PLO) can be tested after PGO. It can be done by applying tools like LLVM BOLT to tex-fmt.
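As a rough sketch (not something I measured here), cargo-pgo also has BOLT support, which can be stacked on top of PGO; it assumes llvm-bolt is installed, and cargo-pgo prints the paths of the produced binaries:

```sh
# Build a BOLT-instrumented binary on top of the PGO-optimized build
cargo pgo bolt build --with-pgo

# ...run the instrumented binary on a representative workload (e.g. the tests above)...

# Relink with the collected BOLT profile
cargo pgo bolt optimize --with-pgo
```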
Thank you.
P.S. This is just an idea, not an actual issue. Possibly, the Ideas category in GitHub Discussions is a better place to discuss such proposals.