
[TraceQL] Performance increase for simple queries #5247

Merged: mdisibio merged 2 commits into grafana:main from mdisibio:less-collection on Jun 11, 2025

@mdisibio (Contributor) commented Jun 9, 2025

What this PR does:
The Batch, Instrumentation, and Trace collectors were doing a lot of iterate/append work that they didn't need to do. Removing it has the most impact on metrics queries that process a large number of spans.
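
For readers unfamiliar with this kind of change, here is a minimal sketch of the general idea of skipping per-item append work that the next stage never reads. It is not the code from this PR: the `span` and `collector` types, the `needAttrs` flag, and the `collect` method are all hypothetical names made up for illustration.

```go
package main

import "fmt"

// span is a hypothetical stand-in for a matched span (not Tempo's type).
type span struct {
	id    int
	attrs []string
}

// collector is a hypothetical batch collector that gathers spans
// for the next stage of a query pipeline.
type collector struct {
	needAttrs bool   // assumption: set when the downstream stage reads attrs
	buf       []span // reused between batches to avoid reallocating
}

// collect gathers one batch. The attribute copy is skipped when the
// downstream stage does not need it or when there is nothing to copy.
func (c *collector) collect(in []span) []span {
	c.buf = c.buf[:0]
	for _, s := range in {
		out := span{id: s.id}
		if c.needAttrs && len(s.attrs) > 0 {
			out.attrs = append(out.attrs, s.attrs...)
		}
		c.buf = append(c.buf, out)
	}
	return c.buf
}

func main() {
	in := []span{{id: 1, attrs: []string{"http.status_code=200"}}, {id: 2}}

	cheap := &collector{needAttrs: false}
	full := &collector{needAttrs: true}

	fmt.Println(len(cheap.collect(in)), len(cheap.collect(in)[0].attrs))
	fmt.Println(len(full.collect(in)), len(full.collect(in)[0].attrs))
}
```

The saving in a sketch like this comes from the branch that avoids the inner `append`, which is why the effect would grow with the number of spans processed.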

Benchmarks for Metrics
                                                                                                          │ before.txt  │              after.txt              │
                                                                                                          │   sec/op    │   sec/op     vs base                │
BackendBlockQueryRange/{}_|_rate()/5-14                                                                     83.41m ± 0%   73.54m ± 5%  -11.83% (p=0.000 n=10)
BackendBlockQueryRange/{}_|_rate()_by_(span.http.status_code)/5-14                                          193.4m ± 1%   173.6m ± 1%  -10.23% (p=0.000 n=10)
BackendBlockQueryRange/{}_|_rate()_by_(resource.service.name)/5-14                                          116.7m ± 0%   108.5m ± 0%   -7.08% (p=0.000 n=10)
BackendBlockQueryRange/{}_|_rate()_by_(span.http.url)/5-14                                                  1.477m ± 0%   1.462m ± 0%   -1.03% (p=0.001 n=10)
BackendBlockQueryRange/{resource.service.name=`loki-ingester`}_|_rate()/5-14                                8.622m ± 0%   8.372m ± 0%   -2.90% (p=0.000 n=10)
BackendBlockQueryRange/{span.http.host_!=_``_&&_span.http.flavor=`2`}_|_rate()_by_(span.http.flavor)/5-14   12.20m ± 0%   12.08m ± 0%   -0.94% (p=0.000 n=10)
BackendBlockQueryRange/{status=error}_|_rate()/5-14                                                         8.826m ± 0%   8.811m ± 0%        ~ (p=0.190 n=10)
BackendBlockQueryRange/{}_|_quantile_over_time(duration,_.99,_.9,_.5)/5-14                                  172.0m ± 0%   161.9m ± 4%   -5.89% (p=0.002 n=10)
BackendBlockQueryRange/{}_|_quantile_over_time(duration,_.99)_by_(span.http.status_code)/5-14               278.0m ± 1%   262.1m ± 3%   -5.73% (p=0.002 n=10)
BackendBlockQueryRange/{}_|_histogram_over_time(duration)/5-14                                              171.8m ± 0%   163.4m ± 1%   -4.89% (p=0.000 n=10)
BackendBlockQueryRange/{}_|_avg_over_time(duration)_by_(span.http.status_code)/5-14                         227.9m ± 0%   211.8m ± 2%   -7.04% (p=0.000 n=10)
BackendBlockQueryRange/{}_|_max_over_time(duration)_by_(span.http.status_code)/5-14                         231.0m ± 0%   214.6m ± 2%   -7.10% (p=0.000 n=10)
BackendBlockQueryRange/{}_|_min_over_time(duration)_by_(span.http.status_code)/5-14                         231.2m ± 0%   213.6m ± 0%   -7.61% (p=0.000 n=10)
BackendBlockQueryRange/{_name_!=_nil_}_|_compare({status=error})/5-14                                        2.526 ± 1%    2.526 ± 2%        ~ (p=0.912 n=10)
BackendBlockQueryRange/{}_>_{}_|_rate()_by_(name)/5-14                                                      340.8m ± 0%   327.1m ± 0%   -4.00% (p=0.000 n=10)
geomean                                                                                                     90.46m        85.79m        -5.16%

                                                                                                          │ before.txt │              after.txt              │
                                                                                                          │  MB_IO/op  │  MB_IO/op   vs base                 │
BackendBlockQueryRange/{}_|_rate()/5-14                                                                     2.922 ± 0%   2.922 ± 0%       ~ (p=1.000 n=10) ¹
BackendBlockQueryRange/{}_|_rate()_by_(span.http.status_code)/5-14                                          3.073 ± 0%   3.073 ± 0%       ~ (p=1.000 n=10) ¹
BackendBlockQueryRange/{}_|_rate()_by_(resource.service.name)/5-14                                          2.961 ± 0%   2.960 ± 0%  -0.03% (p=0.000 n=10)
BackendBlockQueryRange/{}_|_rate()_by_(span.http.url)/5-14                                                  1.438 ± 0%   1.438 ± 0%       ~ (p=1.000 n=10)
BackendBlockQueryRange/{resource.service.name=`loki-ingester`}_|_rate()/5-14                                2.954 ± 0%   2.954 ± 0%       ~ (p=1.000 n=10) ¹
BackendBlockQueryRange/{span.http.host_!=_``_&&_span.http.flavor=`2`}_|_rate()_by_(span.http.flavor)/5-14   3.470 ± 0%   3.470 ± 0%       ~ (p=1.000 n=10) ¹
BackendBlockQueryRange/{status=error}_|_rate()/5-14                                                         3.078 ± 0%   3.078 ± 0%       ~ (p=1.000 n=10) ¹
BackendBlockQueryRange/{}_|_quantile_over_time(duration,_.99,_.9,_.5)/5-14                                  5.555 ± 0%   5.553 ± 0%  -0.04% (p=0.001 n=10)
BackendBlockQueryRange/{}_|_quantile_over_time(duration,_.99)_by_(span.http.status_code)/5-14               5.705 ± 0%   5.705 ± 0%       ~ (p=1.000 n=10) ¹
BackendBlockQueryRange/{}_|_histogram_over_time(duration)/5-14                                              5.555 ± 0%   5.553 ± 0%  -0.04% (p=0.000 n=10)
BackendBlockQueryRange/{}_|_avg_over_time(duration)_by_(span.http.status_code)/5-14                         5.702 ± 0%   5.702 ± 0%       ~ (p=1.000 n=10) ¹
BackendBlockQueryRange/{}_|_max_over_time(duration)_by_(span.http.status_code)/5-14                         5.702 ± 0%   5.702 ± 0%       ~ (p=1.000 n=10) ¹
BackendBlockQueryRange/{}_|_min_over_time(duration)_by_(span.http.status_code)/5-14                         5.702 ± 0%   5.702 ± 0%       ~ (p=1.000 n=10) ¹
BackendBlockQueryRange/{_name_!=_nil_}_|_compare({status=error})/5-14                                       19.60 ± 0%   19.60 ± 0%       ~ (p=1.000 n=10) ¹
BackendBlockQueryRange/{}_>_{}_|_rate()_by_(name)/5-14                                                      5.906 ± 0%   5.901 ± 0%  -0.08% (p=0.000 n=10)
geomean                                                                                                     4.405        4.404       -0.01%
¹ all samples are equal

                                                                                                          │  before.txt   │              after.txt               │
                                                                                                          │   spans/op    │  spans/op    vs base                 │
BackendBlockQueryRange/{}_|_rate()/5-14                                                                     819.6k ± 0%     819.6k ± 0%       ~ (p=1.000 n=10) ¹
BackendBlockQueryRange/{}_|_rate()_by_(span.http.status_code)/5-14                                          819.6k ± 0%     819.6k ± 0%       ~ (p=1.000 n=10) ¹
BackendBlockQueryRange/{}_|_rate()_by_(resource.service.name)/5-14                                          819.6k ± 0%     819.6k ± 0%       ~ (p=1.000 n=10) ¹
BackendBlockQueryRange/{}_|_rate()_by_(span.http.url)/5-14                                                   400.6 ± 1%      378.8 ± 1%  -5.45% (p=0.000 n=10)
BackendBlockQueryRange/{resource.service.name=`loki-ingester`}_|_rate()/5-14                                21.92k ± 0%     21.92k ± 0%       ~ (p=1.000 n=10) ¹
BackendBlockQueryRange/{span.http.host_!=_``_&&_span.http.flavor=`2`}_|_rate()_by_(span.http.flavor)/5-14    0.000 ± 0%      0.000 ± 0%       ~ (p=1.000 n=10) ¹
BackendBlockQueryRange/{status=error}_|_rate()/5-14                                                         3.231k ± 0%     3.231k ± 0%       ~ (p=1.000 n=10) ¹
BackendBlockQueryRange/{}_|_quantile_over_time(duration,_.99,_.9,_.5)/5-14                                  819.6k ± 0%     819.6k ± 0%       ~ (p=1.000 n=10) ¹
BackendBlockQueryRange/{}_|_quantile_over_time(duration,_.99)_by_(span.http.status_code)/5-14               819.6k ± 0%     819.6k ± 0%       ~ (p=1.000 n=10) ¹
BackendBlockQueryRange/{}_|_histogram_over_time(duration)/5-14                                              819.6k ± 0%     819.6k ± 0%       ~ (p=1.000 n=10) ¹
BackendBlockQueryRange/{}_|_avg_over_time(duration)_by_(span.http.status_code)/5-14                         819.6k ± 0%     819.6k ± 0%       ~ (p=1.000 n=10) ¹
BackendBlockQueryRange/{}_|_max_over_time(duration)_by_(span.http.status_code)/5-14                         819.6k ± 0%     819.6k ± 0%       ~ (p=1.000 n=10) ¹
BackendBlockQueryRange/{}_|_min_over_time(duration)_by_(span.http.status_code)/5-14                         819.6k ± 0%     819.6k ± 0%       ~ (p=1.000 n=10) ¹
BackendBlockQueryRange/{_name_!=_nil_}_|_compare({status=error})/5-14                                       819.6k ± 0%     819.6k ± 0%       ~ (p=1.000 n=10) ¹
BackendBlockQueryRange/{}_>_{}_|_rate()_by_(name)/5-14                                                      512.6k ± 0%     512.6k ± 0%       ~ (p=1.000 n=10) ¹
geomean                                                                                                                 ²                -0.37%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                                                                                                          │  before.txt   │               after.txt                │
                                                                                                          │    spans/s    │   spans/s     vs base                  │
BackendBlockQueryRange/{}_|_rate()/5-14                                                                     9.827M ± 0%     11.146M ± 5%  +13.42% (p=0.000 n=10)
BackendBlockQueryRange/{}_|_rate()_by_(span.http.status_code)/5-14                                          4.238M ± 1%      4.721M ± 1%  +11.40% (p=0.000 n=10)
BackendBlockQueryRange/{}_|_rate()_by_(resource.service.name)/5-14                                          7.022M ± 0%      7.557M ± 0%   +7.62% (p=0.000 n=10)
BackendBlockQueryRange/{}_|_rate()_by_(span.http.url)/5-14                                                  271.0k ± 1%      259.6k ± 2%   -4.21% (p=0.000 n=10)
BackendBlockQueryRange/{resource.service.name=`loki-ingester`}_|_rate()/5-14                                2.542M ± 0%      2.618M ± 0%   +2.98% (p=0.000 n=10)
BackendBlockQueryRange/{span.http.host_!=_``_&&_span.http.flavor=`2`}_|_rate()_by_(span.http.flavor)/5-14    0.000 ± 0%       0.000 ± 0%        ~ (p=1.000 n=10) ¹
BackendBlockQueryRange/{status=error}_|_rate()/5-14                                                         366.1k ± 0%      366.7k ± 0%        ~ (p=0.190 n=10)
BackendBlockQueryRange/{}_|_quantile_over_time(duration,_.99,_.9,_.5)/5-14                                  4.766M ± 0%      5.064M ± 4%   +6.26% (p=0.002 n=10)
BackendBlockQueryRange/{}_|_quantile_over_time(duration,_.99)_by_(span.http.status_code)/5-14               2.948M ± 1%      3.128M ± 3%   +6.08% (p=0.002 n=10)
BackendBlockQueryRange/{}_|_histogram_over_time(duration)/5-14                                              4.771M ± 0%      5.016M ± 1%   +5.15% (p=0.000 n=10)
BackendBlockQueryRange/{}_|_avg_over_time(duration)_by_(span.http.status_code)/5-14                         3.597M ± 0%      3.870M ± 2%   +7.58% (p=0.000 n=10)
BackendBlockQueryRange/{}_|_max_over_time(duration)_by_(span.http.status_code)/5-14                         3.548M ± 0%      3.819M ± 2%   +7.64% (p=0.000 n=10)
BackendBlockQueryRange/{}_|_min_over_time(duration)_by_(span.http.status_code)/5-14                         3.546M ± 0%      3.838M ± 0%   +8.24% (p=0.000 n=10)
BackendBlockQueryRange/{_name_!=_nil_}_|_compare({status=error})/5-14                                       324.4k ± 1%      324.5k ± 2%        ~ (p=0.912 n=10)
BackendBlockQueryRange/{}_>_{}_|_rate()_by_(name)/5-14                                                      1.504M ± 0%      1.567M ± 0%   +4.17% (p=0.000 n=10)
geomean                                                                                                                 ²                  +5.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                                                                                                          │  before.txt   │              after.txt               │
                                                                                                          │     B/op      │     B/op       vs base               │
BackendBlockQueryRange/{}_|_rate()/5-14                                                                     2.884Mi ± 20%   3.425Mi ± 22%       ~ (p=0.143 n=10)
BackendBlockQueryRange/{}_|_rate()_by_(span.http.status_code)/5-14                                          24.38Mi ±  4%   24.38Mi ±  5%       ~ (p=0.971 n=10)
BackendBlockQueryRange/{}_|_rate()_by_(resource.service.name)/5-14                                          2.296Mi ± 37%   2.346Mi ± 15%       ~ (p=0.912 n=10)
BackendBlockQueryRange/{}_|_rate()_by_(span.http.url)/5-14                                                  2.748Mi ±  3%   2.707Mi ±  4%       ~ (p=0.436 n=10)
BackendBlockQueryRange/{resource.service.name=`loki-ingester`}_|_rate()/5-14                                4.219Mi ±  1%   4.196Mi ±  2%       ~ (p=0.481 n=10)
BackendBlockQueryRange/{span.http.host_!=_``_&&_span.http.flavor=`2`}_|_rate()_by_(span.http.flavor)/5-14   1.448Mi ±  5%   1.471Mi ±  5%       ~ (p=0.315 n=10)
BackendBlockQueryRange/{status=error}_|_rate()/5-14                                                         1.132Mi ±  6%   1.102Mi ±  5%       ~ (p=0.481 n=10)
BackendBlockQueryRange/{}_|_quantile_over_time(duration,_.99,_.9,_.5)/5-14                                  2.262Mi ± 42%   2.329Mi ± 47%       ~ (p=0.853 n=10)
BackendBlockQueryRange/{}_|_quantile_over_time(duration,_.99)_by_(span.http.status_code)/5-14               24.98Mi ±  4%   25.77Mi ±  5%       ~ (p=0.247 n=10)
BackendBlockQueryRange/{}_|_histogram_over_time(duration)/5-14                                              2.241Mi ± 35%   2.074Mi ± 36%       ~ (p=0.393 n=10)
BackendBlockQueryRange/{}_|_avg_over_time(duration)_by_(span.http.status_code)/5-14                         24.80Mi ±  3%   24.71Mi ±  4%       ~ (p=0.912 n=10)
BackendBlockQueryRange/{}_|_max_over_time(duration)_by_(span.http.status_code)/5-14                         24.25Mi ±  4%   25.28Mi ±  4%  +4.26% (p=0.019 n=10)
BackendBlockQueryRange/{}_|_min_over_time(duration)_by_(span.http.status_code)/5-14                         24.97Mi ±  2%   24.74Mi ±  5%       ~ (p=0.123 n=10)
BackendBlockQueryRange/{_name_!=_nil_}_|_compare({status=error})/5-14                                       280.1Mi ± 11%   279.6Mi ± 24%       ~ (p=0.971 n=10)
BackendBlockQueryRange/{}_>_{}_|_rate()_by_(name)/5-14                                                      65.08Mi ±  2%   63.71Mi ±  2%  -2.11% (p=0.015 n=10)
geomean                                                                                                     8.612Mi         8.699Mi        +1.01%

                                                                                                          │  before.txt  │              after.txt              │
                                                                                                          │  allocs/op   │  allocs/op    vs base               │
BackendBlockQueryRange/{}_|_rate()/5-14                                                                     12.90k ±  0%   12.91k ±  0%       ~ (p=0.159 n=10)
BackendBlockQueryRange/{}_|_rate()_by_(span.http.status_code)/5-14                                          39.73k ±  6%   39.07k ±  6%       ~ (p=0.579 n=10)
BackendBlockQueryRange/{}_|_rate()_by_(resource.service.name)/5-14                                          13.09k ±  0%   13.08k ±  0%       ~ (p=0.305 n=10)
BackendBlockQueryRange/{}_|_rate()_by_(span.http.url)/5-14                                                  12.77k ±  0%   12.76k ±  0%  -0.04% (p=0.002 n=10)
BackendBlockQueryRange/{resource.service.name=`loki-ingester`}_|_rate()/5-14                                28.37k ±  0%   28.37k ±  0%       ~ (p=0.335 n=10)
BackendBlockQueryRange/{span.http.host_!=_``_&&_span.http.flavor=`2`}_|_rate()_by_(span.http.flavor)/5-14   13.05k ±  0%   13.05k ±  0%       ~ (p=0.335 n=10)
BackendBlockQueryRange/{status=error}_|_rate()/5-14                                                         13.84k ±  0%   13.83k ±  0%       ~ (p=0.330 n=10)
BackendBlockQueryRange/{}_|_quantile_over_time(duration,_.99,_.9,_.5)/5-14                                  13.30k ±  0%   13.29k ±  0%       ~ (p=0.446 n=10)
BackendBlockQueryRange/{}_|_quantile_over_time(duration,_.99)_by_(span.http.status_code)/5-14               39.66k ± 10%   39.32k ± 11%       ~ (p=0.529 n=10)
BackendBlockQueryRange/{}_|_histogram_over_time(duration)/5-14                                              13.30k ±  0%   13.28k ±  0%  -0.11% (p=0.009 n=10)
BackendBlockQueryRange/{}_|_avg_over_time(duration)_by_(span.http.status_code)/5-14                         38.70k ±  9%   38.78k ±  1%       ~ (p=0.739 n=10)
BackendBlockQueryRange/{}_|_max_over_time(duration)_by_(span.http.status_code)/5-14                         39.37k ±  7%   39.10k ± 17%       ~ (p=0.955 n=10)
BackendBlockQueryRange/{}_|_min_over_time(duration)_by_(span.http.status_code)/5-14                         39.07k ±  8%   39.02k ±  2%       ~ (p=0.670 n=10)
BackendBlockQueryRange/{_name_!=_nil_}_|_compare({status=error})/5-14                                       1.705M ±  3%   1.705M ±  7%       ~ (p=0.190 n=10)
BackendBlockQueryRange/{}_>_{}_|_rate()_by_(name)/5-14                                                      90.56k ±  5%   89.69k ±  5%       ~ (p=0.165 n=10)
geomean                                                                                                     31.39k         31.30k        -0.28%
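
As a reading aid for the tables above, the sketch below recomputes the `vs base` percentages and the spans/s figures for the `{} | rate()` row from its sec/op and spans/op values. The helper names `pctChange` and `throughput` are illustrative and not part of benchstat or Tempo. Because the inputs are the rounded values printed in the tables, the throughput results agree with the spans/s table only to within rounding.

```go
package main

import "fmt"

// pctChange returns the relative change from before to after, in percent.
func pctChange(before, after float64) float64 {
	return (after - before) / before * 100
}

// throughput returns items processed per second, given items per
// operation and seconds per operation.
func throughput(itemsPerOp, secPerOp float64) float64 {
	return itemsPerOp / secPerOp
}

func main() {
	// Values copied from the `{} | rate()` row of the tables above.
	const (
		spansPerOp = 819.6e3  // spans/op (unchanged by this PR)
		beforeSec  = 83.41e-3 // sec/op before
		afterSec   = 73.54e-3 // sec/op after
	)

	before := throughput(spansPerOp, beforeSec)
	after := throughput(spansPerOp, afterSec)

	fmt.Printf("sec/op change:  %+.2f%%\n", pctChange(beforeSec, afterSec))
	fmt.Printf("spans/s before: %.3fM\n", before/1e6)
	fmt.Printf("spans/s after:  %.3fM\n", after/1e6)
	fmt.Printf("spans/s change: %+.2f%%\n", pctChange(before, after))

	// The sec/op geomean row follows the same formula.
	fmt.Printf("geomean change: %+.2f%%\n", pctChange(90.46, 85.79))
}
```

The time change and the throughput change differ in magnitude (about -11.8% versus +13.4% for this row) because they are relative to different baselines: the same absolute saving is divided by the slower time in one case and the faster time in the other.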

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@joe-elliott (Collaborator) left a comment

nice find. love the 11% on metrics queries.

@mdisibio merged commit 7e15058 into grafana:main on Jun 11, 2025
21 checks passed