[Feature] Support Math in TraceQL Metrics #6866
ruslan-mikhailov wants to merge 21 commits into grafana:main from
Conversation
Pull request overview
Adds support for arithmetic (math) expressions in TraceQL metrics queries by extending the parser/AST and updating the metrics execution pipeline to fan out sub-queries and then combine their time series results.
Changes:
- Extend TraceQL grammar/AST to parse metrics math expressions (e.g. `({} | rate()) / ({} | count_over_time())`) and apply optional second-stage pipelines to parenthesized sub-expressions.
- Update metrics evaluators to execute multiple sub-queries, tag shard results with `__query_fragment`, and evaluate a math-expression tree during the final aggregation stage.
- Expand unit/integration tests and benchmarks to cover math expressions and the new aggregation behavior.
Reviewed changes
Copilot reviewed 37 out of 37 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tempodb/tempodb_metrics_test.go | Updates expected labels/results to include __query_fragment and adds many math-specific query range test cases. |
| tempodb/encoding/vparquet5/block_traceql_test.go | Updates compile/fetch request usage and adds math queries to benchmarks and sampling-error tests. |
| tempodb/encoding/vparquet5/block_traceql_fetch_test.go | Updates traceql.Compile call signature usage. |
| tempodb/encoding/vparquet4/block_traceql_test.go | Updates traceql.Compile call signature usage and adds math queries to benchmarks/tests. |
| pkg/traceql/test_examples.yaml | Adds valid examples for wrapped metrics pipelines and math + second-stage combinations. |
| pkg/traceql/storage.go | Makes ExtractFetchSpansRequest reject math expressions (requires single sub-query). |
| pkg/traceql/parse.go | Makes ParseIdentifier reject math expressions via SinglePipeline(). |
| pkg/traceql/parse_test.go | Refactors second-stage parsing tests and adds parser coverage for metrics math expressions. |
| pkg/traceql/lenient_parser.go | Rebuilds pipeline keys after lenient AST mutation to keep map keys/expression leaves consistent. |
| pkg/traceql/lenient_extract.go | Switches extraction to the new multi-request condition extraction (currently selects one sub-query). |
| pkg/traceql/expr.y | Extends grammar to support metricsExpression and refactors second-stage representation. |
| pkg/traceql/enum_operators.go | Adds Operator.isArithmetic() helper. |
| pkg/traceql/engine.go | Changes Compile signature and introduces CompileFetchSpanRequests for math/multi-subquery extraction. |
| pkg/traceql/engine_test.go | Updates engine tests to validate examples via CompileFetchSpanRequests and handle SinglePipeline(). |
| pkg/traceql/engine_metrics.go | Implements multi-subquery metrics evaluation, __query_fragment tagging, and math evaluation in final stage. |
| pkg/traceql/engine_metrics_test.go | Updates evaluator tests for new interfaces/types and adds a benchmark for frontend evaluator behavior. |
| pkg/traceql/engine_metrics_compare.go | Makes compare aggregators preserve/query-fragment tags when combining shard results. |
| pkg/traceql/engine_metrics_average.go | Aligns average aggregator with new span/series processor interfaces. |
| pkg/traceql/ast.go | Restructures RootExpr to hold per-subquery maps and a math-expression tree. |
| pkg/traceql/ast_validate.go | Updates validation for map-based pipelines and disallows compare() with second-stage or math. |
| pkg/traceql/ast_test.go | Updates tests to use SinglePipeline() where applicable. |
| pkg/traceql/ast_stringer.go | Switches RootExpr.String() to render via the math-expression structure. |
| pkg/traceql/ast_rewriter.go | Rewrites pipeline keys across maps and expression leaves after AST rewrites. |
| pkg/traceql/ast_metrics.go | Introduces spanProcessor/seriesProcessor split and a batchSeriesProcessor router for shard results. |
| pkg/traceql/ast_metrics_test.go | Updates benchmark construction for new metrics filter signatures. |
| pkg/traceql/ast_metrics_math.go | Adds math-expression evaluation for combining sub-query SeriesSets. |
| pkg/traceql/ast_execute_test.go | Updates evaluation tests to use SinglePipeline() access. |
| pkg/traceql/ast_conditions.go | Changes condition extraction to return per-subquery requests keyed by query fragment. |
| pkg/traceql/ast_conditions_test.go | Updates condition extraction tests for the new return type and SinglePipeline() structure. |
| modules/livestore/instance_search.go | Updates QueryRange execution to use the new MetricsEvaluator interface type. |
| modules/frontend/pipeline/async_weight_middleware.go | Computes weight across multiple FetchSpansRequests for math expressions; fixes a spelling typo. |
| modules/frontend/pipeline/async_weight_middleware_test.go | Adds test cases asserting weight behavior for math expressions. |
| modules/frontend/metrics_query_range_sharder.go | Uses CompileFetchSpanRequests for sharder parsing. |
| modules/frontend/metrics_query_range_handler_test.go | Adds handler tests ensuring math queries succeed and exemplar normalization works with fragments. |
| modules/frontend/mcp_tools.go | Updates MCP tool routing logic to detect metrics queries via new fields. |
| integration/api/query_range_test.go | Adds integration coverage for math metrics queries (including exemplar expectations). |
```go
var (
	_ firstStageElement = (*averageOverTimeAggregator)(nil)
	_ spanProcessor     = (*averageOverTimeAggregator)(nil)
	_ seriesProcessor   = (*averageOverTimeAggregator)(nil)
)
```
I would also split such structs into spanProcessor and seriesProcessor, but left them as is to avoid bloating the PR.
```go
if smk[i].Name == internalLabelQueryFragment {
	continue
}
```
Ideally, we could pass internal labels in the proto inside `InternalLabels map[string]Label` within `TimeSeries`, and just drop them at L3 (frontend). This would simplify the code here and for other features like histograms or `avg_over_time`.
It will be done in a separate PR.
```go
for _, t := range target {
	valuesLen += len(t.Values)
}
buf := make([]float64, valuesLen)
```
Maybe that's overkill? But a benchmark showed a nice boost from this optimisation: it allocates one big array instead of multiple smaller arrays.
```go
if math.IsNaN(lhs) {
	lhs = 0
}
if math.IsNaN(rhs) {
	rhs = 0
}
```
This violates IEEE 754. I think we should comply with it. In that case, when a query has no spans within a bucket's time range, all operations for that bucket will also result in an empty bucket (NaN).
wdyt?
```go
func (c ChainedSecondStage) separator() string {
	if len(c) == 0 {
		return ""
	}
	return c[0].separator()
}
```
I don't quite remember why I returned the separator. I think it was to build String() for the expression, but after the refactoring it may no longer be necessary.
I need to try to remove it
```go
// distribute exemplars for each sub-query
exemplarsCapacity := int(req.Exemplars)
if req.Exemplars > maxExemplars {
	level.Warn(log.Logger).Log("msg", "capping exemplars to safety limit", "requested", req.Exemplars, "cap", maxExemplars)
	exemplarsCapacity = int(maxExemplars)
}
exemplars := exemplarsCapacity / len(expr.Pipeline)
if exemplarsCapacity > 0 && exemplars == 0 {
	exemplars = 1 // at least one per sub-query when exemplars are requested
}

bme := make(batchMetricsEvaluator, len(expr.Pipeline))
for key, pipeline := range expr.Pipeline {
	sp := expr.BatchSpanProcessor[key]
	if sp == nil {
		return nil, fmt.Errorf("not a metrics query")
	}
	// This initializes all step buffers, counters, etc
	sp.init(req, AggregateModeRaw)

	storageReq := storageReqs[key]
	e.applySampleHints(expr, pipeline, &storageReq, cfg.allowUnsafeQueryHints)

	exemplars := exemplars
	if exemplarsCapacity < 0 {
		exemplars = 0
	} else {
		exemplarsCapacity -= exemplars
	}
```
There was a problem hiding this comment.
The exemplar distribution logic can exceed the requested/capped exemplar budget when req.Exemplars is smaller than the number of sub-queries. Example: Exemplars=2 with 3 sub-queries sets exemplars=1 and the loop still assigns 1 exemplar when exemplarsCapacity==0, causing total allocated exemplars > budget. Could we clamp per-sub-query exemplars to the remaining capacity (treat remaining==0 as 0), and consider making the allocation deterministic (iteration over expr.Pipeline is a map, so remainder assignment is currently order-dependent)?
```go
var i int
for _, l := range ls {
	if l.Name == labels.MetricName {
		continue
	}
	key[i] = SeriesMapLabel{Name: l.Name, Value: l.Value.MapKey()}
	i++
}
```
Labels.MapKey() now skips __name__, but it still includes internal labels like __query_fragment. For math queries, batchMetricsEvaluator.Results() appends __query_fragment and then calls MapKey(). If a sub-query already uses the maximum supported group-by count (e.g. count_over_time() by (...) with 5 labels), adding __query_fragment makes the label count exceed maxGroupBys and MapKey() will panic due to out-of-bounds indexing. Could we either (a) reserve a group-by slot for __query_fragment when validating math expressions, or (b) change the keying strategy so __query_fragment doesn’t consume a SeriesMapKey slot (while still preventing collisions across fragments)?
Suggested change:

```go
// before
var i int
for _, l := range ls {
	if l.Name == labels.MetricName {
		continue
	}
	key[i] = SeriesMapLabel{Name: l.Name, Value: l.Value.MapKey()}
	i++
}
```

```go
// after
var (
	i             int
	queryFragment string
)
for _, l := range ls {
	switch l.Name {
	case labels.MetricName:
		continue
	case internalLabelQueryFragment:
		// Keep query fragments distinct without consuming a SeriesMapKey slot.
		queryFragment = l.Value.EncodeToString(false)
		continue
	}
	key[i] = SeriesMapLabel{Name: l.Name, Value: l.Value.MapKey()}
	i++
}
if queryFragment != "" {
	fragmentMarker := fmt.Sprintf("\x00%s=%s", internalLabelQueryFragment, queryFragment)
	if i == 0 {
		key[0] = SeriesMapLabel{Name: fragmentMarker}
	} else {
		key[0].Name += fragmentMarker
	}
}
```
```yaml
# math expressions with second stage pipeline
- '({} | rate()) + ({} | rate()) | topk(5)'
- '({} | rate()) + ({} | rate()) > 10'
- '({} | rate()) + ({} | rate()) | topk(5) > 10'
```
I need to add test cases for all operations, plus invalid ones like `{} | rate() + {} | rate()` and unsupported operators like `%`.
```go
// Use the first sub-query request.
var req FetchSpansRequest
for _, v := range requests {
	req = v
	break
}
```
I did not support math for Tag Values search, and in this PR did it only for the first sub-query.
I'm wondering whether Grafana could send us not the whole query but only the sub-query where the cursor is. For example,

```
({ .foo = "bar" } | rate()) + ({ .b = "val" && .v=| } | rate())
                                                  ^ cursor is here
```

it could send `{ .b = "val" && .v= } | rate()`. That would make more sense.
Anyway, support will be added in a separate PR.
Did it here: grafana/grafana@main...ruslan-mikhailov:grafana:feature/parse-tempo-math-expr
With

```
({resource.host.name=} | rate()) + ({ resource.k6.1Fn7HcicetM8qkG= } | rate())
                    ^                                            ^
    cursor position 1                            cursor position 2
```

for cursor position 1, it sends `{resource.host.name=} | rate()` in the request, and `{ resource.k6.1Fn7HcicetM8qkG= } | rate()` for position 2.
I will open a PR after this PR with the math syntax proposal gets merged.
What this PR does:
It adds support for binary arithmetic operations (`+`, `-`, `*`, `/`) between TraceQL metrics queries. This enables queries like an error rate ratio.
Parentheses are mandatory, so sub-queries have clear boundaries and queries cannot be misread.
This PR does not support operations with scalar numbers, but it lays a foundation to add them in a separate PR. UPD: a branch for scalars is created and awaits the current PR's approval.
Examples
Error rate
Share of resource.service.name
A query with filtration
Error rate by service
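The concrete example queries were not captured above; as illustrative sketches (the attribute names are assumptions), they might look like:

```
# Error rate
({ status = error } | rate()) / ({} | rate())

# Share of resource.service.name
({ resource.service.name = "api" } | rate()) / ({} | rate())

# A query with filtration
({ status = error && resource.service.name = "api" } | rate()) / ({ resource.service.name = "api" } | rate())

# Error rate by service
({ status = error } | rate() by (resource.service.name)) / ({} | rate() by (resource.service.name))
```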
Design
Idea
When Tempo receives a math query, it passes it through the read path as-is. On the querier/live-store, the query is then divided into sub-queries, which are fetched and processed independently. For example, `({status=error} | rate()) / ({} | rate())` becomes `({status=error} | rate())` and `({} | rate())`. Processing proceeds independently through L1 to L3. At the L3 processing step, the math expression is evaluated using all the query fragments. We do not form independent jobs for each sub-query, because keeping them together allows us to better utilise open descriptors, OS caches, etc.
An alternative approach we considered is single-pass execution: form one storage request covering all the sub-queries and route the resulting spans into the sub-query pipelines. However, in benchmarks the first approach showed better results.
To help you understand the data processing flow, I recommend starting the review with the test cases in `pkg/traceql/engine_metrics_test.go`; they explain well how data is processed and transformed.
Implementation
`firstStageElement` is an interface that combines all the operations in L1, L2, and L3 for metrics. I separated it into two logically different interfaces: `spanProcessor` for processing spans (L1) and `seriesProcessor` for series (L2 and L3). `BatchSpanProcessor` still needs to be a map, because L1 must know how to route a span; for `seriesProcessor`, routing can be determined under the hood from the `__query_fragment` label we attach in L1. `map[string]seriesProcessor` then becomes a single `seriesProcessor`, and L2/L3 processing in engine_metrics.go is not affected.
Since math execution can only happen after L1-L3 processing, it fits naturally into the second stage. I introduced `mathExpression`, which reflects the structure of the math expression and satisfies the `secondStage` interface.
Rollout problem
If we kept the `__query_fragment` logic for all queries, then during a rollout, when only part of Tempo runs the new version, results could be inconsistent: if all but one querier were updated, the final result would drop that querier's results, since they would lack `__query_fragment`; if the frontend were not updated, it would not strip the `__query_fragment` label, and the label would end up in the response.
To solve this, I changed the logic: when there is only one sub-query, `__query_fragment` is not added. During L2/L3 and math processing, the first and only result is taken without looking at the label. This also works for queries like `({} | rate()) + ({} | rate())`.
Commit with this change: adfde7a#diff-0a82119641a37adbdfe820b3dd67e18f6c1a6af8a5f72e88022206ae4449409a
Testing
The main test used to verify the feature is `pkg/traceql/engine_metrics_test.go`.
Added simple integration tests for e2e smoke regression testing.
Additionally, I manually tested that the results make sense.
Semi-automated testing
To verify e2e behaviour, I wrote a Python script targeting `example/docker-compose/single-binary`. It validates that server-side arithmetic between metric queries returns correct results for both range and instant queries.
Approach: each test fetches the combined math expression from the API, then independently fetches each sub-query and computes the expected result client-side. Results are compared with a 5% tolerance.
What's tested
- result filters (e.g. `> 5s`) and inline per-operand filters (e.g. `> 10ms`) applied to math results
- combined expressions such as `(sum+count)/rate` and `(rate+count)*(max-min)`

Which issue(s) this PR fixes
#6417
#5756
Checklist
`CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`