Rewrite conditions on same attribute to equivalent array operation (2)#6353
Rewrite conditions on same attribute to equivalent array operation (2)#6353stoewer merged 30 commits intografana:mainfrom
Conversation
Example ["aa", "bb"] -> [`aa`, `bb`]
The rewriter handles the following cases:
- { .a = "a" || .a = "b" } => { .a IN ["a", "b"] }
- { .a != "a" && .a != "b" } => { .a NOT IN ["a", "b"] }
- { .a =~ "a" || .a =~ "b" } => { .a MATCH ANY ["a", "b"] }
- { .a !~ "a" && .a !~ "b" } => { .a MATCH NONE ["a", "b"] }
BinaryOperation.validate() considered everythin a valid regex where FieldExpression.EncodeToString(false) happened to return a valid regex. With this change the RHS of a regex operation is required to be a string or string array
|
@codex review |
|
To use Codex here, create a Codex account and connect to github. |
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: dabb6270fd
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| case traceql.OpNotRegex: | ||
| return parquetquery.NewRegexNotInPredicate(strs) |
There was a problem hiding this comment.
Enforce match-none semantics for regex arrays
When string-array operands hit OpNotRegex, this funnels into NewRegexNotInPredicate(strs). That predicate uses regexp.NewRegexp(..., shouldMatch=false) where MatchString returns true if any regex does not match (see pkg/regexp/regexp.go). For multiple regexes this is “not all match,” not “match none.” With the new AST rewrite, queries like { span.foo !~ "foo.*" && span.foo !~ "bar.*" } become array form and this predicate will let through values that match one of the forbidden regexes. In autocomplete/tag-values (no second-pass evaluation), this yields incorrect results. Consider implementing a true “match none” predicate (require all regexes to fail) or keep separate per-regex predicates for the array case.
Useful? React with 👍 / 👎.
|
I ran some benchmarks:
The first 4 sub-benchmarks are just simple queries where nothing can be optimized Results: |
|
There are two things in the benchmark that I still need to look into:
|
This fixes additional span/op when optimizations are enabled
|
Here are the benchmark results after optimization and fixing the span/op bug. There are again 3 different benchmark runs with the following parameters:
The first 4 sub-benchmarks are just simple queries where nothing can be optimized: |
|
Commit bc3b95b fixes a bug that was introduced with the virtual row numbers PR #5943. This is what caused the additional |
Co-authored-by: Martin Disibio <mdisibio@gmail.com>
mdisibio
left a comment
There was a problem hiding this comment.
Lgtm after latest changes, thanks. Discussed offline and we want to dig into the virtual row number fix a little more before merging.
This helps avoid additional spans/op in queries with only resource attributes
|
Latest benchmark results with the new fix for the additional spans/op: |
This helps avoid additional spans/op in queries with only resource attributes This change part of grafana#6353
This helps avoid additional spans/op in queries with only resource attributes This fix was part of grafana#6353
* Read to first virtual row after seeking on a low definition level This helps avoid additional spans/op in queries with only resource attributes This fix was part of #6353 * CHANGELOG.md
What this PR does:
Rewrite queries like
{ .a="a" || .a="b" }into the equivalent, faster form{ .a IN ["a", "b"] }. This AST optimization is applied by default, but it can be disabled using the query hintskip_optimization=true.I've added the new operators IN, NOT IN, MATCH ANY, MATCH NONE to the AST. However, they are not actually parseable in TraceQL.
Note: This PR was created from the changes in #5827. Since reviewers requested larger changes and rebasing caused lots of conflicts, I decided to create a new PR instead of updating the new one
Limitations
{ .a="a" && .b="b" || .a="c" }{ .a IN ["a", "b"] }only works internally, the syntax is not actually available in TraceQLBreaking Changes
!=and `!~' slightly:!=now means NOT IN (previously: CONTAINS NOT EQUAL)!~now means MATCH NONE (previously: CONTAINS NON-MATCH)Which issue(s) this PR fixes:
Fixes grafana/tempo-squad#874
Checklist
CHANGELOG.mdupdated - the order of entries should be[CHANGE],[FEATURE],[ENHANCEMENT],[BUGFIX]