Vulture + Blob Fixes #6815
Conversation
Pull request overview
Follow-up bugfix work for vulture + blob/dedicated-attribute querying, focusing on making trace comparisons deterministic and fixing predicate behavior when Parquet column index min/max metadata is missing and/or values are null.
Changes:
- Deep-sort traces (including events + links) and expand attribute sorting so vulture can reliably compare trace contents.
- Fix Parquet query predicates to handle missing min/max metadata and null values correctly (plus regenerated predicates.gen.go).
- Update vparquet v3/v4/v5 tests and dedicated-column test config to reflect `dedicated.span.5` being treated as a blob.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| tempodb/encoding/vparquet5/block_traceql_test.go | Adds TraceQL match/non-match cases for blob dedicated attribute queries |
| tempodb/encoding/vparquet4/schema_test.go | Adjusts schema test to move dedicated.span.5 to generic attrs (blob fallback) |
| tempodb/encoding/vparquet4/block_search_test.go | Updates generated trace data to include blob attribute in generic attrs |
| tempodb/encoding/vparquet3/schema_test.go | Same adjustment as vparquet4 for blob attribute coverage |
| tempodb/encoding/vparquet3/block_search_test.go | Same adjustment as vparquet4 for blob attribute coverage |
| pkg/util/test/req.go | Marks dedicated.span.5 as a blob dedicated column option |
| pkg/parquetquerygen/predicates.go | Updates predicate generator template for null/minmax handling |
| pkg/parquetquery/predicates.gen.go | Regenerated predicates with new null/minmax behavior |
| pkg/model/trace/sort_test.go | Expands sorting test to include attributes, events, and links ordering |
| pkg/model/trace/sort.go | Implements deeper sorting (events/links) + deeper attribute sorting |
| Makefile | Adds docker-tempo-vulture-multi target |
```go
func SortTrace(t *tempopb.Trace) {
	// Sort bottom up by span start times
	for _, b := range t.ResourceSpans {
		for _, ss := range b.ScopeSpans {
			for _, span := range ss.Spans {
				sort.Slice(span.Events, func(i, j int) bool {
					return compareEvents(span.Events[i], span.Events[j])
				})
				sort.Slice(span.Links, func(i, j int) bool {
					return compareLinks(span.Links[i], span.Links[j])
				})
			}
```
SortTrace is called from the trace combiner (pkg/model/trace/combine.go) and can run on large traces; sorting events and links for every span is extra work compared to before. Since this is likely on a hot path, could we run/attach a benchmark (or at least a before/after CPU profile) to confirm there’s no meaningful regression?
This is a good call-out, and it is indeed about 3x slower and allocates 3x as much. However, I don't think it will be a bottleneck at runtime; this sorting should be insignificant compared to the work of creating and flushing the parquet file. Therefore let's leave it, and we will revisit it and add options to skip portions of the sorting if needed. Since we are sorting a few more things, it might even yield a small benefit in file size.
Here are the benchmarks from main:
```
goos: darwin
goarch: arm64
pkg: github.com/grafana/tempo/pkg/model/trace
cpu: Apple M4 Pro
                            │ before.txt  │            after.txt             │
                            │   sec/op    │    sec/op     vs base            │
SortTraceAndAttributes-14     220.3n ± 1%   739.4n ± 38%  +235.69% (p=0.002 n=6)

                            │ before.txt │            after.txt             │
                            │    B/op    │    B/op      vs base             │
SortTraceAndAttributes-14     320.0 ± 0%   1104.0 ± 0%  +245.00% (p=0.002 n=6)

                            │ before.txt │            after.txt             │
                            │ allocs/op  │ allocs/op    vs base             │
SortTraceAndAttributes-14     12.00 ± 0%   40.00 ± 0%  +233.33% (p=0.002 n=6)
```
I just checked profiles from one of our larger internal clusters and SortTrace is indeed not significant:
Block-builders:
- not present in cpu profiles at all
- not present in mem profiles at all
Live-stores:
- 17.5s out of 19.9h of cpu time == 0.02% of cpu
- 4GB out of 13.8 TB == 0.03% of allocs
Backend-workers:
- 260ms of 3.22h of cpu time == 0.002% of cpu
- 39MB of 1.36 TB == 0.003% of allocs
```go
for _, span := range ss.Spans {
	sort.Slice(span.Events, func(i, j int) bool {
		return compareEvents(span.Events[i], span.Events[j])
	})
	sort.Slice(span.Links, func(i, j int) bool {
		return compareLinks(span.Links[i], span.Links[j])
	})
```
sort.Slice is not stable, so if two events (or links) compare equal under compareEvents/compareLinks, their relative order can still change between runs. Since this is meant to make vulture comparisons deterministic, would it be safer to use sort.SliceStable here (or extend the comparator with an additional tie-breaker) to avoid nondeterministic ordering for equal keys?
In this case it's not important if equal elements are swapped, and we will stick with sort.Slice because it's faster: (1) at run time on real data, this sorting is best-effort to help compression effectiveness and does not have to be perfect; (2) for vulture testing, sort.Slice is good enough for the test data it uses.
* Fix trace sorting to deeply sort events and links. Fix discovered bugs of blob handling in predicates
* lint
* Repair vp3/4 for new blob dedicated column testing
* Review feedback
* Add benchmark, extend test for nil resource handling
* Format generated predicate output
* changelog
What this PR does:
Following up on #6731, which uncovered some additional bugs in vulture and blobs.
(1) This is the easy one - the trace sorting needed to be updated to deeply sort events and links, so vulture can consistently compare trace contents. The order of attributes isn't preserved when dedicated columns are enabled (which is what we want).
(2) This is the hard one - searches on the blob attributes were still producing errors. The root cause was that none of our string predicates handled the case of the min/max parquet metadata not being present. This is tricky because the parquet thrift column index itself doesn't distinguish between no value and an empty string (both are represented as zero-length). Updating the predicates for this check then exposed another bug a few lines later: all instances of `KeepValue` were missing null checks and counted `parquet.Value(null)` as equal to the empty string `[]byte{}`. Without this fix, the results of queries like `{span.doesNotExist != "foo"}` would match all spans.

Which issue(s) this PR fixes:
Fixes #
Checklist
`CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`