Skip to content

Vulture + Blob Fixes#6815

Merged
mdisibio merged 8 commits intografana:mainfrom
mdisibio:vulture-blob-fixes
Mar 30, 2026
Merged

Vulture + Blob Fixes#6815
mdisibio merged 8 commits intografana:mainfrom
mdisibio:vulture-blob-fixes

Conversation

@mdisibio
Copy link
Copy Markdown
Contributor

@mdisibio mdisibio commented Mar 27, 2026

What this PR does:
Following up on #6731 , which uncovered some additional bugs in vulture and blobs.

(1) This is the easy one - needed to update the trace sorting to deeply sort events and links, so vulture can consistently compare trace contents. The order of attributes isn't preserved when there are dedicated columns enabled (which is what we want).

(2) This is the hard one - it was still highlighting some errors searching on the blob attributes. The root cause was that all of our string predicates weren't handling the case of min/max parquet metadata not present. This is tricky because the parquet thrift column index itself doesn't distinguish between no value and empty string (both are represented as zero-length). Updated the predicates for this check, but then it exposed another bug a few lines after. All instances of KeepValue were missing null checks, and counting parquet.Value(null) as equal to the empty string []byte{}. Without this the results of queries like {span.doesNotExist != "foo"} would match all spans.

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Follow-up bugfix work for vulture + blob/dedicated-attribute querying, focusing on making trace comparisons deterministic and fixing predicate behavior when Parquet column index min/max metadata is missing and/or values are null.

Changes:

  • Deep-sort traces (including events + links) and expand attribute sorting so vulture can reliably compare trace contents.
  • Fix Parquet query predicates to handle missing min/max metadata and null values correctly (plus regenerate predicates.gen.go).
  • Update vparquet v3/v4/v5 tests and dedicated-column test config to reflect dedicated.span.5 being treated as a blob.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
tempodb/encoding/vparquet5/block_traceql_test.go Adds TraceQL match/non-match cases for blob dedicated attribute queries
tempodb/encoding/vparquet4/schema_test.go Adjusts schema test to move dedicated.span.5 to generic attrs (blob fallback)
tempodb/encoding/vparquet4/block_search_test.go Updates generated trace data to include blob attribute in generic attrs
tempodb/encoding/vparquet3/schema_test.go Same adjustment as vparquet4 for blob attribute coverage
tempodb/encoding/vparquet3/block_search_test.go Same adjustment as vparquet4 for blob attribute coverage
pkg/util/test/req.go Marks dedicated.span.5 as a blob dedicated column option
pkg/parquetquerygen/predicates.go Updates predicate generator template for null/minmax handling
pkg/parquetquery/predicates.gen.go Regenerated predicates with new null/minmax behavior
pkg/model/trace/sort_test.go Expands sorting test to include attributes, events, and links ordering
pkg/model/trace/sort.go Implements deeper sorting (events/links) + deeper attribute sorting
Makefile Adds docker-tempo-vulture-multi target

Comment thread Makefile
Comment thread pkg/model/trace/sort.go
Comment on lines 13 to +24
func SortTrace(t *tempopb.Trace) {
// Sort bottom up by span start times
for _, b := range t.ResourceSpans {
for _, ss := range b.ScopeSpans {
for _, span := range ss.Spans {
sort.Slice(span.Events, func(i, j int) bool {
return compareEvents(span.Events[i], span.Events[j])
})
sort.Slice(span.Links, func(i, j int) bool {
return compareLinks(span.Links[i], span.Links[j])
})
}
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SortTrace is called from the trace combiner (pkg/model/trace/combine.go) and can run on large traces; sorting events and links for every span is extra work compared to before. Since this is likely on a hot path, could we run/attach a benchmark (or at least a before/after CPU profile) to confirm there’s no meaningful regression?

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good call out, and it is indeed is about 3x slower and allocates 3x as much. However I don't think it will be a bottleneck at runtime, this sorting should be insignificant compared to the work to create and flush the parquet file. Therefore let's leave it, and we will revisit it and add options to skip portions of sorting if needed. I think since we are sorting a few more things, it might help achieve a small benefit in file size.

Here are the benchmarks from main:

goos: darwin
goarch: arm64
pkg: github.com/grafana/tempo/pkg/model/trace
cpu: Apple M4 Pro
                          │ before.txt  │              after.txt               │
                          │   sec/op    │    sec/op     vs base                │
SortTraceAndAttributes-14   220.3n ± 1%   739.4n ± 38%  +235.69% (p=0.002 n=6)

                          │ before.txt │              after.txt              │
                          │    B/op    │    B/op      vs base                │
SortTraceAndAttributes-14   320.0 ± 0%   1104.0 ± 0%  +245.00% (p=0.002 n=6)

                          │ before.txt │             after.txt              │
                          │ allocs/op  │ allocs/op   vs base                │
SortTraceAndAttributes-14   12.00 ± 0%   40.00 ± 0%  +233.33% (p=0.002 n=6)

Copy link
Copy Markdown
Contributor Author

@mdisibio mdisibio Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just checked profiles from one of our larger internal clusters and SortTrace is indeed not significant:

Block-builders:

  • not present in cpu profiles at all
  • not present in mem profiles at all

Live-stores:

  • 17.5s out of 19.9h of cpu time == 0.02% of cpu
  • 4GB out of 13.8 TB == 0.03% of allocs

Backend-workers:

  • 260ms of 3.22h of cpu time == 0.002% of cpu
  • 39MB of 1.36 TB == 0.003% of allocs

Comment thread pkg/model/trace/sort.go Outdated
Comment thread pkg/model/trace/sort.go
Comment on lines +17 to +23
for _, span := range ss.Spans {
sort.Slice(span.Events, func(i, j int) bool {
return compareEvents(span.Events[i], span.Events[j])
})
sort.Slice(span.Links, func(i, j int) bool {
return compareLinks(span.Links[i], span.Links[j])
})
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sort.Slice is not stable, so if two events (or links) compare equal under compareEvents/compareLinks, their relative order can still change between runs. Since this is meant to make vulture comparisons deterministic, would it be safer to use sort.SliceStable here (or extend the comparator with an additional tie-breaker) to avoid nondeterministic ordering for equal keys?

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case it's not important if equal elements are swapped and we will stick with sort.Slice because it's faster: (1) at run time on real data, this sorting is best-effort to help with compression effectiveness and does not have to be perfect. (2) for vulture testing, sort.Slice is good enough for the test data it uses.

Comment thread pkg/model/trace/sort.go
Comment thread pkg/parquetquerygen/predicates.go Outdated
Comment thread tempodb/encoding/vparquet5/block_traceql_test.go Outdated
@mdisibio mdisibio marked this pull request as ready for review March 30, 2026 18:32
@mdisibio mdisibio changed the title [WIP] Vulture + Blob Fixes Vulture + Blob Fixes Mar 30, 2026
@mdisibio mdisibio merged commit 3db81a3 into grafana:main Mar 30, 2026
44 of 45 checks passed
mdisibio added a commit to mdisibio/tempo that referenced this pull request Mar 30, 2026
* Fix trace sorting to deeply sort events and links.  Fix discovered bugs of blob handling in predicates

* lint

* Repair vp3/4 for new blob dedicated column testing

* Review feedback

* Add benchmark, extend test for nil resource handling

* Format generated predicate output

* changelog
mdisibio added a commit that referenced this pull request Mar 30, 2026
* Fix trace sorting to deeply sort events and links.  Fix discovered bugs of blob handling in predicates

* lint

* Repair vp3/4 for new blob dedicated column testing

* Review feedback

* Add benchmark, extend test for nil resource handling

* Format generated predicate output

* changelog
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants