[vParquet5] Fix buffer reuse bug with event-level dedicated columns#6914
[vParquet5] Fix buffer reuse bug with event-level dedicated columns#6914mdisibio merged 6 commits intografana:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Extends Tempodb/vParquet round-trip and query tests to better reproduce and prevent a buffer reuse bug affecting event-level dedicated columns (especially for blob/int dedicated attributes).
Changes:
- Add dedicated column assignments to tempodb round-trip tests and populate traces with random dedicated attributes (including event-level).
- Update vParquet3/4/5 tests to treat
dedicated.span.5as a blob-backed string and use a consistent fixed blob payload for assertions. - Add utilities in
pkg/util/testto generate dedicated attributes (including blob-sized payloads).
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
tempodb/tempodb_test.go |
Configures blocks with dedicated columns and enriches traces with dedicated attributes for end-to-end round-trip verification. |
tempodb/encoding/vparquet5/schema_test.go |
Updates expected dedicated attribute values to use the blob test payload. |
tempodb/encoding/vparquet5/block_traceql_test.go |
Updates TraceQL match cases for blob-backed dedicated span attribute. |
tempodb/encoding/vparquet5/block_search_test.go |
Updates search tests to use blob payload for dedicated span attribute. |
tempodb/encoding/vparquet5/block_findtracebyid_test.go |
Strengthens find-by-id test to round-trip fully populated traces and explicitly exercises buffer reuse. |
tempodb/encoding/vparquet4/schema_test.go |
Aligns vParquet4 schema tests with blob payload expectations for dedicated.span.5. |
tempodb/encoding/vparquet4/block_traceql_test.go |
Aligns vParquet4 TraceQL tests with blob payload expectations. |
tempodb/encoding/vparquet4/block_search_test.go |
Aligns vParquet4 search tests with blob payload expectations. |
tempodb/encoding/vparquet3/schema_test.go |
Aligns vParquet3 schema tests with blob payload expectations. |
tempodb/encoding/vparquet3/block_traceql_test.go |
Aligns vParquet3 TraceQL tests with blob payload expectations. |
tempodb/encoding/vparquet3/block_search_test.go |
Aligns vParquet3 search tests with blob payload expectations. |
pkg/util/trace_info.go |
Modifies vulture trace tag generation (randomizes presence of some “fixed” tags and well-known attributes). |
pkg/util/test/req.go |
Adds helpers/constants for dedicated column generation, including blob-sized payloads and random dedicated attribute population. |
| var ( | ||
| id = test.ValidTraceID(nil) | ||
| req = test.MakeTrace(rand.Int()%10, id) | ||
| ) | ||
|
|
||
| // Populate data using the same dedicated columns as configured on the block above. | ||
| test.AddRandomDedicatedAttributes(req) | ||
|
|
There was a problem hiding this comment.
These tests now rely on randomized dedicated attributes (including per-event subsets) to reproduce the buffer-reuse bug. Because TestCompleteBlock runs many subtests with t.Parallel() and uses the global math/rand, the exact attribute patterns can vary with scheduling, which can make this regression test less deterministic (e.g., in the unlikely case all events happen to set all dedicated columns). What do you think about making the pattern deterministic (fixed seed/local rand, or explicitly constructing at least one event where a dedicated column is omitted) so the bug is guaranteed to be exercised?
There was a problem hiding this comment.
Unfortunately I don't think this will work. A static seed that triggers the bug now, could easily stop triggering the bug after innocuous changes and go unnoticed. The randomness has been serving us well for several years, we'll keep it for now.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
| // generateFixedAttributesWithPrefix returns the fixed attributes with a prefix. Keys are lowercase with a hyphen before the numeric suffix (e.g. string-01, int-01, blob-01). | ||
| func (t *TraceInfo) generateFixedAttributesWithPrefix(prefix string) []*jaeger.Tag { | ||
| return []*jaeger.Tag{ | ||
| {Key: fmt.Sprintf("%s-string-01", prefix), VType: jaeger.TagType_STRING, VStr: stringPtr(t.generateRandomString())}, | ||
| {Key: fmt.Sprintf("%s-string-02", prefix), VType: jaeger.TagType_STRING, VStr: stringPtr(t.generateRandomString())}, | ||
| {Key: fmt.Sprintf("%s-string-03", prefix), VType: jaeger.TagType_STRING, VStr: stringPtr(t.generateRandomString())}, | ||
| {Key: fmt.Sprintf("%s-string-04", prefix), VType: jaeger.TagType_STRING, VStr: stringPtr(t.generateRandomString())}, | ||
| {Key: fmt.Sprintf("%s-string-05", prefix), VType: jaeger.TagType_STRING, VStr: stringPtr(t.generateRandomString())}, | ||
| {Key: fmt.Sprintf("%s-blob-01", prefix), VType: jaeger.TagType_STRING, VStr: stringPtr(t.generateRandomBlob(vultureBlobSize))}, | ||
| {Key: fmt.Sprintf("%s-int-01", prefix), VType: jaeger.TagType_LONG, VLong: int64Ptr(t.generateRandomInt(1, 1000000))}, | ||
| {Key: fmt.Sprintf("%s-int-02", prefix), VType: jaeger.TagType_LONG, VLong: int64Ptr(t.generateRandomInt(1, 1000000))}, | ||
| {Key: fmt.Sprintf("%s-int-03", prefix), VType: jaeger.TagType_LONG, VLong: int64Ptr(t.generateRandomInt(1, 1000000))}, | ||
| {Key: fmt.Sprintf("%s-int-04", prefix), VType: jaeger.TagType_LONG, VLong: int64Ptr(t.generateRandomInt(1, 1000000))}, | ||
| {Key: fmt.Sprintf("%s-int-05", prefix), VType: jaeger.TagType_LONG, VLong: int64Ptr(t.generateRandomInt(1, 1000000))}, | ||
| var ( | ||
| numStrings = t.r.Intn(6) | ||
| numBlobs = t.r.Intn(2) | ||
| numInts = t.r.Intn(6) | ||
| tags = make([]*jaeger.Tag, 0, numStrings+numBlobs+numInts) | ||
| ) |
There was a problem hiding this comment.
generateFixedAttributesWithPrefix (and its comment) now returns a random count of tags with random values, so it’s no longer “fixed”. This is confusing for callers/readers and makes the docstring inaccurate. Could we either rename the helper (and/or update the comment) to reflect the new randomized behavior?
…6914) * Add failing tests at multiple levels that trigger the bug * Update tests, lint, and fix vp5 actual bug * review feedback * changelog * Update CHANGELOG.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update tempodb/tempodb_test.go Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
What this PR does:
There is a bug with buffer reuse in event-level dedicated columns, for event attributes. The buffer was not sufficiently cleared, leading to duplicated attributes across events.
Extended the roundtrip/tempodb level tests to fully populate all options and types now (like integers and blobs), to test holistically, and not just in the encoding itself. Verified they exhibit the bug before fixing.
I updated vulture with more randomness too, but it wasn't able to trigger the bug. I think it requires a higher write volume.
Which issue(s) this PR fixes:
Fixes #
Checklist
CHANGELOG.mdupdated - the order of entries should be[CHANGE],[FEATURE],[ENHANCEMENT],[BUGFIX]