
[Bugfix] vParquetX: race conditions#6773

Merged
ruslan-mikhailov merged 5 commits into grafana:main from ruslan-mikhailov:bugfix/vparquetx-race-conditions
Mar 26, 2026

Conversation

@ruslan-mikhailov
Contributor

What this PR does: fixes race conditions found by running TestWalBlockRaceConditionCheck with -race -count=10.

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Copilot AI review requested due to automatic review settings March 25, 2026 12:09
Contributor

Copilot AI left a comment


Pull request overview

This PR addresses data races in the vParquet WAL block implementations (vparquet3/4/5), aiming to make concurrent reads (search/fetch/iterate) safe while writes/flushes are happening.

Changes:

  • Add mutex protection around flushedSize / unflushedSize updates and DataLength() reads.
  • Avoid iterating b.flushed directly by routing Iterator() / FindTraceByID() through readFlushes().
  • Add a new concurrent “race condition check” test for WAL blocks in vparquet3/4/5.
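The first two changes can be illustrated with a minimal Go sketch. This is not Tempo's `walBlock`: the `block` and `page` types are hypothetical, and it assumes that a helper like `readFlushes()` hands back a snapshot copy of the flushed slice taken under the lock, which the PR summary does not spell out.

```go
package main

import (
	"fmt"
	"sync"
)

// page is a hypothetical stand-in for one flushed WAL page.
type page struct {
	size int64
}

// block is a hypothetical, simplified WAL block; it is not Tempo's walBlock.
type block struct {
	mtx           sync.RWMutex
	flushed       []*page
	flushedSize   int64
	unflushedSize int64
}

// Append records bytes that are buffered but not yet flushed.
func (b *block) Append(n int64) {
	b.mtx.Lock()
	defer b.mtx.Unlock()
	b.unflushedSize += n
}

// Flush moves the buffered bytes into a new flushed page.
func (b *block) Flush() {
	b.mtx.Lock()
	defer b.mtx.Unlock()
	if b.unflushedSize == 0 {
		return
	}
	b.flushed = append(b.flushed, &page{size: b.unflushedSize})
	b.flushedSize += b.unflushedSize
	b.unflushedSize = 0
}

// DataLength reads both counters under the lock, so it never observes a
// half-applied Flush.
func (b *block) DataLength() int64 {
	b.mtx.RLock()
	defer b.mtx.RUnlock()
	return b.flushedSize + b.unflushedSize
}

// readFlushes returns a copy of the flushed slice (an assumption of this
// sketch), so callers can iterate without holding the lock while Flush appends.
func (b *block) readFlushes() []*page {
	b.mtx.RLock()
	defer b.mtx.RUnlock()
	out := make([]*page, len(b.flushed))
	copy(out, b.flushed)
	return out
}

func main() {
	b := &block{}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 100; j++ {
				b.Append(10)
				b.Flush()
				_ = b.DataLength()
				_ = len(b.readFlushes())
			}
		}()
	}
	wg.Wait()
	fmt.Println(b.DataLength())
}
```

Run with `go run -race`, a sketch like this should report no races, because every access to the counters and the slice header goes through `mtx`.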

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 7 comments.

Show a summary per file
| File | Description |
| --- | --- |
| tempodb/encoding/vparquet5/wal_block.go | Synchronizes size accounting and iteration over flushed WAL pages. |
| tempodb/encoding/vparquet4/wal_block.go | Same synchronization changes for vparquet4. |
| tempodb/encoding/vparquet3/wal_block.go | Same synchronization changes for vparquet3. |
| tempodb/encoding/vparquet5/wal_block_test.go | Adds a concurrent stress test intended to reproduce races under -race. |
| tempodb/encoding/vparquet4/wal_block_test.go | Adds the same concurrent race-check test for vparquet4. |
| tempodb/encoding/vparquet3/wal_block_test.go | Adds the same concurrent race-check test for vparquet3. |
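
The general shape of such a concurrent stress test can be sketched as follows. The `counter` type and the `runConcurrently` helper are hypothetical and only stand in for a WAL block and the test harness; the actual test in this PR may be structured differently (for example, bounded by time rather than by iteration count).

```go
package main

import (
	"fmt"
	"sync"
)

// counter is a hypothetical shared structure standing in for a WAL block.
type counter struct {
	mtx sync.Mutex
	n   int
}

// Add is the "writer" operation.
func (c *counter) Add() {
	c.mtx.Lock()
	defer c.mtx.Unlock()
	c.n++
}

// Get is a "reader" operation.
func (c *counter) Get() int {
	c.mtx.Lock()
	defer c.mtx.Unlock()
	return c.n
}

// runConcurrently runs the writer and every reader in its own goroutine,
// calling each function `iterations` times, and waits for all of them.
func runConcurrently(iterations int, write func(), readers map[string]func()) {
	var wg sync.WaitGroup
	run := func(f func()) {
		defer wg.Done()
		for i := 0; i < iterations; i++ {
			f()
		}
	}
	wg.Add(1 + len(readers))
	go run(write)
	for _, r := range readers {
		go run(r)
	}
	wg.Wait()
}

func main() {
	c := &counter{}
	readers := map[string]func(){
		"Get": func() { _ = c.Get() },
	}
	runConcurrently(1000, c.Add, readers)
	fmt.Println(c.Get())
}
```

A harness like this only finds races when it is executed under the race detector (`-race`); without it, the overlapping reads and writes usually pass silently.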

Comment thread tempodb/encoding/vparquet3/wal_block_test.go Outdated
Comment thread tempodb/encoding/vparquet3/wal_block_test.go
Comment thread tempodb/encoding/vparquet5/wal_block_test.go Outdated
Comment thread tempodb/encoding/vparquet5/wal_block_test.go
Comment thread tempodb/encoding/vparquet5/wal_block_test.go
Comment thread tempodb/encoding/vparquet4/wal_block_test.go Outdated
Comment thread tempodb/encoding/vparquet4/wal_block_test.go
@ruslan-mikhailov ruslan-mikhailov force-pushed the bugfix/vparquetx-race-conditions branch from 87de5ed to cd681bb Compare March 25, 2026 12:45
Copilot AI review requested due to automatic review settings March 25, 2026 13:02
@ruslan-mikhailov ruslan-mikhailov force-pushed the bugfix/vparquetx-race-conditions branch from cd681bb to 5b78063 Compare March 25, 2026 13:02
@ruslan-mikhailov ruslan-mikhailov force-pushed the bugfix/vparquetx-race-conditions branch from 5b78063 to de6cafa Compare March 25, 2026 13:06
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.

Comment thread tempodb/encoding/vparquet4/wal_block_test.go
Comment thread tempodb/encoding/vparquet4/wal_block_test.go
Comment thread tempodb/encoding/vparquet3/wal_block_test.go
Comment thread tempodb/encoding/vparquet3/wal_block_test.go
Comment thread tempodb/encoding/vparquet5/wal_block_test.go
Comment thread tempodb/encoding/vparquet5/wal_block_test.go
@@ -355,7 +355,9 @@ func (b *walBlock) AppendTrace(id common.ID, trace *tempopb.Trace, start, end ui
b.meta.ObjectAdded(start, end)
Contributor Author

I need to add this to the changelog.

Contributor Author

+

b.meta.ObjectAdded(start, end)
b.ids.Set(id, int64(b.ids.Len())) // Next row number

b.mtx.Lock()
Contributor

nit: Since this is at the end of the method, we could defer the unlock and stylistically I think that reads better.
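
As a small illustration of the deferred-unlock style, here is a sketch with a hypothetical `sizeTracker` type (not the PR's code). The deferred `Unlock` runs on every return path, including the early error return.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// sizeTracker is a hypothetical type used only for illustration.
type sizeTracker struct {
	mtx  sync.Mutex
	size int
}

var errNegative = errors.New("negative size")

// add updates size under the lock. Because Unlock is deferred, the mutex is
// released on both the error path and the success path.
func (s *sizeTracker) add(n int) error {
	s.mtx.Lock()
	defer s.mtx.Unlock()
	if n < 0 {
		return errNegative
	}
	s.size += n
	return nil
}

func main() {
	s := &sizeTracker{}
	fmt.Println(s.add(-1)) // negative size
	fmt.Println(s.add(5))  // <nil>
	fmt.Println(s.size)    // 5
}
```

If the early return had skipped an explicit `Unlock`, the second call to `add` would block forever; with `defer` that mistake cannot happen.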

Contributor Author

+

Comment thread tempodb/encoding/vparquet3/wal_block.go
@ruslan-mikhailov ruslan-mikhailov force-pushed the bugfix/vparquetx-race-conditions branch from de6cafa to 1220425 Compare March 25, 2026 14:42
Copilot AI review requested due to automatic review settings March 25, 2026 14:45
@ruslan-mikhailov ruslan-mikhailov force-pushed the bugfix/vparquetx-race-conditions branch from 1220425 to b2456d6 Compare March 25, 2026 14:45
@ruslan-mikhailov
Contributor Author

+ rebase to resolve conflicts in changelog

Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.

Comment on lines +340 to +341
"Iterator": func() { _, _ = w.Iterator() },
"DataLength": func() { _ = w.DataLength() },
Copilot AI Mar 25, 2026

The race-check test calls w.Iterator() repeatedly but never closes the returned iterator. walBlock.Iterator() opens parquet files (via rowIterator()), so not closing can leak file descriptors and make this test flaky (or stop exercising races once it hits OS limits). Make the reader close the iterator each time (even if you ignore the error).

Suggested change
"Iterator": func() { _, _ = w.Iterator() },
"DataLength": func() { _ = w.DataLength() },
"Iterator": func() {
it, _ := w.Iterator()
if it != nil {
_ = it.Close()
}
},
"DataLength": func() { _ = w.DataLength() },
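
The effect of the missing `Close()` can be sketched with stand-in types. `fakeBlock` and `fakeIterator` are hypothetical and simply count open handles instead of opening parquet files; the sketch is single-goroutine and not safe for concurrent use.

```go
package main

import "fmt"

// fakeIterator is a hypothetical stand-in for an iterator that holds a file open.
type fakeIterator struct {
	open *int
}

// Close releases the simulated handle.
func (it *fakeIterator) Close() error {
	*it.open--
	return nil
}

// fakeBlock tracks how many iterators are currently open.
type fakeBlock struct {
	open int
}

// Iterator simulates opening a file and returns an iterator over it.
func (b *fakeBlock) Iterator() (*fakeIterator, error) {
	b.open++
	return &fakeIterator{open: &b.open}, nil
}

// leakyReader mirrors the original test closure: the iterator is dropped.
func leakyReader(b *fakeBlock) {
	_, _ = b.Iterator()
}

// closingReader mirrors the suggested change: the iterator is always closed.
func closingReader(b *fakeBlock) {
	it, err := b.Iterator()
	if err != nil {
		return
	}
	_ = it.Close()
}

func main() {
	leaky, closing := &fakeBlock{}, &fakeBlock{}
	for i := 0; i < 100; i++ {
		leakyReader(leaky)
		closingReader(closing)
	}
	fmt.Println(leaky.open, closing.open) // 100 0
}
```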

Comment on lines +349 to +350
"Fetch": func() { _, _ = w.Fetch(ctx, traceql.FetchSpansRequest{}, opts) },
"FetchSpans": func() { _, _ = w.FetchSpans(ctx, traceql.FetchSpansRequest{}, opts) },
Copilot AI Mar 25, 2026

The Fetch / FetchSpans reader functions drop the response without closing resp.Results. Those iterators keep page files open until Close() is called, so this can leak file descriptors across the loop and make the race test flaky. Close Results (preferably in a defer) when the call succeeds.

Suggested change
"Fetch": func() { _, _ = w.Fetch(ctx, traceql.FetchSpansRequest{}, opts) },
"FetchSpans": func() { _, _ = w.FetchSpans(ctx, traceql.FetchSpansRequest{}, opts) },
"Fetch": func() {
resp, err := w.Fetch(ctx, traceql.FetchSpansRequest{}, opts)
if err != nil || resp == nil || resp.Results == nil {
return
}
_ = resp.Results.Close()
},
"FetchSpans": func() {
resp, err := w.FetchSpans(ctx, traceql.FetchSpansRequest{}, opts)
if err != nil || resp == nil || resp.Results == nil {
return
}
_ = resp.Results.Close()
},

readers := map[string]func(){
"FindTraceByID": func() { _, _ = w.FindTraceByID(ctx, id, opts) },
"Search": func() { _, _ = w.Search(ctx, &tempopb.SearchRequest{}, opts) },
"Iterator": func() { _, _ = w.Iterator() },
Copilot AI Mar 25, 2026

The race-check test calls w.Iterator() repeatedly but never closes the returned iterator. walBlock.Iterator() opens parquet files (via rowIterator()), so not closing can leak file descriptors and make this test flaky (or stop exercising races once it hits OS limits). Make the reader close the iterator each time (even if you ignore the error).

Suggested change
"Iterator": func() { _, _ = w.Iterator() },
"Iterator": func() { it, _ := w.Iterator(); if it != nil { _ = it.Close() } },

Comment on lines +348 to +349
"Fetch": func() { _, _ = w.Fetch(ctx, traceql.FetchSpansRequest{}, opts) },
"FetchSpans": func() { _, _ = w.FetchSpans(ctx, traceql.FetchSpansRequest{}, opts) },
Copilot AI Mar 25, 2026

The Fetch / FetchSpans reader functions drop the response without closing resp.Results. Those iterators keep page files open until Close() is called, so this can leak file descriptors across the loop and make the race test flaky. Close Results (preferably in a defer) when the call succeeds.

Suggested change
"Fetch": func() { _, _ = w.Fetch(ctx, traceql.FetchSpansRequest{}, opts) },
"FetchSpans": func() { _, _ = w.FetchSpans(ctx, traceql.FetchSpansRequest{}, opts) },
"Fetch": func() {
resp, _ := w.Fetch(ctx, traceql.FetchSpansRequest{}, opts)
if resp != nil && resp.Results != nil {
defer resp.Results.Close()
}
},
"FetchSpans": func() {
resp, _ := w.FetchSpans(ctx, traceql.FetchSpansRequest{}, opts)
if resp != nil && resp.Results != nil {
defer resp.Results.Close()
}
},

Comment on lines +339 to +340
"Iterator": func() { _, _ = w.Iterator() },
"DataLength": func() { _ = w.DataLength() },
Copilot AI Mar 25, 2026

The race-check test calls w.Iterator() repeatedly but never closes the returned iterator. walBlock.Iterator() opens parquet files (via rowIterator()), so not closing can leak file descriptors and make this test flaky (or stop exercising races once it hits OS limits). Make the reader close the iterator each time (even if you ignore the error).

Suggested change
"Iterator": func() { _, _ = w.Iterator() },
"DataLength": func() { _ = w.DataLength() },
"Iterator": func() {
it, err := w.Iterator()
if err != nil {
return
}
_ = it.Close()
},
"DataLength": func() { _ = w.DataLength() },

Comment on lines +348 to +349
"Fetch": func() { _, _ = w.Fetch(ctx, traceql.FetchSpansRequest{}, opts) },
"FetchSpans": func() { _, _ = w.FetchSpans(ctx, traceql.FetchSpansRequest{}, opts) },
Copilot AI Mar 25, 2026

The Fetch / FetchSpans reader functions drop the response without closing resp.Results. Those iterators keep page files open until Close() is called, so this can leak file descriptors across the loop and make the race test flaky. Close Results (preferably in a defer) when the call succeeds.

Suggested change
"Fetch": func() { _, _ = w.Fetch(ctx, traceql.FetchSpansRequest{}, opts) },
"FetchSpans": func() { _, _ = w.FetchSpans(ctx, traceql.FetchSpansRequest{}, opts) },
"Fetch": func() {
resp, err := w.Fetch(ctx, traceql.FetchSpansRequest{}, opts)
if err != nil {
return
}
defer resp.Results.Close()
},
"FetchSpans": func() {
resp, err := w.FetchSpans(ctx, traceql.FetchSpansRequest{}, opts)
if err != nil {
return
}
defer resp.Results.Close()
},

@ruslan-mikhailov ruslan-mikhailov merged commit 8d030c8 into grafana:main Mar 26, 2026
31 checks passed
@ruslan-mikhailov ruslan-mikhailov deleted the bugfix/vparquetx-race-conditions branch March 26, 2026 09:09

3 participants