[Bugfix] vParquetX: race conditions#6773
Conversation
There was a problem hiding this comment.
Pull request overview
This PR addresses data races in the vParquet WAL block implementations (vparquet3/4/5), aiming to make concurrent reads (search/fetch/iterate) safe while writes/flushes are happening.
Changes:
- Add mutex protection around
flushedSize/unflushedSizeupdates andDataLength()reads. - Avoid iterating
b.flusheddirectly by routingIterator()/FindTraceByID()throughreadFlushes(). - Add a new concurrent “race condition check” test for WAL blocks in vparquet3/4/5.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| tempodb/encoding/vparquet5/wal_block.go | Synchronizes size accounting and iteration over flushed WAL pages. |
| tempodb/encoding/vparquet4/wal_block.go | Same synchronization changes for vparquet4. |
| tempodb/encoding/vparquet3/wal_block.go | Same synchronization changes for vparquet3. |
| tempodb/encoding/vparquet5/wal_block_test.go | Adds a concurrent stress test intended to reproduce races under -race. |
| tempodb/encoding/vparquet4/wal_block_test.go | Adds the same concurrent race-check test for vparquet4. |
| tempodb/encoding/vparquet3/wal_block_test.go | Adds the same concurrent race-check test for vparquet3. |
87de5ed to
cd681bb
Compare
cd681bb to
5b78063
Compare
5b78063 to
de6cafa
Compare
| @@ -355,7 +355,9 @@ func (b *walBlock) AppendTrace(id common.ID, trace *tempopb.Trace, start, end ui | |||
| b.meta.ObjectAdded(start, end) | |||
There was a problem hiding this comment.
I need to add to changelog
| b.meta.ObjectAdded(start, end) | ||
| b.ids.Set(id, int64(b.ids.Len())) // Next row number | ||
|
|
||
| b.mtx.Lock() |
There was a problem hiding this comment.
nit: Since this is at the end of the method, we could defer the unlock and stylistically I think that reads better.
de6cafa to
1220425
Compare
1220425 to
b2456d6
Compare
|
+ rebase to resolve conflicts in changelog |
| "Iterator": func() { _, _ = w.Iterator() }, | ||
| "DataLength": func() { _ = w.DataLength() }, |
There was a problem hiding this comment.
The race-check test calls w.Iterator() repeatedly but never closes the returned iterator. walBlock.Iterator() opens parquet files (via rowIterator()), so not closing can leak file descriptors and make this test flaky (or stop exercising races once it hits OS limits). Make the reader close the iterator each time (even if you ignore the error).
| "Iterator": func() { _, _ = w.Iterator() }, | |
| "DataLength": func() { _ = w.DataLength() }, | |
| "Iterator": func() { | |
| it, _ := w.Iterator() | |
| if it != nil { | |
| _ = it.Close() | |
| } | |
| }, | |
| "DataLength": func() { _ = w.DataLength() }, |
| "Fetch": func() { _, _ = w.Fetch(ctx, traceql.FetchSpansRequest{}, opts) }, | ||
| "FetchSpans": func() { _, _ = w.FetchSpans(ctx, traceql.FetchSpansRequest{}, opts) }, |
There was a problem hiding this comment.
The Fetch / FetchSpans reader functions drop the response without closing resp.Results. Those iterators keep page files open until Close() is called, so this can leak file descriptors across the loop and make the race test flaky. Close Results (preferably in a defer) when the call succeeds.
| "Fetch": func() { _, _ = w.Fetch(ctx, traceql.FetchSpansRequest{}, opts) }, | |
| "FetchSpans": func() { _, _ = w.FetchSpans(ctx, traceql.FetchSpansRequest{}, opts) }, | |
| "Fetch": func() { | |
| resp, err := w.Fetch(ctx, traceql.FetchSpansRequest{}, opts) | |
| if err != nil || resp == nil || resp.Results == nil { | |
| return | |
| } | |
| _ = resp.Results.Close() | |
| }, | |
| "FetchSpans": func() { | |
| resp, err := w.FetchSpans(ctx, traceql.FetchSpansRequest{}, opts) | |
| if err != nil || resp == nil || resp.Results == nil { | |
| return | |
| } | |
| _ = resp.Results.Close() | |
| }, |
| readers := map[string]func(){ | ||
| "FindTraceByID": func() { _, _ = w.FindTraceByID(ctx, id, opts) }, | ||
| "Search": func() { _, _ = w.Search(ctx, &tempopb.SearchRequest{}, opts) }, | ||
| "Iterator": func() { _, _ = w.Iterator() }, |
There was a problem hiding this comment.
The race-check test calls w.Iterator() repeatedly but never closes the returned iterator. walBlock.Iterator() opens parquet files (via rowIterator()), so not closing can leak file descriptors and make this test flaky (or stop exercising races once it hits OS limits). Make the reader close the iterator each time (even if you ignore the error).
| "Iterator": func() { _, _ = w.Iterator() }, | |
| "Iterator": func() { it, _ := w.Iterator(); if it != nil { _ = it.Close() } }, |
| "Fetch": func() { _, _ = w.Fetch(ctx, traceql.FetchSpansRequest{}, opts) }, | ||
| "FetchSpans": func() { _, _ = w.FetchSpans(ctx, traceql.FetchSpansRequest{}, opts) }, |
There was a problem hiding this comment.
The Fetch / FetchSpans reader functions drop the response without closing resp.Results. Those iterators keep page files open until Close() is called, so this can leak file descriptors across the loop and make the race test flaky. Close Results (preferably in a defer) when the call succeeds.
| "Fetch": func() { _, _ = w.Fetch(ctx, traceql.FetchSpansRequest{}, opts) }, | |
| "FetchSpans": func() { _, _ = w.FetchSpans(ctx, traceql.FetchSpansRequest{}, opts) }, | |
| "Fetch": func() { | |
| resp, _ := w.Fetch(ctx, traceql.FetchSpansRequest{}, opts) | |
| if resp != nil && resp.Results != nil { | |
| defer resp.Results.Close() | |
| } | |
| }, | |
| "FetchSpans": func() { | |
| resp, _ := w.FetchSpans(ctx, traceql.FetchSpansRequest{}, opts) | |
| if resp != nil && resp.Results != nil { | |
| defer resp.Results.Close() | |
| } | |
| }, |
| "Iterator": func() { _, _ = w.Iterator() }, | ||
| "DataLength": func() { _ = w.DataLength() }, |
There was a problem hiding this comment.
The race-check test calls w.Iterator() repeatedly but never closes the returned iterator. walBlock.Iterator() opens parquet files (via rowIterator()), so not closing can leak file descriptors and make this test flaky (or stop exercising races once it hits OS limits). Make the reader close the iterator each time (even if you ignore the error).
| "Iterator": func() { _, _ = w.Iterator() }, | |
| "DataLength": func() { _ = w.DataLength() }, | |
| "Iterator": func() { | |
| it, err := w.Iterator() | |
| if err != nil { | |
| return | |
| } | |
| _ = it.Close() | |
| }, | |
| "DataLength": func() { _ = w.DataLength() }, |
| "Fetch": func() { _, _ = w.Fetch(ctx, traceql.FetchSpansRequest{}, opts) }, | ||
| "FetchSpans": func() { _, _ = w.FetchSpans(ctx, traceql.FetchSpansRequest{}, opts) }, |
There was a problem hiding this comment.
The Fetch / FetchSpans reader functions drop the response without closing resp.Results. Those iterators keep page files open until Close() is called, so this can leak file descriptors across the loop and make the race test flaky. Close Results (preferably in a defer) when the call succeeds.
| "Fetch": func() { _, _ = w.Fetch(ctx, traceql.FetchSpansRequest{}, opts) }, | |
| "FetchSpans": func() { _, _ = w.FetchSpans(ctx, traceql.FetchSpansRequest{}, opts) }, | |
| "Fetch": func() { | |
| resp, err := w.Fetch(ctx, traceql.FetchSpansRequest{}, opts) | |
| if err != nil { | |
| return | |
| } | |
| defer resp.Results.Close() | |
| }, | |
| "FetchSpans": func() { | |
| resp, err := w.FetchSpans(ctx, traceql.FetchSpansRequest{}, opts) | |
| if err != nil { | |
| return | |
| } | |
| defer resp.Results.Close() | |
| }, |
What this PR does: fixes race conditions found by
TestWalBlockRaceConditionCheckran with-race -count=10Which issue(s) this PR fixes:
Fixes #
Checklist
CHANGELOG.mdupdated - the order of entries should be[CHANGE],[FEATURE],[ENHANCEMENT],[BUGFIX]