enhancement: supported deduping spans within block builder#6539
enhancement: supported deduping spans within block builder#6539zhxiaogg merged 6 commits intografana:mainfrom
Conversation
3ff9b6f to
49bbfd4
Compare
| for _, ss := range rs.ScopeSpans { | ||
| unique := ss.Spans[:0] | ||
| for _, s := range ss.Spans { | ||
| token := util.SpanIDAndKindToToken(s.SpanId, int(s.Kind)) |
There was a problem hiding this comment.
I wonder if we should use this hashing algorithm instead:
tempo/pkg/model/trace/combine.go
Line 27 in 2ad734b
It seems more correct, with less chances of collisions
There was a problem hiding this comment.
thanks, changed to use the alternative one you suggested!
stoewer
left a comment
There was a problem hiding this comment.
To avoid heap allocations and traversing each trace twice, I'd suggest to remove dedupeTrace() and implement it's functionality inside the loop that updates timestamp bounds. Roughly like this:
seen := make(map[uint64]struct{}, 1024) // initialize seen before the outer `for entries := range seq` loop
...
for _, rs := range tr.ResourceSpans {
for _, ss := range rs.ScopeSpans {
unique := ss.Spans[:0]
for _, s := range ss.Spans {
// dedup and update timestamps
}
ss.Spans = unique
}
}
clear(seen)While this is less readable than having a separate dedup function, it might still be worth it for performance reasons. What do you think?
| var deduped uint32 | ||
| for _, rs := range tr.ResourceSpans { | ||
| for _, ss := range rs.ScopeSpans { | ||
| unique := ss.Spans[:0] |
| // dedupeTrace removes duplicate spans in-place from tr, deduplicating by span ID and kind. | ||
| // Returns the number of removed duplicate spans. | ||
| func dedupeTrace(tr *tempopb.Trace) uint32 { | ||
| seen := make(map[uint64]struct{}) |
There was a problem hiding this comment.
There is an opportunity here to save lots of smaller heap allocations:
- Initialize
seenwith a reasonable start size in to avoid allocations and rehashing when the map grows - Reuse the
seenin multiple deduplications (and useclear(seen)before reuse)
There was a problem hiding this comment.
thanks, all changed including the following comments.
| } | ||
|
|
||
| // Deduplicate spans within the trace | ||
| i.dedupedSpans += dedupeTrace(tr) |
There was a problem hiding this comment.
Each trace is traversed twice: one time in dedupeTrace() and another time in L118 to update timestamp bounds. Maybe those can be combined
| } | ||
|
|
||
| // DedupedSpans returns the total number of duplicate spans that were removed | ||
| // across all traces. The iterator must be exhausted before this can be accessed. |
There was a problem hiding this comment.
The iterator must be exhausted before this can be accessed
Just for the sake of being a bit more defensive, what do you think about enforcing this by returning an error when liveTracesIter.DedupedSpans() is called before it's exhausted?
3e9676d to
63f569f
Compare
What this PR does:
tempo_block_builder_spans_deduped_totalmetricWhich issue(s) this PR fixes:
Fixes #6516
Test
Checklist
CHANGELOG.mdupdated - the order of entries should be[CHANGE],[FEATURE],[ENHANCEMENT],[BUGFIX]