
enhancement: supported deduping spans within block builder #6539

Merged
zhxiaogg merged 6 commits into grafana:main from zhxiaogg:dedupe-spans-within-block-builder
Feb 26, 2026


@zhxiaogg (Contributor) commented Feb 23, 2026

What this PR does:

  • Block builder: deduplicate spans within each trace during block creation, and track the number of removed duplicates via the tempo_block_builder_spans_deduped_total metric

Which issue(s) this PR fixes:
Fixes #6516

Test

  • Can verify that the metric is generated and shows up in the Prometheus UI when testing with the single-binary example.

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

for _, ss := range rs.ScopeSpans {
    unique := ss.Spans[:0]
    for _, s := range ss.Spans {
        token := util.SpanIDAndKindToToken(s.SpanId, int(s.Kind))
Contributor

I wonder if we should use this hashing algorithm instead:

func tokenForID(h hash.Hash64, buffer []byte, kind int32, b []byte) token {

It seems more correct, with a lower chance of collisions.
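The suggestion is to derive the dedup key by feeding the span kind and the full span ID through a reusable hash.Hash64. A minimal sketch of that idea, assuming FNV-1a from Go's standard hash/fnv package; the tokenizer type, its token method, and the byte layout (kind as 4 little-endian bytes, then the span ID) are hypothetical and are not Tempo's actual tokenForID.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash"
	"hash/fnv"
)

// tokenizer (hypothetical) bundles a reusable FNV-1a hasher with a small
// scratch buffer so that hashing many spans does not allocate per span.
type tokenizer struct {
	h       hash.Hash64
	scratch [4]byte
}

func newTokenizer() *tokenizer {
	return &tokenizer{h: fnv.New64a()}
}

// token hashes the span kind (4 little-endian bytes, an assumed layout)
// followed by the full span ID into a single 64-bit dedup key.
func (t *tokenizer) token(kind int32, spanID []byte) uint64 {
	t.h.Reset()
	binary.LittleEndian.PutUint32(t.scratch[:], uint32(kind))
	t.h.Write(t.scratch[:]) // hash.Hash.Write never returns an error
	t.h.Write(spanID)
	return t.h.Sum64()
}

func main() {
	tk := newTokenizer()
	id := []byte{1, 2, 3, 4, 5, 6, 7, 8}
	const server, client = 2, 3 // OTLP SPAN_KIND_SERVER and SPAN_KIND_CLIENT
	fmt.Println(tk.token(server, id) == tk.token(server, id)) // prints: true
	fmt.Println(tk.token(server, id) == tk.token(client, id)) // prints: false
}
```

Because the kind is part of the hashed input, a client span and a server span that share an ID still get different keys, which matches deduplicating by span ID and kind.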

Contributor Author

thanks, changed to use the alternative one you suggested!

Contributor

@stoewer left a comment

To avoid heap allocations and traversing each trace twice, I'd suggest removing dedupeTrace() and implementing its functionality inside the loop that updates the timestamp bounds. Roughly like this:

seen := make(map[uint64]struct{}, 1024) // initialize seen before the outer `for entries := range seq` loop

...
for _, rs := range tr.ResourceSpans {
    for _, ss := range rs.ScopeSpans {
        unique := ss.Spans[:0]
        for _, s := range ss.Spans {
            // dedup and update timestamps 
        }
        ss.Spans = unique
    }
}
clear(seen)

While this is less readable than having a separate dedup function, it might still be worth it for performance reasons. What do you think?
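A minimal sketch of the suggested single pass, assuming simplified stand-in types (Trace, ResourceSpans, ScopeSpans, Span) in place of the tempopb types and a hand-rolled FNV-1a token in place of Tempo's tokenForID. The function name dedupeAndBounds is hypothetical. It compacts each Spans slice in place and updates the start/end bounds for the kept spans in the same loop, while the caller owns one presized seen map (clear requires Go 1.21+).

```go
package main

import (
	"fmt"
	"math"
)

// Simplified stand-ins for the tempopb trace types (assumed shapes).
type Span struct {
	SpanID            []byte
	Kind              int32
	StartTimeUnixNano uint64
	EndTimeUnixNano   uint64
}

type ScopeSpans struct{ Spans []*Span }
type ResourceSpans struct{ ScopeSpans []*ScopeSpans }
type Trace struct{ ResourceSpans []*ResourceSpans }

// token is a stand-in dedup key (not Tempo's tokenForID): FNV-1a 64 over
// the 4 kind bytes (little-endian) followed by the span ID bytes.
func token(kind int32, spanID []byte) uint64 {
	const offset, prime = 14695981039346656037, 1099511628211
	h := uint64(offset)
	for i := 0; i < 4; i++ {
		h ^= uint64(byte(uint32(kind) >> (8 * i)))
		h *= prime
	}
	for _, b := range spanID {
		h ^= uint64(b)
		h *= prime
	}
	return h
}

// dedupeAndBounds makes one pass over tr: it drops spans whose token was
// already seen (compacting each Spans slice in place) and tracks the min
// start and max end time of the spans it keeps. The caller owns seen, so
// one presized map can be reused for every trace.
func dedupeAndBounds(tr *Trace, seen map[uint64]struct{}) (removed uint32, start, end uint64) {
	clear(seen) // empty the map without allocating a new one
	start = math.MaxUint64
	for _, rs := range tr.ResourceSpans {
		for _, ss := range rs.ScopeSpans {
			unique := ss.Spans[:0]
			for _, s := range ss.Spans {
				tok := token(s.Kind, s.SpanID)
				if _, dup := seen[tok]; dup {
					removed++
					continue
				}
				seen[tok] = struct{}{}
				unique = append(unique, s)
				if s.StartTimeUnixNano < start {
					start = s.StartTimeUnixNano
				}
				if s.EndTimeUnixNano > end {
					end = s.EndTimeUnixNano
				}
			}
			ss.Spans = unique
		}
	}
	return removed, start, end
}

func main() {
	ss := &ScopeSpans{Spans: []*Span{
		{SpanID: []byte{1}, Kind: 2, StartTimeUnixNano: 10, EndTimeUnixNano: 20},
		{SpanID: []byte{1}, Kind: 2, StartTimeUnixNano: 10, EndTimeUnixNano: 20}, // duplicate
		{SpanID: []byte{1}, Kind: 3, StartTimeUnixNano: 5, EndTimeUnixNano: 15},  // same ID, other kind
	}}
	tr := &Trace{ResourceSpans: []*ResourceSpans{{ScopeSpans: []*ScopeSpans{ss}}}}
	seen := make(map[uint64]struct{}, 1024) // allocated once, reused per trace
	removed, start, end := dedupeAndBounds(tr, seen)
	fmt.Println(removed, len(ss.Spans), start, end) // prints: 1 2 5 20
}
```

This sketch clears seen at the start of each call rather than after the loop as in the suggestion; both orders work, and clearing first guards against a caller passing in a map that still holds keys. If a trace has no spans, start stays at math.MaxUint64, so a real implementation would need to decide how to handle that case.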

var deduped uint32
for _, rs := range tr.ResourceSpans {
    for _, ss := range rs.ScopeSpans {
        unique := ss.Spans[:0]
Contributor

Nice in-place dedup 👍
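The in-place pattern praised here (filtering into ss.Spans[:0] so the kept spans overwrite the same backing array) can be isolated into a small generic helper. This is a sketch, not the PR's dedupeTrace: dedupeInPlace and its key callback are hypothetical, and the final clear of the abandoned tail is an extra step that the diff excerpt does not show.

```go
package main

import "fmt"

// dedupeInPlace keeps the first element seen for each key and drops later
// duplicates. It filters into items[:0], so the result shares the input's
// backing array and no new slice is allocated. It returns the compacted
// slice and the number of removed elements; seen should be empty on entry.
func dedupeInPlace[T any](items []T, key func(T) uint64, seen map[uint64]struct{}) ([]T, uint32) {
	unique := items[:0]
	for _, it := range items {
		k := key(it)
		if _, dup := seen[k]; dup {
			continue
		}
		seen[k] = struct{}{}
		unique = append(unique, it)
	}
	removed := uint32(len(items) - len(unique))
	// Zero the abandoned tail so dropped elements (for example *Span
	// pointers) are not kept reachable through the shared backing array.
	clear(items[len(unique):])
	return unique, removed
}

func main() {
	ids := []uint64{7, 3, 7, 9, 3}
	out, removed := dedupeInPlace(ids, func(v uint64) uint64 { return v }, map[uint64]struct{}{})
	fmt.Println(out, removed, ids) // prints: [7 3 9] 2 [7 3 9 0 0]
}
```

The filter is safe while ranging over the same slice because the write index never passes the read index, so no unread element is overwritten.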

// dedupeTrace removes duplicate spans in-place from tr, deduplicating by span ID and kind.
// Returns the number of removed duplicate spans.
func dedupeTrace(tr *tempopb.Trace) uint32 {
    seen := make(map[uint64]struct{})
Contributor

There is an opportunity here to save lots of smaller heap allocations:

  • Initialize seen with a reasonable starting size to avoid allocations and rehashing as the map grows
  • Reuse seen across multiple deduplications (calling clear(seen) before each reuse)
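One way to check this allocation claim is Go's testing.AllocsPerRun, which reports the average number of heap allocations per call of a function. The sketch below compares a fresh, unsized map per trace (as in the original dedupeTrace) against one map presized to 1024 entries and emptied with clear (Go 1.21+) before each reuse. The helper names and the 200-span trace size are illustrative assumptions, and the exact counts depend on the Go version and map implementation.

```go
package main

import (
	"fmt"
	"testing"
)

// fill inserts n distinct keys into m, standing in for one trace's spans.
func fill(m map[uint64]struct{}, n int) {
	for i := 0; i < n; i++ {
		m[uint64(i)] = struct{}{}
	}
}

// allocsFresh measures average allocations when every "trace" gets a new,
// unsized map that has to grow as keys are added.
func allocsFresh(spansPerTrace int) float64 {
	return testing.AllocsPerRun(100, func() {
		seen := make(map[uint64]struct{})
		fill(seen, spansPerTrace)
	})
}

// allocsReused measures the suggested pattern: one presized map, created
// outside the measured function and cleared before each reuse.
func allocsReused(spansPerTrace int) float64 {
	seen := make(map[uint64]struct{}, 1024)
	return testing.AllocsPerRun(100, func() {
		clear(seen)
		fill(seen, spansPerTrace)
	})
}

func main() {
	fmt.Printf("fresh map per trace: %.1f allocs; reused presized map: %.1f allocs\n",
		allocsFresh(200), allocsReused(200))
}
```

The reused map only pays for its buckets once, outside the measured calls, which is the saving the two suggestions above aim for.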

Contributor Author

thanks, all changed, including the points in the following comments.

}

// Deduplicate spans within the trace
i.dedupedSpans += dedupeTrace(tr)
Contributor

Each trace is traversed twice: once in dedupeTrace() and again at L118 to update the timestamp bounds. Maybe those two passes can be combined.

}

// DedupedSpans returns the total number of duplicate spans that were removed
// across all traces. The iterator must be exhausted before this can be accessed.
Contributor

The iterator must be exhausted before this can be accessed

Just for the sake of being a bit more defensive, what do you think about enforcing this by returning an error when liveTracesIter.DedupedSpans() is called before it's exhausted?
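A minimal sketch of the defensive accessor proposed here, assuming a simplified pull-style iterator. traceIter, Next, errNotExhausted, and representing each trace as a slice of span tokens are all hypothetical stand-ins for liveTracesIter and its trace type. DedupedSpans returns an error until Next has reported that every trace was consumed.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotExhausted is returned when the deduped count is requested before
// the iterator has produced every trace, i.e. before the count is final.
var errNotExhausted = errors.New("iterator not exhausted: deduped span count is incomplete")

// traceIter is a hypothetical stand-in for liveTracesIter. Each trace is
// represented only by its span tokens to keep the sketch small.
type traceIter struct {
	traces  [][]uint64
	pos     int
	deduped uint32
	done    bool
	seen    map[uint64]struct{}
}

func newTraceIter(traces [][]uint64) *traceIter {
	return &traceIter{traces: traces, seen: make(map[uint64]struct{}, 1024)}
}

// Next returns the next trace with duplicate tokens removed (in place), or
// false once all traces have been consumed, which marks the iterator done.
func (it *traceIter) Next() ([]uint64, bool) {
	if it.pos >= len(it.traces) {
		it.done = true
		return nil, false
	}
	tr := it.traces[it.pos]
	it.pos++
	clear(it.seen) // dedup is per trace, so forget the previous trace's tokens
	unique := tr[:0]
	for _, tok := range tr {
		if _, dup := it.seen[tok]; dup {
			it.deduped++
			continue
		}
		it.seen[tok] = struct{}{}
		unique = append(unique, tok)
	}
	return unique, true
}

// DedupedSpans returns the total number of removed duplicates, or
// errNotExhausted if Next has not yet reported the end of iteration.
func (it *traceIter) DedupedSpans() (uint32, error) {
	if !it.done {
		return 0, errNotExhausted
	}
	return it.deduped, nil
}

func main() {
	it := newTraceIter([][]uint64{{1, 2, 1}, {5, 5, 5}})
	if _, err := it.DedupedSpans(); err != nil {
		fmt.Println("early:", err) // prints: early: iterator not exhausted: deduped span count is incomplete
	}
	for tr, ok := it.Next(); ok; tr, ok = it.Next() {
		fmt.Println(tr) // prints [1 2], then [5]
	}
	n, err := it.DedupedSpans()
	fmt.Println(n, err) // prints: 3 <nil>
}
```

Returning an error makes misuse visible at the call site; documenting the precondition instead, as the current comment does, keeps the signature simpler at the cost of silently returning a partial count.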

@zhxiaogg force-pushed the dedupe-spans-within-block-builder branch from 3e9676d to 63f569f on February 25, 2026 15:49
Contributor

@javiermolinar left a comment

Looks good to me

@zhxiaogg merged commit 91948b8 into grafana:main Feb 26, 2026
40 of 41 checks passed
@zhxiaogg deleted the dedupe-spans-within-block-builder branch February 26, 2026 15:49
zalegrala pushed a commit to zalegrala/tempo that referenced this pull request Feb 27, 2026

Successfully merging this pull request may close these issues:

  • Need to deduplicate spans in the block builder

3 participants