Skip to content

Replace CombineTraceProtos with new Combiner#1291

Merged
mdisibio merged 7 commits intografana:mainfrom
mdisibio:combiner
Feb 24, 2022
Merged

Replace CombineTraceProtos with new Combiner#1291
mdisibio merged 7 commits intografana:mainfrom
mdisibio:combiner

Conversation

@mdisibio
Copy link
Copy Markdown
Contributor

What this PR does:
This PR introduces a new Combiner which is similar to CombineTraceProtos but more efficient when combining more than two inputs. The previous pairwise usage of CombineTraceProtos had a couple inefficiencies when combining more than two inputs: (a) the intermediate result was sorted every time (b) the hash of span tokens was rebuilt every time. Combiner is stateful and improves this, which leads to significant reduction in cpu and memory. Performance is identical when combining just 2 inputs.

Additionally, this PR changes the span/token hashing to 64-bit to reduce the collision rate. Experimentally the collision rate of fnv32 approached 1 in 10,000 spans, which is significant because any collision results in a dropped span. 64-bit has no collisions up to the tested limit of a trace with 1M spans. Performance is still good.

Feedback
In order to maintain identical performance against 2 segments, Combiner must not save the span tokens for the second input, like how CombineTraceProtos did not. This can be generalized in that we never need to save the span tokens for the last input. These savings are significant enough to where it's worth accounting for, and there are many cases where we do know the length. I would like feedback on the chosen ergonomics/naming/style and see if there is a better pattern. For example:

// Unknown number of inputs
c := NewCombiner()
c.Consume(tr)
c.Consume(tr)
    ...
c.Result()

// Known number of inputs
c := NewCombiner()
for i, tr := range traces {
    c.ConsumeWithFinal(tr, i == len(traces)-1)
}
c.Result()

Benchmarks
This benchmark combines trace 2 to 8 segments of 100K spans each. Improvements are greater as more segments are combined.

BenchmarkCombine/2/CombineTraceProtos-12         	      27	  44842636 ns/op	 6316197 B/op	  115510 allocs/op
BenchmarkCombine/2/Combiner-12                   	      25	  44691228 ns/op	 6319784 B/op	  115615 allocs/op

BenchmarkCombine/3/CombineTraceProtos-12         	       9	 125415228 ns/op	17512674 B/op	  291023 allocs/op
BenchmarkCombine/3/Combiner-12                   	      15	  72986714 ns/op	11202546 B/op	  175560 allocs/op

BenchmarkCombine/4/CombineTraceProtos-12         	       5	 241490759 ns/op	35993108 B/op	  524033 allocs/op
BenchmarkCombine/4/Combiner-12                   	      10	 108902778 ns/op	18498868 B/op	  233082 allocs/op

BenchmarkCombine/8/CombineTraceProtos-12         	       1	1119084275 ns/op	164185168 B/op	 2061117 allocs/op
BenchmarkCombine/8/Combiner-12                   	       5	 242659052 ns/op	 37625163 B/op	  469882 allocs/op

Which issue(s) this PR fixes:
Should help #976

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Comment thread modules/frontend/tracebyidsharding.go
@mdisibio
Copy link
Copy Markdown
Contributor Author

@tanner-bruce Would appreciate your feedback too as you've also done of a lot of investigation in this area.

Copy link
Copy Markdown
Collaborator

@joe-elliott joe-elliott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drooling over some of these benchmarks. We should definitely see some improvements in resource usage during both compaction and querying. A couple of baby nits and one real question.

Nice!

Comment thread pkg/model/trace/combine_test.go Outdated
Comment thread pkg/model/trace/combine.go Outdated
Comment thread modules/frontend/tracebyidsharding.go
Comment thread pkg/model/trace/combine.go Outdated
@mdisibio
Copy link
Copy Markdown
Contributor Author

Pushed an optimization to alloc the span map using the first input size, which saves a few more MB per call. Thanks @tanner-bruce !

BenchmarkCombine/2/Combiner-12         	      27	  41954908 ns/op	 4650348 B/op	  113443 allocs/op
BenchmarkCombine/3/Combiner-12         	      15	  72924713 ns/op	 9526314 B/op	  173208 allocs/op
BenchmarkCombine/4/Combiner-12         	      10	 105436694 ns/op	16823280 B/op	  230750 allocs/op
BenchmarkCombine/8/Combiner-12         	       5	 237391203 ns/op	35946148 B/op	  467689 allocs/op

@mdisibio mdisibio requested a review from joe-elliott February 24, 2022 12:45
Copy link
Copy Markdown
Collaborator

@joe-elliott joe-elliott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Thanks for the help on this one @tanner-bruce!

@mdisibio mdisibio enabled auto-merge (squash) February 24, 2022 17:55
@mdisibio mdisibio merged commit 7d26ee2 into grafana:main Feb 24, 2022
@mdisibio mdisibio deleted the combiner branch April 25, 2023 18:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants