Replace CombineTraceProtos with new Combiner #1291
Merged
mdisibio merged 7 commits into grafana:main on Feb 24, 2022
mdisibio (Contributor, Author) commented Feb 17, 2022
@tanner-bruce I would appreciate your feedback too, as you've also done a lot of investigation in this area.
joe-elliott reviewed Feb 17, 2022
mdisibio (Contributor, Author) commented
Pushed an optimization to allocate the span map using the size of the first input, which saves a few more MB per call. Thanks @tanner-bruce!
joe-elliott (Collaborator) approved these changes Feb 24, 2022 and left a comment
Looks great. Thanks for the help on this one @tanner-bruce!
What this PR does:
This PR introduces a new Combiner, which is similar to CombineTraceProtos but more efficient when combining more than two inputs. The previous pairwise usage of CombineTraceProtos had two inefficiencies in that case: (a) the intermediate result was sorted every time, and (b) the hash of span tokens was rebuilt every time. Combiner is stateful and avoids both, which leads to a significant reduction in CPU and memory. Performance is identical when combining just 2 inputs.
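To illustrate the difference from pairwise combining, here is a minimal sketch of a stateful combiner. It is not the code in this PR: the `Span` struct, the `combiner` type, and the `token` helper are simplified, hypothetical stand-ins for the real protobuf types. The point is only that the token set persists across calls and the sort happens once at the end.

```go
package main

import (
	"hash/fnv"
	"sort"
)

// Span is a simplified, hypothetical stand-in for a trace span
// (not the protobuf type used by the real code).
type Span struct {
	ID    string
	Start uint64
}

// combiner is a hypothetical stateful combiner: the set of seen span
// tokens survives across Consume calls instead of being rebuilt per pair.
type combiner struct {
	seen   map[uint64]struct{}
	result []Span
}

func newCombiner() *combiner {
	return &combiner{seen: map[uint64]struct{}{}}
}

// token hashes a span ID to 64 bits with FNV-1a (an assumption made for
// this sketch; the real token may cover more fields than the ID).
func token(id string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(id))
	return h.Sum64()
}

// Consume appends every span whose token has not been seen yet.
func (c *combiner) Consume(spans []Span) {
	for _, s := range spans {
		t := token(s.ID)
		if _, ok := c.seen[t]; ok {
			continue // duplicate span, already in the result
		}
		c.seen[t] = struct{}{}
		c.result = append(c.result, s)
	}
}

// Result sorts by start time once, after all inputs have been consumed.
func (c *combiner) Result() []Span {
	sort.Slice(c.result, func(i, j int) bool {
		return c.result[i].Start < c.result[j].Start
	})
	return c.result
}
```

With N inputs, this sketch hashes each span once and sorts once, whereas a pairwise approach would re-hash and re-sort the growing intermediate result N-1 times.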
Additionally, this PR changes the span/token hashing from 32-bit to 64-bit to reduce the collision rate. Experimentally, the collision rate of fnv32 approached 1 in 10,000 spans, which is significant because any collision results in a dropped span. The 64-bit hash had no collisions up to the tested limit of a trace with 1M spans. Performance is still good.
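As a rough sanity check on those numbers, the birthday bound gives the expected number of colliding pairs among n uniformly distributed tokens as n(n-1)/2 divided by 2^bits. The sketch below assumes an idealized uniform hash, which FNV only approximates, and the helper names are hypothetical. For 1M spans it gives roughly 116 colliding pairs at 32 bits, the same order of magnitude as the observed 1 in 10,000, and a negligible value at 64 bits.

```go
package main

import (
	"hash/fnv"
	"math"
)

// expectedCollisions returns the birthday-bound estimate of colliding
// pairs among n uniformly random tokens of the given bit width:
// n(n-1)/2 / 2^bits. This is an idealized model, not a measurement.
func expectedCollisions(n float64, bits int) float64 {
	return n * (n - 1) / 2 / math.Pow(2, float64(bits))
}

// token32 hashes an ID with 32-bit FNV-1 from the standard library.
func token32(id []byte) uint32 {
	h := fnv.New32()
	h.Write(id)
	return h.Sum32()
}

// token64 hashes an ID with 64-bit FNV-1 from the standard library.
func token64(id []byte) uint64 {
	h := fnv.New64()
	h.Write(id)
	return h.Sum64()
}
```

Calling `expectedCollisions(1e6, 32)` and `expectedCollisions(1e6, 64)` side by side shows why widening the token is cheap insurance: the estimate shrinks by a factor of 2^32.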
Feedback
In order to maintain identical performance against 2 segments, Combiner must not save the span tokens for the second input, just as CombineTraceProtos did not. This generalizes: we never need to save the span tokens for the last input. These savings are significant enough to be worth accounting for, and there are many cases where we do know the number of inputs in advance. I would like feedback on the chosen ergonomics/naming/style, and on whether there is a better pattern.
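One possible shape for the "skip the last input" idea is a flag on the consume call. This is a hedged sketch only: the `Span` struct, the `combiner` type, the `token` helper, and the `Consume(spans, final)` signature are all hypothetical, and the ergonomics actually proposed in this PR may differ. The sketch also sizes the token map from the first input.

```go
package main

import "hash/fnv"

// Span is a simplified, hypothetical stand-in for a trace span.
type Span struct{ ID string }

// combiner is a hypothetical stateful combiner used only for this sketch.
type combiner struct {
	seen   map[uint64]struct{}
	result []Span
}

// token hashes a span ID to 64 bits with FNV-1a (an assumption).
func token(id string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(id))
	return h.Sum64()
}

// Consume merges spans into the result. When final is true, tokens of
// newly added spans are not recorded, because no later input will be
// checked against them. This assumes a single input never contains the
// same span twice; otherwise the final input would keep such duplicates.
func (c *combiner) Consume(spans []Span, final bool) {
	if c.seen == nil {
		// First input: size the token map from it to avoid regrowth.
		c.seen = make(map[uint64]struct{}, len(spans))
	}
	for _, s := range spans {
		t := token(s.ID)
		if _, ok := c.seen[t]; ok {
			continue // already present from an earlier input
		}
		if !final {
			c.seen[t] = struct{}{}
		}
		c.result = append(c.result, s)
	}
}
```

With exactly two inputs, only the first input's tokens are ever stored, which matches the behavior described above for CombineTraceProtos.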
Benchmarks
This benchmark combines 2 to 8 trace segments of 100K spans each. Improvements are greater as more segments are combined.
Which issue(s) this PR fixes:
Should help #976
Checklist
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]