Shrank EncodedLevel to speed up step_in/step_out. #113
Description of changes:
The `EncodedLevel` struct is just large enough that variations of it (like `Option<EncodedLevel>` and `Result<EncodedLevel, _>`) can cross LLVM's threshold for using `memcpy` to move values around. This shows up in profiling as substantial overhead in the `step_in` and `step_out` functions, which move instances of `EncodedLevel` to and from a `Vec` of container levels. In a particularly deeply nested test file (a single struct with ~800 levels of nested child structs), this caused the reader to be painfully slow. A 15MB file took ~230ms to read through with `next()`/`step_in()`/`step_out()`, loading each scalar value encountered.
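To make the size issue concrete, here is a minimal, hedged sketch; `FakeLevel` is a stand-in invented for illustration, not the real `EncodedLevel`. A ~120-byte, 8-byte-aligned payload with no niche for the discriminant typically rounds `Option<T>` up to 128 bytes, which matches the note in the memory layout section below.

```rust
use std::mem::size_of;

// Illustrative stand-in for a ~120-byte, 8-byte-aligned struct with no niche
// the compiler could reuse for an enum discriminant. NOT the real EncodedLevel.
#[allow(dead_code)]
struct FakeLevel {
    words: [u64; 15], // 15 * 8 = 120 bytes of payload
}

fn main() {
    // With the default (repr(Rust)) layout, this typically prints 120 and 128:
    // the Option only adds a one-byte discriminant, but alignment rounds the
    // enum back up to the next multiple of 8.
    println!("FakeLevel:         {} bytes", size_of::<FakeLevel>());
    println!("Option<FakeLevel>: {} bytes", size_of::<Option<FakeLevel>>());
}
```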
This PR shrinks the `EncodedLevel` struct by:

- Replacing `usize` offsets (8 bytes apiece on `x86_64`) with `u8` lengths from which offsets can be calculated if necessary.
- Replacing the `Vec` of annotations on each `EncodedLevel` with a common `Vec` that lives on `CursorState`. Each `EncodedLevel` now tracks the number of annotations it has pushed onto that communal `Vec`, allowing the reader to use a single `Vec`/allocation across the entire stream. This dropped the size of `EncodedLevel` by a further 23 bytes. (A sketch of this idea follows the list.)
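For illustration, here is a minimal sketch of the communal-annotations idea from the second bullet. The names (`CursorState`, `Level`, `SymbolId`) and the exact fields are assumptions made for this example, not the actual ion-rust definitions; the point is that each level records only a small count while a single shared `Vec` holds the annotation data.

```rust
// Hypothetical types for illustration; not the real ion-rust definitions.
type SymbolId = usize;

struct Level {
    // Each level remembers only how many annotations it pushed onto the
    // shared Vec, instead of owning its own Vec (and allocation).
    annotation_count: u8,
}

struct CursorState {
    // A single allocation shared by every level in the stream.
    annotations: Vec<SymbolId>,
    levels: Vec<Level>,
}

impl CursorState {
    fn step_in(&mut self, annotations: &[SymbolId]) {
        self.annotations.extend_from_slice(annotations);
        self.levels.push(Level {
            annotation_count: annotations.len() as u8,
        });
    }

    fn step_out(&mut self) {
        if let Some(level) = self.levels.pop() {
            // Discard this level's annotations from the end of the shared Vec.
            let new_len = self.annotations.len() - level.annotation_count as usize;
            self.annotations.truncate(new_len);
        }
    }

    /// The annotations belonging to the current (innermost) level.
    fn current_annotations(&self) -> &[SymbolId] {
        let count = self
            .levels
            .last()
            .map(|level| level.annotation_count as usize)
            .unwrap_or(0);
        &self.annotations[self.annotations.len() - count..]
    }
}

fn main() {
    let mut cursor = CursorState {
        annotations: Vec::new(),
        levels: Vec::new(),
    };
    cursor.step_in(&[10, 11]); // step into a value with two annotations
    cursor.step_in(&[12]); // nested value with one annotation
    assert_eq!(cursor.current_annotations(), &[12]);
    cursor.step_out();
    assert_eq!(cursor.current_annotations(), &[10, 11]);
    cursor.step_out();
    assert!(cursor.current_annotations().is_empty());
}
```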
Performance test

15MB binary Ion test file containing a single struct with 773 levels of nested values.
Before: 230ms
After: 125ms (-45.65%)
Memory layout
Before
Note that, depending on layout/alignment, a size of 120 bytes means that `Option<EncodedValue>` and `Result<EncodedValue, _>` can take 128 bytes even though they only add a single discriminator byte.

After
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.