statement-store: make encode/hash faster #10882
Merged
Conversation
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
P1sar
approved these changes
Jan 22, 2026
AndreiEres
approved these changes
Jan 23, 2026
Contributor
AndreiEres
left a comment
Should we reduce calls of hash() then and pass it into arguments where it's possible?
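A std-only sketch of the pattern that comment suggests (all names here are hypothetical, and `DefaultHasher` stands in for the real statement hash): hash once at the boundary and pass the precomputed hash into downstream functions, so deduplication paths never re-hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

#[derive(Hash)]
struct Statement {
    data: Vec<u8>,
}

type StatementHash = u64;

// Hash the statement exactly once at the entry point...
fn hash_of(s: &Statement) -> StatementHash {
    let mut h = DefaultHasher::new();
    s.hash(&mut h);
    h.finish()
}

// ...and thread the precomputed hash through every function that needs it,
// instead of each one calling hash() again.
fn note_seen(hash: StatementHash, seen: &mut HashSet<StatementHash>) -> bool {
    seen.insert(hash) // true only the first time this statement is seen
}
```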
AndreiEres
reviewed
Jan 23, 2026
Co-authored-by: Andrei Eres <eresav@me.com>
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
bkchr
reviewed
Jan 27, 2026
Comment on lines +498 to +515
```rust
// Calculate capacity for preallocation as a close approximation of the SCALE-encoded
// size without actually performing the encoding. Uses size_of for type sizes:
// - Compact length prefix: 1-5 bytes (assume 5 for safety)
// - Proof field: 1 (tag) + 1 (enum discriminant) + size_of::<Proof>()
// - DecryptionKey: 1 (tag) + size_of::<DecryptionKey>()
// - Priority: 1 (tag) + size_of::<u32>()
// - Channel: 1 (tag) + size_of::<Channel>()
// - Each topic: 1 (tag) + size_of::<Topic>()
// - Data: 1 (tag) + 5 (compact len) + data.len()
let proof_size =
	if !for_signing && self.proof.is_some() { 1 + 1 + size_of::<Proof>() } else { 0 };
let decryption_key_size =
	if self.decryption_key.is_some() { 1 + size_of::<DecryptionKey>() } else { 0 };
let priority_size = if self.priority.is_some() { 1 + size_of::<u32>() } else { 0 };
let channel_size = if self.channel.is_some() { 1 + size_of::<Channel>() } else { 0 };
let topics_size = self.num_topics as usize * (1 + size_of::<Topic>());
let data_size = self.data.as_ref().map_or(0, |d| 1 + 5 + d.len());
let compact_prefix_size = if !for_signing { 5 } else { 0 };
```
Member
You should either use max_encoded_len (instead of size_of) or you just come up with some worst case max encoded len for the entire struct. In the end, even if this is a little bit above, it will not hurt that much too always over allocate.
Contributor
Author
Modified it to use max_encoded_len, also added a test to make sure estimated is always slightly higher than the actual encoded size.
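A std-only sketch of the estimation-plus-test pattern discussed above, with a hypothetical simplified `Stmt` in place of the real `Statement` (the actual code uses `max_encoded_len` from parity-scale-codec): because the estimate assumes the worst-case 5-byte compact prefix, it is always at least the actual encoded size, which is exactly what the added test checks.

```rust
use std::mem::size_of;

// Hypothetical, simplified statement: one optional field plus a payload,
// encoded statement-store style (absent fields are skipped entirely,
// present fields get a 1-byte tag).
struct Stmt {
    priority: Option<u32>,
    data: Vec<u8>,
}

impl Stmt {
    // Worst-case upper bound on the encoded size: assume the largest
    // possible compact length prefix (5 bytes) even when 1 byte suffices.
    fn max_encoded_size(&self) -> usize {
        let priority = if self.priority.is_some() { 1 + size_of::<u32>() } else { 0 };
        let data = 1 + 5 + self.data.len();
        priority + data
    }

    fn encode(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(self.max_encoded_size());
        if let Some(p) = self.priority {
            out.push(0x01); // hypothetical field tag
            out.extend_from_slice(&p.to_le_bytes());
        }
        out.push(0x02); // hypothetical data field tag
        out.push((self.data.len() as u8) << 2); // compact single-byte mode (len < 64)
        out.extend_from_slice(&self.data);
        out
    }
}
```

A test mirroring the one mentioned above then asserts `encode().len() <= max_encoded_size()` for each field combination.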
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
AndreiEres
approved these changes
Jan 28, 2026
P1sar
approved these changes
Jan 28, 2026
bkchr
approved these changes
Jan 28, 2026
By reserving the memory in advance we halve the encoding time, which ultimately speeds up the statement.hash() function, which is called in a lot of places.
More importantly, once we are connected to more nodes, the hash function gets called many times for the same statement, because we might receive the same statement from every peer we are connected to.
For example, on Versi, on_statements ate a lot of time when running with 15 nodes, see #10814 (comment).
Modified the statement_network benchmark to also be parameterizable by the number of times we might receive a statement; when we receive it from 16 peers, this PR gives a speed-up of ~16%, which is not negligible, so I consider this a worthy improvement.
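A std-only illustration of why the reserve helps the hash path (the helpers and field tags are hypothetical; the real code SCALE-encodes the whole `Statement` and hashes that encoding): since the statement hash is computed over the encoded bytes, removing reallocations from the encode step makes every hash() call cheaper.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

// Encode into a buffer sized from a worst-case estimate, so the pushes
// below never trigger a reallocation.
fn encode_reserved(data: &[u8]) -> Vec<u8> {
    let estimate = 1 + 5 + data.len(); // tag + worst-case compact prefix + payload
    let mut out = Vec::with_capacity(estimate);
    out.push(0x02); // hypothetical data field tag
    out.push((data.len() as u8) << 2); // compact single-byte mode (len < 64)
    out.extend_from_slice(data);
    out
}

// The statement hash is the hash of the encoding, so a cheaper encode
// speeds up hashing directly. DefaultHasher stands in for the real
// cryptographic hash used by the statement store.
fn statement_hash(data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    h.write(&encode_reserved(data));
    h.finish()
}
```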