Add first version of the spec for the Head-MPT State network #389
Conversation
Why is this being proposed as a separate network from the existing state network?
All the value is solely in being able to access the state at the HEAD of the chain.
I can't speak for Milos' reasons, but I will say that developing this as a separate network and then merging it into the existing state network might have some advantages. In the end, I believe these should all be part of the same network.
Here are my initial thoughts on this:
Co-authored-by: Piper Merriam <[email protected]>
I would say that the main reason is what Piper said as well. This is very experimental, and I would rather have a separate network that is unstable. Once it becomes more stable, we can consider merging the two networks into one. We could specify that if one is using the new network, they should also participate in the old network as well (I didn't document it yet, but using the old state network is probably the easiest way to bootstrap a fresh node). On top of that, there are many concepts that behave very differently, for example: the interpretation of radius and how it's used, and the distance function (we might have to switch it to something other than XOR).
Replies to @bhartnett
We still need
I'm not set on exact naming. Happy to brainstorm better names.
Probably I didn't specify it clearly enough, but the plan isn't for every node to store an entire depth-2 subtrie. I hope I was able to explain this. I will think about how I can make the spec clearer.
The simplest way is to use the current state network for syncing their desired subtrie. Once that is synced, they use trie-diffs to catch up to the head of the network.
Correct. If a node has an entire view of their subtrie, they should be able to create an exclusion proof for that subtrie.
Exact details are to be figured out when we start implementing things. But yes, the idea is to keep TrieDiffs from reorgs around, at least until the chain is finalized.
Yes, I think we should support bytecode as well. I think I even dedicated a content key/value for it, but didn't specify it, as it's quite simple and I didn't have time.
Well, what I meant is that we could use stateRoot instead of blockHash in the content types, because there is exactly one stateRoot for every block. StateRoot would work for the Account Trie Node lookup: for example, when you do an eth_getBalance lookup by block number, we look up the block header and pull out the state root, which can then be used to fetch the account data. Using stateRoot in this case is nice because we can also create another debug_getBalanceByStateRoot endpoint that doesn't require the history network to be enabled, since it can look up the content by state root directly. This is just an example to show the point that it could help a bit to make this subnetwork more loosely coupled from the history network.

Now, having said that, I understand that for the Account Trie Diff type we need the block hash so that, when validating the content, we can look up the block header and check that the content matches and is anchored to the stateRoot in the header. But as an alternative idea: if we stored the last 256 stateRoots in memory (collected from the history network in the background), then we could use stateRoot here as well. This loose coupling also makes testing easier.
I understand what you mean. This sounds reasonable to me and I think it could work. There is still the risk that the size of the subtrie diffs would grow over time as Ethereum's gas limit increases, etc., but that is likely a problem that would be solved/improved by verkle/binary tries, which should reduce the proof sizes.
Makes sense. Something to point out is that syncing a sub-trie might take a decent amount of time, and during the process the node might fall behind the finalized block (which moves forward every 12 seconds), so it would need access to the trie diffs from a bit before the latest finalized block.
Regarding block_hash vs state_root, I don't have a strong preference. I think state_root is more likely to be fragile and hard to handle/recover if something is not working. But we can give it a try and keep it if everything works well.
It can accept, verify and store relevant trie-diffs locally while the subtrie is syncing. This way, syncing can last much longer. Alternatively, it can sync smaller subtries separately and merge them. That would require more work and research to figure out the best way to do it, but it sounds doable. Ultimately, this is a big area of research, as are many other things. The current spec is a high-level view and the direction that we think is best at the moment.
Yes this is a good idea.
For this I was thinking we would walk down the sub-trie from the starting state root and do a breadth-first search, so that some of the lookups could be requested concurrently to speed up the process.
Sure, sounds good.
I think that we've found in the past that
associated block. The present part of both partial views should match (and the same for the missing
part). The trie nodes are ordered as if they are visited using the
[pre-order Depth First Search](https://en.wikipedia.org/wiki/Tree_traversal#Pre-order,_NLR)
traversal algorithm.
@morph-dev Is there a specific reason why pre-order Depth First Search was selected as the ordering of the trie nodes in the trie diff trie node lists?
I'm wondering if using Breadth First Search ordering of the trie nodes would be better, because it might lead to simpler algorithms that can be defined recursively and are easier to reason about. For example, when validating the trie nodes in the trie diff in place (without using additional memory/storage), we need to walk down each of the paths in the trie and check that the hashes of the child nodes match the hashes in the parent nodes. With a breadth-first ordering we can define a recursive algorithm that takes a parent node, looks at how many children it should have, and then checks the hashes of the next n child nodes in the trie node list.
I'm sure it can still be done using pre-order Depth First Search, but I'm thinking that Breadth First Search ordering might be easier to understand because the trie nodes are organised in layers based on trie depth.
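As an aside, the queue-based BFS check being proposed could look roughly like this. This is a standalone sketch with a made-up node encoding (`b"B"` + child hashes for branches, `b"L"`-prefixed leaves); it assumes every child of each listed parent is also present in the list, which, as later replies point out, does not hold for a diff:

```python
import hashlib
from collections import deque

def node_hash(encoded: bytes) -> bytes:
    return hashlib.sha256(encoded).digest()

def child_hashes(encoded: bytes) -> list:
    # Toy encoding: b"B" + concatenated 32-byte child hashes; anything
    # else is a leaf and has no children.
    if not encoded.startswith(b"B"):
        return []
    body = encoded[1:]
    return [body[i:i + 32] for i in range(0, len(body), 32)]

def validate_bfs(root_hash: bytes, nodes: list) -> bool:
    # In BFS order each parent's children occupy the next contiguous slots,
    # so a queue of expected hashes is enough to validate the whole list.
    expected = deque([root_hash])
    for encoded in nodes:
        if not expected or node_hash(encoded) != expected.popleft():
            return False
        expected.extend(child_hashes(encoded))
    return not expected  # every announced child must have appeared

leaf_a, leaf_b = b"La", b"Lb"
root = b"B" + node_hash(leaf_a) + node_hash(leaf_b)
assert validate_bfs(node_hash(root), [root, leaf_a, leaf_b])
assert not validate_bfs(node_hash(root), [root, leaf_b, leaf_a])  # wrong order
```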
There is no strong reason for picking DFS over BFS, but I think I disagree that BFS would be easier to reason about (especially if you plan to use a recursive algorithm), because:
- not all trie nodes will be present
- the usage of extension nodes in the MPT gives a wrong idea of layers, making it harder to reason about (in my opinion)

All in all, I don't have strong preferences, but my intuition is that DFS is simpler. If most people prefer one, I don't mind.
I don't have a strong opinion either way, so I don't mind sticking with DFS if that is preferred.
Yes, true, not all trie nodes will be present because it is just a diff, not the full state subtree. This makes me wonder: for each parent node in a trie node list, how can we know which child nodes are encoded in the list? It would be useful to know this.
I guess we could use a trial-and-error method where we hash the child nodes and see which of them match the hashes in the parent nodes (not sure how well this would work). But perhaps it would be better to encode this information in the data structure somehow.
The way I envisioned it is that:
- you iterate both pre-block and post-block lists of trie nodes
- calculate their hashes
- put them in two separate maps, together with their index
- you traverse both tries in parallel (starting from the root), and you keep track that indices are in consecutive order
  - if the hash of the next trie node is the same in both tries, you skip it
  - otherwise you look it up in the maps you created earlier (it should be present), and continue traversal

There is probably some tricky logic in dealing with extension nodes and branches, and/or when some subtrie is missing, but I think this approach can be adjusted for that.
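A rough standalone sketch of the parallel traversal described above, using a toy binary trie instead of a full MPT (so no extension nodes) and a made-up node encoding; everything here is illustrative, not spec code:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def leaf(value: bytes) -> bytes:
    return b"L" + value

def branch(left: bytes, right: bytes) -> bytes:
    # A branch references its children by hash, as MPT branch nodes do.
    return b"B" + h(left) + h(right)

def validate_diff(pre_root: bytes, post_root: bytes,
                  pre_list: list, post_list: list) -> bool:
    # Maps from node hash to (index in diff list, encoded node).
    pre_map = {h(n): (i, n) for i, n in enumerate(pre_list)}
    post_map = {h(n): (i, n) for i, n in enumerate(post_list)}
    pre_seen, post_seen = [], []

    def walk(pre_hash: bytes, post_hash: bytes) -> bool:
        if pre_hash == post_hash:
            return True   # unchanged subtrie: absent from both diff lists
        if pre_hash not in pre_map or post_hash not in post_map:
            return False  # a changed node is missing from the diff
        pre_i, pre_node = pre_map[pre_hash]
        post_i, post_node = post_map[post_hash]
        pre_seen.append(pre_i)
        post_seen.append(post_i)
        if pre_node[:1] == b"B" and post_node[:1] == b"B":
            return (walk(pre_node[1:33], post_node[1:33])
                    and walk(pre_node[33:65], post_node[33:65]))
        return True       # reached a leaf (structure changes not handled here)

    ok = walk(h(pre_root), h(post_root))
    # A pre-order traversal must visit each list entry once, in order.
    return (ok and pre_seen == list(range(len(pre_list)))
               and post_seen == list(range(len(post_list))))

# One account's leaf changes; the other subtrie is untouched and omitted.
unchanged = leaf(b"other-account")
old_leaf, new_leaf = leaf(b"balance=1"), leaf(b"balance=2")
pre_root = branch(old_leaf, unchanged)
post_root = branch(new_leaf, unchanged)
assert validate_diff(pre_root, post_root,
                     [pre_root, old_leaf], [post_root, new_leaf])
```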
I only skimmed this conversation, but I believe Depth-First is the correct ordering here. My previous experience working with MPT proofs suggests that efficient algorithms can generally be written for either ordering, but that Depth-First is the more natural way to do this.
```py
def interested(node, content):
    bits = content.id.length
    return distance(node.id[:bits], content.id) <= node.radius[:bits]
```
For the trie diff data type, which has a path of len = 8: if a node has a radius where radius mod 256 == 255, then `node.radius[:bits]` will equal 0xFF, and this will always be greater than or equal to `distance(node.id[:bits], content.id)`; therefore the node would be interested in all trie diffs, not just a subset.

I guess a similar problem can happen for the other data types that have a longer path, but with a lower probability of occurrence.
I assumed that the type of distance and radius is U256 (unsigned 256-bit integer). When represented as bytes, I assumed big-endian ordering, so if bits=8, then `node.radius[:bits]` represents the highest byte of the radius (equivalent to radius / 2^248).

Is that what you understood as well? And is the logic still faulty?

If it was only about understanding the syntax, I'm accepting suggestions to make it clearer :)
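The big-endian reading described here can be checked with a few lines of Python (a standalone illustration, not spec code):

```python
# Standalone check of the big-endian interpretation described above:
# the first byte of a U256 radius equals the radius shifted right by 248 bits.
radius = (0xAB << 248) | 0x1234          # arbitrary U256 value
radius_bytes = radius.to_bytes(32, "big")
first_byte = radius_bytes[0]             # "radius[:bits]" with bits = 8

assert first_byte == radius >> 248 == 0xAB
```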
Actually, I misunderstood the Python slice syntax here: I read it as the last 8 bits, but it is actually the first 8 bits of the radius. There is still a similar problem as described above, but in the reverse scenario: if a node has a radius where the first 8 bits are 0 (very common), then it will store almost no content (only content with a distance of zero).

Perhaps we should change the distance function to something like this:

    def interested(node, content):
        bits = content.id.length
        return logdistance(node.id[:bits], content.id) <= log2(node.radius)
> If nodes that have a radius where the first 8 bits are 0 (very common) then they will store almost no content (only content with a distance of zero)

That's why I use `<=` in the condition. Only 1 of the 256 subtrie-diffs will share a path with the `node.id`, and only that one will have `distance(node.id[:bits], content.id)` equal to 0 as well (so the function would return true).

I think that logdistance would work as well (at least in the case of gossip), as they should be pretty much the same.
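To make this concrete, here is a small standalone check (XOR distance, 8-bit path; the specific byte values are made up for illustration): a node whose radius has a zero first byte still matches exactly one of the 256 subtrie-diff paths.

```python
# Illustrative check: with bits = 8, XOR distance, and a radius whose first
# byte is 0, exactly one of the 256 possible subtrie-diff paths is accepted:
# the one equal to the first byte of the node id.
def interested_first_byte(node_first_byte: int, radius_first_byte: int, path: int) -> bool:
    return (node_first_byte ^ path) <= radius_first_byte

node_first_byte = 0x3C      # hypothetical first byte of a node id
radius_first_byte = 0x00    # first byte of a zero (or tiny) radius

accepted = [p for p in range(256)
            if interested_first_byte(node_first_byte, radius_first_byte, p)]
assert accepted == [0x3C]   # only the path sharing the node's prefix
```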
> That's why I use `<=` in the condition. Only 1 of the 256 subtrie-diffs will share a path with the `node.id`, and only that one will have `distance(node.id[:bits], content.id)` equal to 0 as well (so the function would return true).

I see. Well, given that scenario, if a node sets its storage capacity to zero and therefore sets its radius to zero, then it would still store one of the subtrie-diffs, is that right?
Huh, that's interesting. We have the current expectation of being able to set the storage to 0 and accept nothing in history. It would be surprising to me to set storage to 0 and accept/store content anyway.
> Huh, that's interesting. We have the current expectation of being able to set the storage to 0 and accept nothing in history. It would be surprising to me to set storage to 0 and accept/store content anyway.

Well, yes and no. According to the spec (history, state), we define the following:

    interested(node, content) = distance(node.id, content.id) <= node.data_radius

So even if `radius` is zero, a node should store content whose `content.id` is exactly the same as its `node.id`. Of course, in the existing networks this wouldn't happen in practice.

With that being said, I'm also fine with having some rules that loosen the requirements for storing; for example, one of these would work:
- if your radius is zero, you don't have to store anything
- if there are no leaves that fall within your radius, there is no need to store inner nodes either
  - note that if there is at least one leaf within radius, nodes should store all inner nodes within radius in order to have exclusion proofs
I agree that nodes that have a zero radius should be allowed to store nothing even with the existing distance function, but why not change the distance function to use `<` instead of `<=`? For example:

    interested(node, content) = distance(node.id, content.id) < node.data_radius
Then the node that has the maximum radius (0xFFFF..FF) shouldn't store the content that is farthest away (e.g. `node.id=0x3333..33` and `content.id=0xCCCC..CC`), so the spec is not very clean...

Also, comparing the trie path of the subtrie (or any inner node, for that matter) works much better with `<=` (see my messages earlier in this thread).
The network stores execution layer state content, which encompasses the following data from the
latest 256 blocks:

- Block trie-diffs
Would like to get validation of this concept done, since it's a core building block of the network design. We need test data for:
- A block-level diff that is correct (minimal and complete)
- A block-level diff that contains extra data, and the previous block-level proof that allows detection of this
- A block-level diff that omits a change to the state, and the previous block-level proof that allows detection of this
The content id is derived only from the content's trie path (and the contract's address in the
case of the contract's trie node). Its primary use case is in determining which nodes on the
network should store the content.

The content id has the following properties:

- It has variable length, between 0 and 256 bits (only multiples of 4), representing the trie path
- It's not unique, meaning different content can have the same content id
- The trie path of the contract's storage trie node is modified using the contract's address

The derivation function is slightly different for different types and is defined below.
I'm not understanding the motivation/need/justification for a variable length content-id.
I guess that they would have to be 32 bytes, and you would generate them using my approach plus appending zeroes, right?

The problem is that you can't distinguish between the root node, the first node of the first level (path: 0), and the first node on the second level (path: 00), as they would all have content_id 0x000000..00.

You could make it 34 bytes long and include the length (e.g. at the start), but in that case you might just as well make them different lengths. I could be mistaken and there is some way around it.
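The collision being described can be shown with a small sketch (the `padded_id` helper is hypothetical, just to illustrate fixed-width zero-padding of a nibble path):

```python
# Hypothetical fixed-width scheme: encode the nibble path, then pad with
# zero bits up to 256. The root (empty path), path "0", and path "00" all
# collapse to the same 32-byte value, which is the ambiguity in question.
def padded_id(nibbles: list) -> bytes:
    bits = "".join(f"{n:04b}" for n in nibbles).ljust(256, "0")
    return int(bits, 2).to_bytes(32, "big")

assert padded_id([]) == padded_id([0]) == padded_id([0, 0]) == b"\x00" * 32
```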
```py
selector = 0x30
account_trie_diff = Container(path: u8, block_hash: Bytes32)
```
Does path make more sense as a `Bytes2` type? Effectively interchangeable with `u8`, but maybe more intuitive as a trie path as a bytes type?
You mean `Bytes1`? Or do you mean `Nibbles` (which will in practice have length two, but the type is variable length)? Or do you actually mean `Bytes2` where each byte has a value in [0, 15]?

I don't have a strong preference. We already bundle 2 nibbles into one byte as part of the Nibbles type, so I thought I would just assume that two nibbles are bundled into one byte.
I meant `Bytes1`.
I think we benefit from being less focused on how long until the sync completes at the full target radius, and more focused on how long until the client first reaches a helpful state. A helpful state is one that can serve any state data to a peer (especially the data nearest your Node ID). My first read is that the parallel breadth-first search makes the opposite tradeoff: it speeds up the final sync time, at the cost of increasing the latency until the client reaches a helpful state.

Reaching that helpful state quickly might look like: starting with a tiny sync radius for walking the trie. With a small enough sync radius, walking the trie can happen in minutes, maybe even seconds. Then you can incrementally increase the syncing radius, and be serving more and more content (the stuff nearest your Node ID) all along the way.

I suppose this is not something that should be defined as a spec requirement. We want to encourage client experimentation. But if we have a mild consensus on it, it seems like a good recommendation to include for clients that are not opinionated about the choice.
I agree with this as a general goal, but in this case we would be syncing the state from the existing trie node state network, so the state is already available for the block from which we start syncing, and other nodes don't benefit much from having the flat state here. Of course, there is a small performance improvement from having the flat state, but that is all.
Yes, I think it would improve the overall speed of the sync because you maximize the amount of concurrent network lookups.
I think dynamically changing the radius for this flat state model would be rather complex, because you would need to re-sync the missing state from the last finalized block and then apply the trie diffs forward to get back to the head. I would think that supporting changes to the radius size should be something implemented later down the track. I could be missing something, but that is my understanding.
This PR introduces the spec for the "Head-MPT State" network, responsible for storing state content that is close to the head of the chain (while allowing faster access to the state).
This is very much a work in progress. It's open for discussion and debate, and more improvements will come as we start working on it.
The first implementation should try to work only on the Account Trie, then reevaluate the approach and whether it would work on the contract tries as well.