
Add first version of the spec for the Head-MPT State network #389

Open: wants to merge 3 commits into master

Conversation

morph-dev (Contributor)

This PR introduces the spec for the "Head-MPT State" network, which is responsible for storing state content close to the head of the chain (while allowing faster access to the state).

This is very much work-in-progress. It's open for discussion and debate, and more improvements will come as we start working on it.

The first implementation should work only on the Account Trie; we can then reevaluate the approach and whether it would work on contract tries as well.

morph-dev self-assigned this Apr 9, 2025
@KolbyML (Member) commented Apr 9, 2025

Why is this being proposed as a separate network from the StateNetwork? If we have the Execution Head-MPT State Network, I think we should get rid of the StateNetwork.

I think we should either

  • have Execution Head-MPT State Network as part of the StateNetwork
  • get rid of the StateNetwork and only have the Execution Head-MPT State Network

All the value is solely in being able to access the state at the HEAD of the chain, i.e. the Execution Head-MPT State Network, so if they are separate I don't think the StateNetwork is sustainable. The StateNetwork can only be justified if it is subsidized by the utility of the Execution Head-MPT State Network.

@pipermerriam (Member)

Why is this being proposed as a separate network

I can't speak for Milos' reasons, but I will say that developing this as a separate network and then merging it into the existing state network might have some advantages. In the end, I believe these should all be part of the same state-network but I'm not opposed to developing them separately until this spec has solidified.

@bhartnett (Contributor) commented Apr 10, 2025

Here are my initial thoughts on this:

  • I think we should use stateRoot instead of blockHash in the data types to reduce coupling with the history network and block headers. This will make implementations that pass in the stateRoot directly possible without having to look up the block header.

  • The naming of Account Trie Node and Contract Trie Node types is confusing because we don't just look up a single trie node for these content types, but rather a proof is returned. They also clash with the names in the current state network. Perhaps these should be named Account Trie Proof and Contract Trie Proof or something else.

  • I wonder if the requirement for clients to store an entire subtrie at depth 2 is too large. If I understand correctly, this would approximately be the size of the entire state trie divided by 256? Might be too large for small/lightweight clients keeping in mind that the state will also grow over time. If we think this might be a problem we could consider designs where the size of the subtrees are not fixed to accommodate nodes having different sized radius.

  • In this design the data gossiped is only a trie diff but it wouldn't include the state required to initialize the sub tries. Any thoughts on how clients will get the state to initialize their subtrie/s?

  • How do we handle exclusion proofs? When responding to a lookup request, I guess clients can walk down their sub trie to determine if they have an exclusion proof. Even if they don't have the content requested they should respond with an exclusion proof (if they have one) otherwise indicate that they don't have the content.

  • How do we handle reorgs? The TrieDiffs stored in the network will need to be deleted and recreated somehow in the event of a reorg. But I assume clients would need the before part of the trie diffs to roll back to the finalized block so we wouldn't want to delete the trie diffs too quickly.

  • We probably should have flat storage for contract bytecode as well, because without it we would need to look up the code using the existing state network, and this is problematic for a few reasons.

    • Our current plan is that we only support archive trie state up to finalized blocks, so the code won't exist if it was deployed recently within the last 256 blocks.
    • Looking up the contract code by codeHash means we need to do at least 2 lookups: one to get the account (which holds the codeHash) and another to get the code. It would be nice to look up the code by addressHash directly so that only one lookup is required; this would help improve the performance of eth_call, for example (see the sketch below).
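
A hypothetical sketch of the two bytecode lookup patterns; network_lookup stands in for a Portal content lookup and the content keys are illustrative only, not part of the spec.

```py
def get_code_by_code_hash(state_root, address_hash, network_lookup):
    # Two round trips: fetch the account to learn its codeHash, then the code.
    account = network_lookup(("account", state_root, address_hash))
    return network_lookup(("code", account.code_hash))


def get_code_by_address_hash(state_root, address_hash, network_lookup):
    # One round trip if bytecode is stored flat, keyed by addressHash.
    return network_lookup(("code_by_address", state_root, address_hash))
```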

@morph-dev (Contributor Author)

Why is this being proposed as a separate network

I would say that the main reason is what Piper said as well. This is very experimental and I would rather have a separate network that is unstable. Once it becomes more stable, we can consider merging the two networks into one. We could specify that anyone using the new network should also participate in the old network (I didn't document it yet, but using the old state network is probably the easiest way to bootstrap a fresh node).

On top of that, there are many concepts that behave very differently, for example the interpretation of radius and how it's used, and the distance function (we might have to switch it to something other than XOR).

@morph-dev (Contributor Author)

Replies to @bhartnett

  • I think we should use stateRoot instead of blockHash in the data types to reduce coupling with the history network and block headers. This will make implementations that pass in the stateRoot directly possible without having to look up the block header.

We still need blockHash in order to verify content (because using stateRoot to identify a block doesn't sound like a good idea). And if we need blockHash, it is redundant to have stateRoot. Or am I missing something?

  • The naming of Account Trie Node and Contract Trie Node types is confusing because we don't just look up a single trie node for these content types, but rather a proof is returned. They also clash with the names in the current state network. Perhaps these should be named Account Trie Proof and Contract Trie Proof or something else.

I'm not set on exact naming. Happy to brainstorm better names.

  • I wonder if the requirement for clients to store an entire subtrie at depth 2 is too large. If I understand correctly, this would approximately be the size of the entire state trie divided by 256? Might be too large for small/lightweight clients keeping in mind that the state will also grow over time. If we think this might be a problem we could consider designs where the size of the subtrees are not fixed to accommodate nodes having different sized radius.

Probably I didn't specify it clearly enough, but the plan isn't for every node to store an entire depth-2 subtrie.
Each node can decide how big or small it wants its subtrie to be. It's just that we use the trie-diff at depth 2 for gossip purposes. Each node would use only the part of that subtrie-diff that overlaps with its own subtrie.

Hope I was able to explain this. Will think how I can make spec clearer.

  • In this design the data gossiped is only a trie diff but it wouldn't include the state required to initialize the sub tries. Any thoughts on how clients will get the state to initialize their subtrie/s?

The simplest way is to use the current state network for syncing the desired subtrie. Once that is synced, they use trie-diffs to catch up to the head of the network.

  • How do we handle exclusion proofs? When responding to a lookup request, I guess clients can walk down their sub trie to determine if they have an exclusion proof. Even if they don't have the content requested they should respond with an exclusion proof (if they have one) otherwise indicate that they don't have the content.

Correct. If a node has an entire view of its subtrie, it should be able to create an exclusion proof for that subtrie.

  • How do we handle reorgs? The TrieDiffs stored in the network will need to be deleted and recreated somehow in the event of a reorg. But I assume clients would need the before part of the trie diffs to roll back to the finalized block so we wouldn't want to delete the trie diffs too quickly.

Exact details are to be figured out when we start implementing things. But yes, the idea is to keep TrieDiffs from reorgs around, at least until the chain is finalized.

  • We probably should have flat storage for contract bytecode as well, because without it we would need to look up the code using the existing state network, and this is problematic for a few reasons.

Yes, I think we should support bytecode as well. I think I even dedicated a content key/value for it, but didn't specify it since it's quite simple and I didn't have time.
But importantly, we should start with the Account Trie first and see how well this works. There are more complications when it comes to Smart Contract Storage tries, so I left most of the SmartContract logic loosely specified or unspecified.

@bhartnett (Contributor) commented Apr 10, 2025

We still need blockHash in order to verify content (because using stateRoot to identify a block doesn't sound like a good idea). And if we need blockHash, it is redundant to have stateRoot. Or am I missing something?

Well, what I meant is that we could use stateRoot instead of blockHash in the content types because there is exactly one stateRoot for every block. StateRoot would work for the Account Trie Node lookup: for example, when you do an eth_getBalance lookup by block number, we look up the block header and pull out the state root, which can then be used to fetch the account data. Using stateRoot in this case is nice because we can also create another debug_getBalanceByStateRoot endpoint that doesn't require the history network to be enabled, because it can look up the content by state root directly. This is just an example to show that it could help make this subnetwork more loosely coupled from the history network.

Now, having said that, I understand that for the Account Trie Diff type we need the block hash so that when validating the content we can look up the block header and check that the content matches and is anchored to the stateRoot in the header. But as an alternative idea, if we stored the last 256 stateRoots in memory (collected from the history network in the background), then we could use stateRoot here as well. This loose coupling also makes testing easier.
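
For illustration, a minimal sketch of the two lookup flows being compared; get_header_by_number and lookup_account are hypothetical helpers standing in for client internals, not spec APIs.

```py
def get_balance_by_block_number(address_hash, block_number,
                                get_header_by_number, lookup_account):
    # blockHash/number keyed flow: the history network is needed to resolve
    # the header before any state content can be fetched.
    header = get_header_by_number(block_number)
    return lookup_account(header.state_root, address_hash).balance


def get_balance_by_state_root(address_hash, state_root, lookup_account):
    # stateRoot keyed flow: the caller already knows the state root, so the
    # history network is not needed (e.g. a debug_getBalanceByStateRoot endpoint).
    return lookup_account(state_root, address_hash).balance
```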

Probably I didn't specify it clearly enough, but the plan isn't for every node to store an entire depth-2 subtrie. Each node can decide how big or small it wants its subtrie to be. It's just that we use the trie-diff at depth 2 for gossip purposes. Each node would use only the part of that subtrie-diff that overlaps with its own subtrie.

Hope I was able to explain this. Will think how I can make spec clearer.

I understand what you mean. This sounds reasonable to me and I think it could work. There is still the risk that the size of the subtrie diffs would grow over time as Ethereum's gas limit increases, etc., but that is likely a problem that would be solved/improved by verkle/binary tries, which should reduce the proof sizes.

The simplest way is to use the current state network for syncing the desired subtrie. Once that is synced, they use trie-diffs to catch up to the head of the network.

Makes sense. Something to point out is that syncing a sub-trie might take a decent amount of time and during the process the node might fall behind the finalized block (which is moving forward every 12 seconds) and so it would need to have access to the trie diffs from a bit before the latest finalized block.

@morph-dev (Contributor Author)

Regarding block_hash vs state_root, I don't have a strong preference. I think state_root is more likely to be fragile and hard to handle/recover from if something is not working. But we can give it a try and keep it if everything works well with it.

Makes sense. Something to point out is that syncing a sub-trie might take a decent amount of time and during the process the node might fall behind the finalized block (which is moving forward every 12 seconds) and so it would need to have access to the trie diffs from a bit before the latest finalized block.

It can accept, verify, and store relevant trie-diffs locally while the subtrie is syncing. This way, syncing can last much longer.
Once syncing is done, it can quickly apply the trie-diffs.

Alternatively, it can sync smaller subtries separately and merge them. Would require more work and research to figure out the best way to do it, but it sounds doable.

Ultimately, this is a big area of research, on how to do this and many other things. The current spec is a high-level view and the direction that we think is best at the moment.

@bhartnett (Contributor)

It can accept, verify, and store relevant trie-diffs locally while the subtrie is syncing. This way, syncing can last much longer. Once syncing is done, it can quickly apply the trie-diffs.

Yes this is a good idea.

Alternatively, it can sync smaller subtries separately and merge them. Would require more work and research to figure out the best way to do it, but it sounds doable.

For this I was thinking we would walk down the sub trie from the starting state root and do breadth first search so that some of the look ups could be requested concurrently to speed up the process.

Ultimately, this is a big area of research, on how to do this and many other things. The current spec is a high-level view and the direction that we think is best at the moment.

Sure, sounds good.

@pipermerriam (Member)

I think that we've found in the past that blockHash is preferred over stateRoot simply because reverse lookups for state roots aren't well supported, but block hash is widely supported, and most of the time, you really want to be able to do a reverse lookup to know what block something is referencing.

associated block. The present part of both partial views should match (and the same for missing
part). The trie nodes are ordered as if they are visited using
[pre-order Depth First Search](https://en.wikipedia.org/wiki/Tree_traversal#Pre-order,_NLR)
traversal algorithm.
Contributor

@morph-dev Is there a specific reason why pre-order Depth First Search was selected as the ordering of the trie nodes in the trie diff trie node lists?

I'm wondering if using Breadth First Search ordering of the trie nodes would be better because it might lead to simpler algorithms that can be defined recursively and are easier to reason about. For example, when validating the trie nodes in the trie diff in place (without using additional memory/storage), we need to walk down each of the paths in the trie and check that the hashes of the child nodes match the hashes in the parent nodes. With a breadth-first ordering, we can define a recursive algorithm that takes a parent node, looks at how many children it should have, and then checks the hashes of the next n child nodes in the trie node list.

I'm sure it can still be done using pre-order Depth First Search but I'm thinking that Breadth First Search ordering might be easier to understand because the trie nodes are organised in layers based on trie depth.

Contributor Author

There is no strong reason for picking DFS over BFS, but I think I disagree that BFS would be easier to reason about (especially if you plan to use a recursive algorithm), because:

  • not all trie nodes will be present
  • the usage of extension nodes in MPT gives a wrong idea of layers, making it harder to reason about (in my opinion)

All in all, I don't have strong preferences, but my intuition is that DFS is simpler. But if most people prefer one, I don't mind.

Contributor

I don't have a strong opinion either way so I don't mind sticking with DFS if that is preferred.

Yes, true, not all trie nodes will be present because it is just a diff, not the full state subtrie. This makes me wonder: for each parent node in a trie node list, how can we know which child nodes are encoded in the list? It would be useful to know this.

I guess we could use a trial and error method where we hash the child nodes and see which of them match the hashes in the parent nodes (not sure how well this would work). But perhaps it would be better to encode this information in the data structure somehow.

Contributor Author

The way I envisioned it is:

  • you iterate over both the pre-block and post-block lists of trie nodes
    • calculate their hashes
    • put them in two separate maps, together with their index
  • you traverse both tries in parallel (starting from the root), keeping track that the indices are in consecutive order
    • if the hash of the next trie node is the same in both tries, you skip it
    • otherwise you look it up in the maps you created earlier (it should be present), and continue the traversal

There is probably some tricky logic for dealing with extension nodes and branches, and/or when some subtrie is missing, but I think this approach can be adjusted for that.
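
A rough sketch of the matching traversal described above (an illustration only, not part of the spec draft): children_of is a hypothetical helper that returns the child hashes referenced by a branch or extension node, and the consecutive-index check and missing-subtrie handling are elided.

```py
import hashlib


def node_hash(encoded_node):
    # Stand-in for keccak256 of the RLP-encoded node; Python's sha3_256 is
    # used here only so the sketch runs, it is not the MPT hash function.
    return hashlib.sha3_256(encoded_node).digest()


def verify_trie_diff(pre_nodes, post_nodes, pre_root, post_root, children_of):
    # Index both trie node lists by hash so traversal can find nodes by reference.
    pre_by_hash = {node_hash(n): n for n in pre_nodes}
    post_by_hash = {node_hash(n): n for n in post_nodes}

    def walk(pre_hash, post_hash):
        if pre_hash == post_hash:
            return  # unchanged subtrie, not part of the diff
        pre_node = pre_by_hash.get(pre_hash)
        post_node = post_by_hash.get(post_hash)
        if pre_node is None or post_node is None:
            raise ValueError("changed trie node missing from the diff")
        # children_of(node) -> list of child hashes in path order; branch vs.
        # extension node handling is hidden inside this hypothetical helper.
        for pre_child, post_child in zip(children_of(pre_node),
                                         children_of(post_node)):
            walk(pre_child, post_child)

    walk(pre_root, post_root)
```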

Member

I only skimmed this conversation, but I believe Depth-First is the correct ordering here. My previous experience working with MPT proofs suggests that efficient algorithms can generally be written for either ordering, but that Depth-First is the more natural way to do this.

```py
def interested(node, content):
    bits = content.id.length
    return distance(node.id[:bits], content.id) <= node.radius[:bits]
```
Contributor

For the trie diff data type, which has a path of len = 8, if a node has a radius where radius mod 256 == 255, then node.radius[:bits] will equal 0xFF, which is always greater than or equal to distance(node.id[:bits], content.id), and therefore the node would be interested in all trie diffs, not just a subset.

I guess a similar problem can happen for the other data types that have a longer path but with a lower probability of occurrence.

Contributor Author

I assumed that the type of distance and radius is U256 (unsigned 256-bit integer). When represented as bytes, I assumed big-endian ordering, so if bits=8, then node.radius[:bits] represents the highest byte of the radius (equivalent to radius / 2^248).

Is that what you understood as well? And is the logic still faulty?
If it was only about understanding the syntax, I'm open to suggestions to make it clearer :)

Contributor

Actually, I misunderstood the Python slice syntax here; I read it as the last 8 bits but it is actually the first 8 bits of the radius. There is still a similar problem as described above, but in the reverse scenario. If nodes have a radius where the first 8 bits are 0 (very common), then they will store almost no content (only content with a distance of zero).

Perhaps we should change the distance function to something like this:

```
func interested(node, content):
    bits = content.id.length
    return logdistance(node.id[:bits], content.id) <= log2(node.radius)
```

Contributor Author

If nodes have a radius where the first 8 bits are 0 (very common), then they will store almost no content (only content with a distance of zero).

That's why I use <= in the condition. Only 1 of the 256 subtrie-diffs will share its path with the node.id, and only that one will have distance(node.id[:bits], content.id) equal to 0 (and the function would return true).

I think that logdistance would work as well (at least in the case of the gossip), as they should be pretty much the same.
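
To make the intended semantics concrete, here is a minimal sketch assuming big-endian U256 ids/radii and an 8-bit trie-diff path (an illustration only, not spec text):

```py
def top_bits(value_u256, bits):
    # Highest `bits` bits of a big-endian U256, i.e. value // 2**(256 - bits).
    return value_u256 >> (256 - bits)


def interested(node_id, node_radius, trie_diff_path, path_bits=8):
    # XOR distance between the node id prefix and the trie-diff path,
    # compared against the same-length prefix of the radius.
    distance = top_bits(node_id, path_bits) ^ trie_diff_path
    return distance <= top_bits(node_radius, path_bits)


# A node whose id starts with byte 0xAB is interested in the depth-2
# subtrie-diff with path 0xAB even when its radius prefix is zero (the <=),
# but in no other subtrie-diff.
assert interested(0xAB << 248, 0, 0xAB)
assert not interested(0xAB << 248, 0, 0xAC)
```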

Contributor

That's why I use <= in the condition. Only 1 of the 256 subtrie-diffs will share its path with the node.id, and only that one will have distance(node.id[:bits], content.id) equal to 0 (and the function would return true).

I see. Well, given that scenario, if a node sets its storage capacity to zero and therefore sets its radius to zero, then it would still store one of the subtrie-diffs, is that right?

Contributor

Huh, that's interesting. We have the current expectation of being able to set the storage to 0 and accept nothing in history. It would be surprising to me to set storage to 0 and accept/store content anyway.

Contributor Author (morph-dev, Apr 26, 2025)

Huh, that's interesting. We have the current expectation of being able to set the storage to 0 and accept nothing in history. It would be surprising to me to set storage to 0 and accept/store content anyway.

Well, yes and no. According to the spec (history, state), we define the following:

interested(node, content) = distance(node.id, content.id) <= node.data_radius

So even if the radius is zero, a node should store content whose content.id is exactly the same as its node.id. Of course, in existing networks, this wouldn't happen in practice.


With that being said, I'm also fine with having some rules that loosen the requirements for storing; for example, either of these would work:

  • if your radius is zero, you don't have to store anything
  • if there are no leaves that fall within your radius, there is no need to store inner nodes either
    • note that if there is at least one leaf within the radius, nodes should store all inner nodes within the radius in order to have exclusion proofs

Contributor

I agree that nodes that have a zero radius should be allowed to store nothing even with the existing distance function, but why not change the distance function to use < instead of <=?

For example:
interested(node, content) = distance(node.id, content.id) < node.data_radius

Contributor Author

Then the node that has the maximum radius (0xFFFF..FF) wouldn't store content that is farthest away (e.g. node.id=0x3333..33 and content.id=0xCCCC..CC). So the spec is not very clean...

Also, comparing trie path of the subtrie (or any inner node for that matter) works much better for <= (see my messages earlier in this thread).

The network stores execution layer state content, which encompasses the following data from the
latest 256 blocks:

- Block trie-diffs
Member

I would like to get validation of this concept done since it's a core building block of the network designs. We need test data for:

  • A block level diff that is correct (minimal and complete)
  • A block level diff that contains extra data and the previous block level proof that allows detection of this.
  • A block level diff that omits a change to the state and the previous block level proof that allows detection of this.

Comment on lines +70 to +80
The content id is derived only from the content's trie path (and the contract's address in the case of
the contract's trie node). Its primary use case is in determining which nodes on the network should
store the content.

The content id has the following properties:

- It has variable length, between 0 and 256 bits (only multiples of 4), representing the trie path
- It's not unique, meaning different content will have the same content id
- The trie path of the contract's storage trie node is modified using the contract's address

The derivation function is slightly different for different types and is defined below.
Member (pipermerriam, Apr 17, 2025)

I'm not understanding the motivation/need/justification for a variable length content-id.

Contributor Author

I guess that they would have to be 32 bytes. And you would generate them using my approach plus appending zeroes, right?

The problem is that you can't distinguish between the root node, the first node of the first level (path: 0), and the first node on the second level (path: 00), as they would all have content_id: 0x000000..00.

You can make it 34 bytes long and include the length (e.g. at the start). But in that case, you might just as well make them variable length.

I could be mistaken and there is some way around it.
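
A minimal sketch of the collision described above (my own illustration), assuming content ids were zero-padded to a fixed 32 bytes instead of being variable length:

```py
def padded_content_id(nibble_path):
    # Pack nibbles into bytes (two per byte) and right-pad with zeroes.
    packed = bytearray(32)
    for i, nibble in enumerate(nibble_path):
        shift = 4 if i % 2 == 0 else 0
        packed[i // 2] |= nibble << shift
    return bytes(packed)


# The root node (empty path), path "0" and path "00" all collapse to the same
# 32-byte id, which is why the draft keeps the content id variable length.
assert padded_content_id([]) == padded_content_id([0]) == padded_content_id([0, 0])
```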


```py
selector = 0x30
account_trie_diff = Container(path: u8, block_hash: Bytes32)
```
Member

Does path make more sense as a Bytes2 type? Effectively interchangeable with u8 but maybe more intuitive as a trie path as a bytes type?

Contributor Author

You mean as Bytes1? Or you mean Nibbles (that will in practice have length two, but type is variable length)?
Or you actually mean Bytes2 where each byte has value [0,15]?

I don't have a strong preference. We already bundle 2 nibbles into one byte as part of the Nibbles type, so I thought I would just assume that two nibbles are bundled into one byte.

Member

I meant Bytes1

@carver (Contributor) commented Apr 25, 2025

Alternatively, it can sync smaller subtries separately and merge them. Would require more work and research to figure out the best way to do it, but it sounds doable.

For this I was thinking we would walk down the sub trie from the starting state root and do breadth first search so that some of the look ups could be requested concurrently to speed up the process.

I think we benefit from being less focused on how long until the sync completes at the full target radius, and more focused on how long until the client first reaches a helpful state. A helpful state is one that can serve any state data to a peer (especially the data nearest your Node ID).

My first read is that the parallel breadth first search makes the opposite tradeoff: it speeds up the final sync time, at the cost of increasing the latency until the client reaches a helpful state.

Reaching that helpful state quickly might look like: starting with a tiny sync radius for walking the trie. With a small enough sync radius, walking the trie can happen in minutes, maybe even seconds. Then you can incrementally increase the syncing radius, and be serving more and more content (the stuff nearest your Node ID) all along the way.

I suppose this is not something that should be defined as a spec requirement. We want to encourage client experimentation. But if we have a mild consensus on it, it seems like a good recommendation to include for clients that are not opinionated about the choice.

@bhartnett (Contributor)

For this I was thinking we would walk down the sub trie from the starting state root and do breadth first search so that some of the look ups could be requested concurrently to speed up the process.

I think we benefit from being less focused on how long until the sync completes at the full target radius, and more focused on how long until the client first reaches a helpful state. A helpful state is one that can serve any state data to a peer (especially the data nearest your Node ID).

I agree with this as a general goal, but in this case we would be syncing the state from the existing trie node state network, so the state is already available for the block from which we start syncing, and other nodes don't benefit much from having the flat state here. Of course, there is a small performance improvement from having the flat state, but that is all.

My first read is that the parallel breadth first search makes the opposite tradeoff: it speeds up the final sync time, at the cost of increasing the latency until the client reaches a helpful state.

Yes, I think it would improve the overall speed of the sync because you maximize the amount of concurrent network lookups.

Reaching that helpful state quickly might look like: starting with a tiny sync radius for walking the trie. With a small enough sync radius, walking the trie can happen in minutes, maybe even seconds. Then you can incrementally increase the syncing radius, and be serving more and more content (the stuff nearest your Node ID) all along the way.

I think dynamically changing the radius for this flat state model would be rather complex because you would need to re-sync the missing state from the last finalized block and then apply the trie diffs forward to get back to the head. I would think that supporting changes to the radius size should be something that is implemented later down the track. I could be missing something but that is my understanding.
