Add first version of the spec for the Head-MPT State network #389


Open

wants to merge 3 commits into base: master

4 changes: 3 additions & 1 deletion portal-wire-protocol.md
@@ -28,7 +28,8 @@ All protocol identifiers consist of two bytes. The first byte is "`P`" (`0x50`),

Currently defined mainnet protocol identifiers:

- Inclusive range of `0x5000` - `0x5009`: Reserved for future networks or network upgrades
- Inclusive range of `0x5000` - `0x5008`: Reserved for future networks or network upgrades
- `0x5009`: Execution Head-MPT State Network
- `0x500A`: Execution State Network
- `0x500B`: Execution History Network
- `0x500C`: Beacon Chain Network
@@ -42,6 +43,7 @@ Currently defined mainnet protocol identifiers:

Currently defined `angelfood` protocol identifiers:

- `0x5049`: Execution Head-MPT State Network
- `0x504A`: Execution State Network
- `0x504B`: Execution History Network
- `0x504C`: Beacon Chain Network
267 changes: 267 additions & 0 deletions state/head-mpt-state-network.md
@@ -0,0 +1,267 @@
# Execution Head-MPT State Network

| 🚧 THE SPEC IS IN A STATE OF FLUX AND SHOULD BE CONSIDERED UNSTABLE 🚧 <br> _Clients should implement Account Trie first and reevaluate the proposed approach for Storage Tries_ |
|-|

This document is the specification for the Portal Network that supports on-demand availability of
Merkle Patricia Trie state data from the execution chain, close to the head of the chain.

While similar to the [Execution State network](state-network.md), it has some unique features:

- It supports direct lookup of the account state, contract's code and contract's storage
- It only stores state that is present in any of the latest 256 blocks
- Nodes are responsible for storing a subtree of the entire state trie

## Overview

The Execution Head-MPT State Network is a
[Kademlia](https://pdos.csail.mit.edu/~petar/papers/maymounkov-kademlia-lncs.pdf) DHT that uses the
[Portal Wire Protocol](../portal-wire-protocol.md) to establish an overlay network on top of the
[Discovery v5](https://github.com/ethereum/devp2p/blob/master/discv5/discv5-wire.md) protocol.

Nodes are responsible for storing a fixed state subtree, across all of the 256 most recent blocks.

Nodes are expected to have access to the latest 256 block headers, which they will use to validate
content and handle re-orgs. The [History](../history/history-network.md) and
[Beacon](../beacon-chain/beacon-network.md) networks can be used for this purpose, but
implementations can use other out-of-protocol solutions as well.

Content is gossiped as a block's trie-diff subtries. This provides a very efficient way to keep a node's
subtrie updated as the chain progresses.

### Data

The network stores execution layer state content, which encompasses the following data from the
latest 256 blocks:

- Block trie-diffs
Member

Would like to get validation of this concept done since it's a core building block of the network designs. We need test data for:

  • A block level diff that is correct (minimal and complete)
  • A block level diff that contains extra data and the previous block level proof that allows detection of this.
  • A block level diff that omits a change to the state and the previous block level proof that allows detection of this.

- Account trie
- All contract bytecode (planned but not implemented)
- All contract storage tries (planned but not implemented)

#### Types

Available content types are:

- Block trie-diff, identifiable by block hash and subtrie path
- Account trie nodes, identifiable by block hash and account trie path
- Contract's bytecode, identifiable by block hash and account's address
- Contract's storage trie, identifiable by block hash, contract's address, and storage trie path

#### Retrieval

Every content type is retrievable using its identifiers. This means that account state (balance,
nonce, bytecode, storage root) and contract storage are retrievable using a single content lookup.

## Specification

### Distance Function

The network uses the standard XOR distance metric, defined in the
[portal wire protocol](../portal-wire-protocol.md#xor-distance-function) specification.

The only difference is that arguments can have fewer than 256 bits, but both arguments must have the
same length.
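
A non-normative sketch of this, assuming node ids are interpreted as big-endian unsigned integers and that truncating a value keeps its highest bits (all values below are placeholders):

```py
def xor_distance(a: int, b: int, bits: int) -> int:
    """XOR distance between two values of the same bit length `bits`."""
    assert 0 <= a < (1 << bits) and 0 <= b < (1 << bits)
    return a ^ b

# Truncate a 256-bit node id to its highest 8 bits so it matches the length
# of an 8-bit content id (e.g. a trie-diff path).
node_id = 0xAB12 << 240          # placeholder 256-bit node id (top byte 0xAB)
content_id = 0xAC                # placeholder 8-bit content id
d = xor_distance(node_id >> 248, content_id, 8)   # 0xAB ^ 0xAC == 0x07
```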

### Content ID Derivation Function

> 🚧 **TODO**: Consider changing name to Content Path

The content id is derived only from the content's trie path (and the contract's address in the case of
the contract's storage trie node). Its primary use is in determining which nodes on the network should
store the content.

The content id has the following properties:

- It has variable length, between 0 and 256 bits (only multiples of 4), representing the trie path
- It's not unique, meaning different content can have the same content id
- The trie path of the contract's storage trie node is modified using the contract's address

The derivation function is slightly different for different types and is defined below.
Comment on lines +70 to +80
Member

@pipermerriam pipermerriam Apr 17, 2025

I'm not understanding the motivation/need/justification for a variable length content-id.

Contributor Author

I guess that they would have to be 32 bytes. And you would generate them using my approach plus appending zeroes, right?

The problem is that you can't distinguish between the root node, the first node of the first level (path: 0), and the first node of the second level (path: 00), as they would all have content_id: 0x000000..00

You can make it 34 bytes long and include (e.g. at the start) the length. But in that case you might just as well make them different lengths.

I could be mistaken and there is some way around it.


### Wire Protocol

#### Protocol Identifier

As specified in the [Protocol identifiers](../portal-wire-protocol.md#protocol-identifiers) section
of the Portal wire protocol, the `protocol` field in the `TALKREQ` message **MUST** contain the
value of `0x5009`.

#### Supported Message Types

The network supports the following protocol messages:

- `Ping` - `Pong`
- `Find Nodes` - `Nodes`
- `Find Content` - `Found Content`
- `Offer` - `Accept`

#### `Ping.payload` & `Pong.payload`

The payload type of the first `Ping` message between nodes MUST be
[Type 0: Client Info, Radius, and Capabilities Payload](../ping-extensions/extensions/type-0.md).
Nodes then upgrade to the latest payload type supported by both clients.

List of currently supported payloads, from latest to oldest:
- [Type 1 Basic Radius Payload](../ping-extensions/extensions/type-1.md)

### Routing Table

The network uses the standard routing table structure from the Portal Wire Protocol.

### Node State

#### Data Radius

The network includes one additional piece of node state that should be tracked: `data_radius`. This
value is a 256-bit integer and represents the data that a node is "interested" in. The value may
fluctuate as the contents of the local key-value store change.

> 🚧 **TODO**: consider making radius power of two

A node should track its own radius value and provide this value in all `Ping` or `Pong`
messages it sends to other nodes. A node is expected to maintain `data_radius` information for each
node in its local routing table.

We define the following function to determine whether a node in the network should be interested in a
piece of content.

```py
def interested(node, content):
    bits = content.id.length
    return distance(node.id[:bits], content.id) <= node.radius[:bits]
```
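
For illustration only, assuming big-endian `U256` values where `[:bits]` selects the highest `bits` bits (see the discussion below), the check for an 8-bit trie-diff path works out as:

```py
# Non-normative example of the check above with bits = 8; all values are placeholders.
node_id = 0x34 << 248            # placeholder node id, top byte 0x34
radius = 0x01 << 248             # placeholder radius, top byte 0x01
content_id = 0x35                # 8-bit trie-diff path

distance = (node_id >> 248) ^ content_id      # 0x34 ^ 0x35 == 0x01
interested = distance <= (radius >> 248)      # 0x01 <= 0x01 -> True
```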
Contributor

For the trie diff data type, which has a path of length 8: if a node has a radius where radius mod 256 == 255, then node.radius[:bits] will equal 0xFF, which will always be greater than or equal to distance(node.id[:bits], content.id), and therefore the node would be interested in all trie diffs, not just a subset.

I guess a similar problem can happen for the other data types that have a longer path but with a lower probability of occurrence.

Contributor Author

I assumed that the type of distance and radius is U256 (unsigned 256-bit integer). When represented as bytes, I assumed big-endian ordering, so if bits=8, then node.radius[:bits] represents the highest byte of the radius (equivalent to radius / 2^248).

Is that what you understood as well? And is the logic still faulty?
If it was only about understanding the syntax, I'm accepting suggestions to make it clearer :)

Contributor

Actually I misunderstood the python slice syntax here; I read it as the last 8 bits, but it is actually the first 8 bits of the radius. There is still a similar problem as described above, but in the reverse scenario. If nodes have a radius where the first 8 bits are 0 (very common), then they will store almost no content (only content with a distance of zero).

Perhaps we should change the distance function to something like this:

func interested(node, content):
    bits = content.id.length
    return logdistance(node.id[:bits], content.id) <= log2(node.radius)

Contributor Author

> If nodes have a radius where the first 8 bits are 0 (very common), then they will store almost no content (only content with a distance of zero)

That's why I use <= in the condition. Only 1 of the 256 subtrie-diffs will share a path with the node.id, and only that one will have distance(node.id[:bits], content.id) equal to 0 as well (and the function would return true).

I think that logdistance would work as well (at least in the case of the gossip), as they should be pretty much the same.

Contributor

> That's why I use <= in the condition. Only 1 of the 256 subtrie-diffs will share a path with the node.id, and only that one will have distance(node.id[:bits], content.id) equal to 0 as well (and the function would return true).

I see. Well, given that scenario, if a node sets its storage capacity to zero and therefore sets its radius to zero, then it would still store one of the subtrie-diffs, is that right?

Contributor

Huh, that's interesting. We have the current expectation of being able to set the storage to 0 and accept nothing in history. It would be surprising to me to set storage to 0 and accept/store content anyway.

Contributor Author

@morph-dev morph-dev Apr 26, 2025

> Huh, that's interesting. We have the current expectation of being able to set the storage to 0 and accept nothing in history. It would be surprising to me to set storage to 0 and accept/store content anyway.

Well, yes and no. According to the spec (history, state), we define the following:

interested(node, content) = distance(node.id, content.id) <= node.data_radius

So even if the radius is zero, a node should store content whose content.id is exactly the same as its node.id. Of course, in existing networks, this wouldn't happen in practice.


With that being said, I'm also fine having some rules that loosen the requirements for storing; for example, one of these would work:

  • if your radius is zero, you don't have to store anything
  • if there are no leaves that fall within your radius, no need to store inner nodes either
    • note that if there is at least one leaf within the radius, nodes should store all inner nodes within the radius in order to have exclusion proofs

Contributor

I agree that nodes that have a zero radius should be allowed to store nothing even with the existing distance function, but why not change the distance function to use < instead of <=?

For example:
interested(node, content) = distance(node.id, content.id) < node.data_radius

Contributor Author

Then the node that has the maximum radius (0xFFFF..FF) shouldn't store content that is farthest away (e.g. node.id=0x3333..33 and content.id=0xCCCC..CC). So the spec is not very clean...

Also, comparing trie path of the subtrie (or any inner node for that matter) works much better for <= (see my messages earlier in this thread).


> 🚧 **TODO**: revisit if this is correct for non-leaf nodes

> 🚧 **TODO**: maybe adjust for contract's storage trie

### Data Types

#### Helper Data Types

The helper types `Nibbles`, `AddressHash`, `TrieNode`, and `TrieProof` are defined the same way as
in [Execution State Network](state-network.md#helper-data-types).

##### TrieDiff

The Trie-Diff is the minimal structure that captures how the MPT changed from one block to the
next. Observe that this is not enough to execute the block (as data that is only read is not
present).

In order for a node to verify that a provided trie-diff is complete and minimal, we need both the
previous and the new value of every changed trie node.

```py
TrieNodeList = List[TrieNode, limit=65536]
TrieDiff = Container(before: TrieNodeList, after: TrieNodeList)
```

One should be able to construct two partial views of the Merkle Patricia Trie, before and after
the associated block. The parts of the trie that are present in the two partial views should match
(and the same for the missing parts). The trie nodes are ordered as if they were visited using a
[pre-order Depth First Search](https://en.wikipedia.org/wiki/Tree_traversal#Pre-order,_NLR)
traversal.
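
As a non-normative sketch (the `Node` type below is a simplified stand-in, not a spec type), the pre-order flattening of the trie nodes that are present looks like:

```py
from dataclasses import dataclass, field
from typing import List, Optional

# Simplified stand-in for an MPT node; only the visiting order matters here.
@dataclass
class Node:
    value: bytes
    children: List[Optional["Node"]] = field(default_factory=list)

def preorder(node: Optional[Node], out: List[Node]) -> None:
    """Pre-order (NLR) flattening; subtries absent from the diff are skipped."""
    if node is None:
        return
    out.append(node)                # visit the node itself first...
    for child in node.children:     # ...then its children in path (nibble) order
        preorder(child, out)
```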
Contributor

@morph-dev Is there a specific reason why pre-order Depth First Search was selected as the ordering of the trie nodes in the trie diff trie node lists?

I'm wondering if using Breadth First Search ordering of the trie nodes would be better because it might lead to simpler algorithms that can be defined recursively and are easier to reason about. For example when validating the trie nodes in the trie diff in place (without using additional memory/storage) we need to walk down each of the paths in the trie and check that the hashes of the child nodes match the hashes in the parent nodes. With a breadth first ordering we can define a recursive algorithm that takes a parent node, looks at how many children it should have and then check the hashes of the next n child nodes in the trie node list.

I'm sure it can still be done using pre-order Depth First Search but I'm thinking that Breadth First Search ordering might be easier to understand because the trie nodes are organised in layers based on trie depth.

Contributor Author

There is no strong reason for picking DFS over BFS, but I think that I disagree that BFS would be easier to reason about (especially if you plan to use a recursive algorithm), because:

  • not all trie nodes will be present
  • usage of extension nodes in MPT gives a wrong idea of layers, making it harder to reason about (in my opinion)

All in all, I don't have strong preferences, but my intuition is that DFS is simpler. But if most people prefer one, I don't mind.

Contributor

I don't have a strong opinion either way so I don't mind sticking with DFS if that is preferred.

Yes, true, not all trie nodes will be present because it is just a diff, not the full state subtree. This makes me wonder: for each parent node in a trie node list, how can we know which child nodes are encoded in the list? It would be useful to know this.

I guess we could use a trial and error method where we hash the child nodes and see which of them match the hashes in the parent nodes (not sure how well this would work). But perhaps it would be better to encode in the data structure this information somehow.

Contributor Author

The way I envisioned is that:

  • you iterate both pre-block and post-block lists of trie nodes
    • calculate their hashes
    • put them in two separate maps, together with their index
  • you traverse both tries in parallel (starting from the root), and you keep track that indices are in consecutive order
    • if hash of the next trie node in both tries is the same, you skip it
    • otherwise you look it up in the maps you created earlier (they should be present), and continue traversal

There is probably some tricky logic in dealing with extension nodes and branches, and/or when some subtrie is not missing, but I think this approach can be adjusted for that.

Member

I only skimmed this conversation, but I believe Depth-First is the correct ordering here. My previous experience working with MPT proofs suggests that efficient algorithms can generally be written for either ordering, but that Depth-First is the more natural way to do this.


The actual usage of this type is slightly different. We use a subtrie (at depth 2) of the whole
Trie-Diff. To accommodate this, we don't have to change the type. Instead, it's enough to allow the
first two layers to omit parts that are different (except the subtrie that is specified by the
content key).

#### Account Trie-Diff

This data type represents a subtrie of a block's Trie-Diff. The entire Trie-Diff is split into
subtries at depth 2 (2 nibbles or 8 bits). This depth was chosen arbitrarily
as a good estimate for not making subtrie-diffs too small or too big.

```py
selector = 0x30
account_trie_diff = Container(path: u8, block_hash: Bytes32)

content_key = selector + SSZ.serialize(account_trie_diff)
content_value = Container(subtrie_diff: TrieDiff)

def content_id(account_trie_diff):
    return account_trie_diff.path
```

Member

Does path make more sense as a Bytes2 type? Effectively interchangeable with u8 but maybe more intuitive as a trie path as a bytes type?

Contributor Author

You mean as Bytes1? Or do you mean Nibbles (which will in practice have length two, but the type is variable length)?
Or do you actually mean Bytes2 where each byte has a value in [0,15]?

I don't have a strong preference. We already bundle 2 nibbles into one byte as part of the Nibbles type, so I thought I would just assume that two nibbles are bundled into one byte.

Member

I meant Bytes1

The `subtrie_diff` field of the content value includes the first 2 layers of the trie as well (as the
first two elements).
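
As a non-normative example, the account trie-diff subtrie rooted at path nibbles `3c` has an 8-bit content id equal to that packed path:

```py
# Placeholder values for illustration; SSZ encoding details are omitted.
path = 0x3C        # two nibbles (depth-2 subtrie) packed into one byte
content_id = path  # the 8-bit content id is simply the subtrie path
```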

#### Account Trie Node

This data type represents a node from the account trie.

```py
selector = 0x31
account_trie_node = Container(path: Nibbles, block_hash: Bytes32)

content_key = selector + SSZ.serialize(account_trie_node)
content_value = Container(proof: TrieProof)

def content_id(account_trie_node):
    return account_trie_node.path
```

The last trie node in the `proof` MUST correspond to the trie path from the content key.

#### Contract Trie-Diff

This data type represents a subtrie of a contract's Trie-Diff at a specific block. The entire
Trie-Diff is split into subtries at depth 2. This depth was chosen arbitrarily as a good estimate for
not making subtrie-diffs too small or too big.

```py
selector = 0x32
contract_trie_diff = Container(path: u8, address_hash: AddressHash, block_hash: Bytes32)

content_key = selector + SSZ.serialize(contract_trie_diff)
content_value = Container(subtrie_diff: TrieDiff, account_proof: TrieProof)

def content_id(contract_trie_diff):
    return contract_trie_diff.path XOR contract_trie_diff.address_hash[:2]
```

#### Contract Trie Node

This data type represents a node from the contract's storage trie.

```py
selector = 0x33
contract_trie_node = Container(path: Nibbles, address_hash: AddressHash, block_hash: Bytes32)

content_key = selector + SSZ.serialize(contract_trie_node)
content_value = Container(storage_proof: TrieProof, account_proof: TrieProof)

def content_id(contract_trie_node):
    bits = contract_trie_node.path.length
    return contract_trie_node.path XOR contract_trie_node.address_hash[:bits]
```

The last trie node in the `storage_proof` MUST correspond to the trie path from the content key,
inside the contract's storage trie.
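
A non-normative illustration of this derivation, assuming the path and the address-hash prefix are interpreted as big-endian bit strings of equal length (all values are placeholders):

```py
# Placeholder values; Nibbles and AddressHash are simplified to integers/bytes here.
path_nibbles = [0x3, 0xC, 0x5]                  # 12-bit storage trie path "3c5"
bits = 4 * len(path_nibbles)                    # 12

path_value = int("".join(f"{n:x}" for n in path_nibbles), 16)   # 0x3c5
address_hash = bytes.fromhex("ab" * 32)         # placeholder keccak(address)

# Take the highest `bits` bits of the address hash (big-endian) ...
prefix = int.from_bytes(address_hash, "big") >> (256 - bits)
# ... and XOR them with the path to get the 12-bit content id.
content_id = path_value ^ prefix
```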

#### Contract Bytecode

> 🚧 **TODO**: Evaluate if needed

> 🚧 **TODO**: Write spec (should be similar to [Execution State Network](state-network.md)).

### Algorithms

#### Gossip

Only the Trie-Diffs will be gossiped between nodes. This serves as an efficient mechanism for nodes
to keep up to date with the chain, as they can easily update the subtrie they are responsible for.

#### Storage layout

Clients MUST store the entire subtree that is "close" to their node.id. They MUST also keep track of
all trie nodes from the root of the trie to the root of their respective subtree.

One way of storing the data is to combine trie nodes that correspond to the same trie path, and keep
track of the latest version together with a series of reverse diffs (the block number at which the
trie node changed, and its previous value).

The Trie-Diff content type can be used to determine which trie nodes contain reverse diffs that fall
out of the "most recent 256" window and can be purged.