rpc: cache runtime version by code hash in CodeProvider #11794

Open

lrubasze wants to merge 6 commits into master from lrubasze-runtime-version-cache

Conversation

Contributor

@lrubasze lrubasze commented Apr 16, 2026

Description

It was observed that on Polkadot AssetHub some RPC calls take a long time (>1s).

The problem is particularly visible on nodes that also serve RPC for eth-rpc archive nodes, which query archived states, and especially with state_call, which is used when serving some runtime methods, e.g. ReviveApi_balance.

Further analysis of state_call revealed that the bottleneck is the "Get the runtime code" step, where the whole WASM blob is read just to obtain the runtime version.

Implementation

Every call to CodeProvider::on_chain_runtime_version() goes through the executor's RuntimeCache infrastructure: acquiring its mutex, performing an LRU lookup, reserving an instance pool slot, and constructing OverlayedChanges/Ext objects, all just to retrieve the already-known RuntimeVersion.

This PR adds a code_hash → RuntimeVersion cache to CodeProvider so that on_chain_runtime_version() short-circuits all of that on repeat calls with a single HashMap lookup behind a RwLock (concurrent reads on the hot path).

The cache is unbounded but safe: it grows by one entry per runtime upgrade, and upgrades don't happen often.

Changes

  • Add RwLock<HashMap<Vec<u8>, RuntimeVersion>> cache to CodeProvider
  • Cache hit returns cloned version via read lock; miss computes and inserts via write lock
  • Add tests verifying cache population, reuse, and per-hash differentiation

@lrubasze lrubasze added T0-node This PR/Issue is related to the topic “node”. T3-RPC_API This PR/Issue is related to RPC APIs. labels Apr 16, 2026
@lrubasze lrubasze force-pushed the lrubasze-runtime-version-cache branch from 4ba5b7c to 7de5d15 Compare April 16, 2026 10:15
@lrubasze
Contributor Author

/cmd prdoc --audience node_dev --bump patch

@lrubasze
Contributor Author

/cmd fmt

Member

@bkchr bkchr left a comment


Not against the pr, but would be nice to get some more context.

wasm_substitutes: WasmSubstitutes<Block, Executor, Backend>,
/// Avoids rereading the full `:code` blob from the trie just to determine the runtime
/// version. Unbounded, but safe: grows by one entry per runtime upgrade, which don't happen
/// often.
Member

Claude is not right here. We are not reading the :code to get the version, as long as the instance is still cached. Internally we also just return the cached version. I would also argue/assume that anyone trying to get the version will do some runtime call. Or have you seen that RPC users first send a runtime version request?

@lrubasze
Contributor Author

lrubasze commented Apr 16, 2026

Not against the pr, but would be nice to get some more context.

Updated the description with more details.
For even more details check this: https://docs.google.com/document/d/1LHfcwOPIbEboOprCG4O_Nk3UCW3sKHGrnz4wdNpyUxw/edit?tab=t.0

The second tab describes the test results; they show some improvement with the runtime version cache implemented.
But taking your comment into account, I want to analyse the results again.
Maybe I am missing something.

@bkchr
Member

bkchr commented Apr 16, 2026

For even more details check this: https://docs.google.com/document/d/1LHfcwOPIbEboOprCG4O_Nk3UCW3sKHGrnz4wdNpyUxw/edit?tab=t.0

Maybe I am misinterpreting your figures, or is it not faster with this extra cache?

@lrubasze
Contributor Author

For even more details check this: https://docs.google.com/document/d/1LHfcwOPIbEboOprCG4O_Nk3UCW3sKHGrnz4wdNpyUxw/edit?tab=t.0

Maybe I am misinterpreting your figures, or is it not faster with this extra cache?

Yes, it is slightly slower in sequential mode, but way faster in parallel mode.
Let me revisit this.

