rpc: cache runtime version by code hash in CodeProvider#11794
Every call to on_chain_runtime_version() goes through the executor's RuntimeCache infrastructure: acquiring its mutex, performing an LRU lookup, reserving an instance pool slot, and constructing OverlayedChanges and Ext objects — just to retrieve the already-known RuntimeVersion. Add a code_hash → RuntimeVersion cache to CodeProvider so that on_chain_runtime_version() short-circuits all of that on repeat calls with a single HashMap lookup.
4ba5b7c to 7de5d15
/cmd prdoc --audience node_dev --bump patch
/cmd fmt
bkchr left a comment:
Not against the PR, but it would be nice to get some more context.
```rust
    wasm_substitutes: WasmSubstitutes<Block, Executor, Backend>,
    /// Avoids rereading the full `:code` blob from the trie just to determine the runtime
    /// version. Unbounded, but safe: grows by one entry per runtime upgrade, which don't happen
    /// often
```
Claude is not right here. We are not reading `:code` to get the version as long as the instance is still cached; internally we also just return the cached version. I would also argue/assume that anyone trying to get the version will do some runtime call. Or have you seen RPC users first send a runtime version request?
Updated the description with more details. The second tab describes the test results. They show some improvement with the runtime version cache implemented.

Maybe I misinterpret your figures, or is it not faster with this extra cache?

Yes, it is slightly slower in sequential mode, but way faster in parallel mode.
Description
It was observed that on Polkadot AssetHub some RPC calls take much time (>1s). The problem is particularly visible on nodes that are also serving RPC for `eth-rpc` archive nodes, which are querying archived states. It is especially pronounced with `state_call`, which is used when serving some runtime methods, e.g. `ReviveApi_balance`. Further analysis of `state_call` revealed that the bottleneck is the "Get the runtime code" step, where the whole WASM blob is read to get the runtime version.

Implementation
Every call to `CodeProvider::on_chain_runtime_version()` goes through the executor's `RuntimeCache` infrastructure - acquiring its mutex, performing an LRU lookup, reserving an instance pool slot, and constructing `OverlayedChanges`/`Ext` objects - just to retrieve the already-known `RuntimeVersion`.

This PR adds a `code_hash → RuntimeVersion` cache to `CodeProvider` so that `on_chain_runtime_version()` short-circuits all of that on repeat calls with a single `HashMap` lookup behind a `RwLock` (concurrent reads on the hot path). The cache is unbounded but safe: it grows by one entry per runtime upgrade, and upgrades are infrequent.
Changes
- Added a `RwLock<HashMap<Vec<u8>, RuntimeVersion>>` cache to `CodeProvider`
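A minimal sketch of the pattern described above, with stand-in types (the real `CodeProvider` lives in the client, and `RuntimeVersion` in `sp-version`; `read_version_from_wasm` is a hypothetical placeholder for the executor's `RuntimeCache` path): the read lock serves concurrent lookups on the hot path, and only a cache miss takes the write lock.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

// Stand-ins for the real types; `RuntimeVersion` actually comes from `sp-version`.
type CodeHash = Vec<u8>;

#[derive(Clone, Debug, PartialEq)]
struct RuntimeVersion {
    spec_version: u32,
}

struct CodeProvider {
    // Unbounded by design: one entry per distinct `:code` blob,
    // i.e. one per runtime upgrade.
    version_cache: RwLock<HashMap<CodeHash, RuntimeVersion>>,
}

impl CodeProvider {
    fn new() -> Self {
        Self { version_cache: RwLock::new(HashMap::new()) }
    }

    fn on_chain_runtime_version(&self, code_hash: &[u8]) -> RuntimeVersion {
        // Hot path: repeat calls only take the read lock, so they can
        // run concurrently.
        if let Some(v) = self.version_cache.read().unwrap().get(code_hash) {
            return v.clone();
        }
        // Miss: do the expensive work (in the real code this goes through
        // the executor's `RuntimeCache`), then populate the map.
        let version = self.read_version_from_wasm(code_hash);
        self.version_cache
            .write()
            .unwrap()
            .entry(code_hash.to_vec())
            .or_insert_with(|| version.clone());
        version
    }

    // Hypothetical stand-in for instantiating the runtime and calling
    // `Core_version`.
    fn read_version_from_wasm(&self, _code_hash: &[u8]) -> RuntimeVersion {
        RuntimeVersion { spec_version: 1 }
    }
}
```

Keying by code hash (rather than block hash) is what keeps the map tiny: every block between two upgrades shares the same `:code` blob, so the map only grows when the blob itself changes.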