Skip to content

Dispatch snap server request processing off Netty event loop#10083

Merged
macfarla merged 14 commits intobesu-eth:mainfrom
macfarla:dispatch-off-netty-only
Mar 25, 2026
Merged

Dispatch snap server request processing off Netty event loop#10083
macfarla merged 14 commits intobesu-eth:mainfrom
macfarla:dispatch-off-netty-only

Conversation

@macfarla
Copy link
Copy Markdown
Contributor

@macfarla macfarla commented Mar 23, 2026

SnapProtocolManager.processMessage() was calling snapMessages.dispatch() synchronously on the Netty event loop thread for incoming GET_* snap requests. These handlers do heavy work (trie traversal, proof generation, DB reads) that can take several seconds, blocking the event loop and preventing the server from responding to concurrent ETH protocol messages such as GET_BLOCK_HEADERS.

When testing with besu server <> besu client, this caused the client to accumulate 5+ (eth,3) timeouts and disconnect the server with TIMEOUT.

Fix: for even-coded snap request messages (GET_ACCOUNT_RANGE, GET_STORAGE_RANGE, GET_BYTECODES, GET_TRIE_NODES), dispatch the heavy snapMessages.dispatch() work and the subsequent send() to EthScheduler's service thread pool. The Netty event loop now returns immediately after dispatching, staying free to handle ETH messages. Response messages (odd codes) are already handled by dispatchMessage() and need no changes.

Thanks for sending a pull request! Have you done the following?

  • Checked out our contribution guidelines?
  • Considered documentation and added the doc-change-required label to this PR if updates are required.
  • Considered the changelog and included an update if required.
  • For database changes (e.g. KeyValueSegmentIdentifier) considered compatibility and performed forwards and backwards compatibility tests

Locally, you can run these tests to catch failures early:

  • spotless: ./gradlew spotlessApply
  • unit tests: ./gradlew build
  • acceptance tests: ./gradlew acceptanceTest
  • integration tests: ./gradlew integrationTest
  • reference tests: ./gradlew ethereum:referenceTests:referenceTests

SnapProtocolManager.processMessage() was calling snapMessages.dispatch()
synchronously on the Netty event loop thread for incoming GET_* snap
requests. These handlers do heavy work (trie traversal, proof generation,
DB reads) that can take several seconds, blocking the event loop and
preventing the server from responding to concurrent ETH protocol messages
such as GET_BLOCK_HEADERS. This caused the client to accumulate 5+
(eth,3) timeouts and disconnect the server with TIMEOUT.

Fix: for even-coded snap request messages (GET_ACCOUNT_RANGE,
GET_STORAGE_RANGE, GET_BYTECODES, GET_TRIE_NODES), dispatch the
heavy snapMessages.dispatch() work and the subsequent send() to
EthScheduler's service thread pool. The Netty event loop now returns
immediately after dispatching, staying free to handle ETH messages.
Response messages (odd codes) are already handled by dispatchMessage()
and need no changes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Sally MacFarlane <macfarla.github@gmail.com>
Copilot AI review requested due to automatic review settings March 23, 2026 03:07
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Dispatches inbound SNAP GET_* request handling off the Netty event loop to avoid blocking ETH protocol processing and timeouts.

Changes:

  • Add EthScheduler to SnapProtocolManager and wire it from BesuControllerBuilder.
  • Offload even-coded SNAP request processing (GET_*) to EthScheduler service threads, including response sending.
  • Adjust malformed SNAP message handling to ignore malformed request payloads instead of disconnecting.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
ethereum/eth/src/main/java/org/hyperledger/besu/ethereum/eth/manager/snap/SnapProtocolManager.java Offloads SNAP GET_* request dispatch + response send to service thread pool via EthScheduler.
app/src/main/java/org/hyperledger/besu/controller/BesuControllerBuilder.java Wires EthScheduler into SnapProtocolManager construction.

Signed-off-by: Sally MacFarlane <macfarla.github@gmail.com>
Signed-off-by: Sally MacFarlane <macfarla.github@gmail.com>
Signed-off-by: Sally MacFarlane <macfarla.github@gmail.com>
Signed-off-by: Sally MacFarlane <macfarla.github@gmail.com>
Signed-off-by: Sally MacFarlane <macfarla.github@gmail.com>
macfarla and others added 2 commits March 25, 2026 07:24
Keep both EthScheduler (dispatch-off-netty) and ProtocolSchedule (SNAP2
capability detection) parameters in SnapProtocolManager constructor.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Sally MacFarlane <macfarla.github@gmail.com>
Signed-off-by: Sally MacFarlane <macfarla.github@gmail.com>
@pinges
Copy link
Copy Markdown
Contributor

pinges commented Mar 25, 2026

Code review

Found 2 issues:

  1. SnapV1.REQUEST_CODES only contains snap/1 request codes {0x00, 0x02, 0x04, 0x06}, but SnapProtocolManager also handles snap/2 via SNAP2 capability. PR EIP-8189: snap/2 #10064 (merged just before this PR) added SnapV2.GET_BLOCK_ACCESS_LISTS = 0x08. Since 0x08 is not in SnapV1.REQUEST_CODES, when a snap/2 peer sends a GET_BLOCK_ACCESS_LISTS request, scheduleSnapRequest is never called and the request is silently dropped -- breaking snap/2 server-side functionality entirely.

/** The set of inbound request message codes that the snap server must handle. */
public static final Set<Integer> REQUEST_CODES =
Set.of(GET_ACCOUNT_RANGE, GET_STORAGE_RANGE, GET_BYTECODES, GET_TRIE_NODES);

// GET_* requests are handled off the Netty event loop to avoid blocking ETH protocol traffic.
if (SnapV1.REQUEST_CODES.contains(code)) {
scheduleSnapRequest(ethPeer, decodedEthMessage, cap, code);
}

  1. scheduleServiceTask(Runnable) returns a CompletableFuture<Void> that is silently discarded. Unlike the EthTask overload, the Runnable overload does not add the future to pendingFutures, so: (a) any non-RLPException RuntimeException thrown inside the lambda is silently swallowed, and (b) during node shutdown, in-flight snap request tasks are not cancelled via pendingFutures and may continue executing against partially torn-down state. Consider using the EthTask overload, or at minimum attaching an exceptionally handler to log unexpected failures.

private void scheduleSnapRequest(
final EthPeer ethPeer,
final EthMessage decodedEthMessage,
final Capability cap,
final int code) {
ethScheduler.scheduleServiceTask(
() -> {
Optional<MessageData> maybeResponseData = Optional.empty();
try {
final Map.Entry<BigInteger, MessageData> requestIdAndEthMessage =
decodedEthMessage.getData().unwrapMessageData();
maybeResponseData =
snapMessages
.dispatch(new EthMessage(ethPeer, requestIdAndEthMessage.getValue()), cap)
.map(
responseData ->
responseData.wrapMessageData(requestIdAndEthMessage.getKey()));
} catch (final RLPException e) {
LOG.debug(
"Received malformed snap message code={} (BREACH_OF_PROTOCOL), disconnecting: {}",
code,
ethPeer,
e);
ethPeer.disconnect(DisconnectReason.BREACH_OF_PROTOCOL_MALFORMED_MESSAGE_RECEIVED);

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

macfarla and others added 3 commits March 25, 2026 16:20
- Add REQUEST_CODES to SnapV2 so GET_BLOCK_ACCESS_LISTS (0x08) requests
  from snap/2 peers are scheduled off the Netty event loop rather than
  silently dropped
- Attach .exceptionally() handler to the CompletableFuture returned by
  scheduleServiceTask to log unexpected non-cancellation failures that
  would otherwise be swallowed

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Sally MacFarlane <macfarla.github@gmail.com>
Signed-off-by: Sally MacFarlane <macfarla.github@gmail.com>
Copy link
Copy Markdown
Contributor

@pinges pinges left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please catch FramingException as well. See comment.

macfarla and others added 3 commits March 25, 2026 18:13
unwrapMessageData() can throw FramingException (decompression failure)
as well as RLPException. Catch both and disconnect the peer, consistent
with the pre-schedule FramingException handling in processMessage.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Sally MacFarlane <macfarla.github@gmail.com>
Copy link
Copy Markdown
Contributor

@pinges pinges left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@macfarla macfarla enabled auto-merge (squash) March 25, 2026 10:33
@macfarla macfarla merged commit a4e6e9a into besu-eth:main Mar 25, 2026
56 of 61 checks passed
ahamlat pushed a commit to daniellehrner/besu that referenced this pull request Mar 25, 2026
…h#10083)

* Dispatch snap server request processing off the Netty event loop

SnapProtocolManager.processMessage() was calling snapMessages.dispatch()
synchronously on the Netty event loop thread for incoming GET_* snap
requests. These handlers do heavy work (trie traversal, proof generation,
DB reads) that can take several seconds, blocking the event loop and
preventing the server from responding to concurrent ETH protocol messages
such as GET_BLOCK_HEADERS. This caused the client to accumulate 5+
(eth,3) timeouts and disconnect the server with TIMEOUT.

Fix: for even-coded snap request messages, dispatch the
heavy snapMessages.dispatch() work and the subsequent send() to
EthScheduler's service thread pool. The Netty event loop now returns
immediately after dispatching, staying free to handle ETH messages.
Response messages (odd codes) are already handled by dispatchMessage()
and need no changes.

- Add REQUEST_CODES to SnapV2 so GET_BLOCK_ACCESS_LISTS (0x08) requests
  from snap/2 peers are scheduled off the Netty event loop

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Sally MacFarlane <macfarla.github@gmail.com>

* Catch FramingException inside scheduled snap request task

---------

Signed-off-by: Sally MacFarlane <macfarla.github@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Stefan Pingel <16143240+pinges@users.noreply.github.com>
Signed-off-by: Ameziane H. <ameziane.hamlat@consensys.net>
@macfarla macfarla deleted the dispatch-off-netty-only branch March 26, 2026 04:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants