Static peers get stuck in ALREADY_CONNECTED loop after restart race condition #10040

@macfarla

Description:

When two Besu nodes have each other configured as static peers and restart in close succession, one node can end up with a stale peer registration while the other node has zero peers. The node with the stale registration rejects every subsequent reconnection attempt from the other node with ALREADY_CONNECTED, leaving both nodes permanently disconnected until manual intervention (admin_removePeer + admin_addPeer).

Steps to reproduce:

  1. Configure two Besu nodes (A and B) with each other as static peers
  2. Restart both nodes in close succession (within a few seconds)
  3. Observe: one node (e.g. A) shows the other node (B) in admin_peers; B shows zero peers
  4. B continuously attempts to reconnect every 60 seconds and is rejected each time
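The one-sided state in step 3 can be checked mechanically once each node's peer list has been collected (for example from admin_peers). This is a hypothetical helper, not part of Besu: `AsymmetricPeeringCheck`, `isOneSided` and the use of plain string IDs are illustrative assumptions, and real admin_peers output would first need to be parsed into node IDs.

```java
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical helper (not Besu code): detects the state from step 3, where one node
 * still lists the other as a peer but the other node does not list it back.
 */
public class AsymmetricPeeringCheck {

    /** Returns true if nodeA lists nodeB as a peer while nodeB does not list nodeA. */
    static boolean isOneSided(String nodeA, String nodeB, Map<String, Set<String>> peersByNode) {
        Set<String> peersOfA = peersByNode.getOrDefault(nodeA, Set.of());
        Set<String> peersOfB = peersByNode.getOrDefault(nodeB, Set.of());
        return peersOfA.contains(nodeB) && !peersOfB.contains(nodeA);
    }

    public static void main(String[] args) {
        // Illustrative state matching step 3: A still lists B, B lists no peers.
        Map<String, Set<String>> peers = Map.of(
            "A", Set.of("B"),
            "B", Set.of());
        System.out.println(isOneSided("A", "B", peers)); // true
        System.out.println(isOneSided("B", "A", peers)); // false
    }
}
```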

Expected behavior:

When a peer that is already registered attempts a new connection, Besu should recognise that the peer believes the old connection is dead, disconnect the stale entry, and accept the fresh connection.

Actual behavior:

Besu unconditionally rejects the new connection with DISCONNECT(ALREADY_CONNECTED) without checking whether the existing registration is actually live. The peer is stuck until manual intervention.

Observed log on the node being rejected (B):
AbstractPeerConnection: New PeerConnection established with peer 0xbfae3d...
EthPeers: Disconnected EthPeer 0xbfae3d...
EthProtocolManager: Disconnect - active Connection? false - Inbound - 0x05 ALREADY_CONNECTED Already connected - 0xbfae3d...
(Repeats every 60 seconds indefinitely)
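To spot this loop from logs, one simple heuristic is to count ALREADY_CONNECTED disconnects per peer and flag peers that exceed a threshold. The sketch assumes the line layout of the EthProtocolManager excerpt above, with the peer ID as the last token; Besu's real log format may differ, and the class name, regex and threshold are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical log scanner for repeated ALREADY_CONNECTED rejections (not part of Besu). */
public class AlreadyConnectedLoopDetector {

    // Assumed format, modelled on the quoted line:
    // "... Disconnect - ... 0x05 ALREADY_CONNECTED ... - <peer id>"
    private static final Pattern ALREADY_CONNECTED =
        Pattern.compile("Disconnect - .* 0x05 ALREADY_CONNECTED .* - (\\S+)\\s*$");

    /** Counts ALREADY_CONNECTED disconnect lines per peer ID. */
    static Map<String, Integer> countRejections(List<String> logLines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : logLines) {
            Matcher m = ALREADY_CONNECTED.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Returns peers rejected at least {@code threshold} times, sorted for stable output. */
    static List<String> suspectedStuckPeers(List<String> logLines, int threshold) {
        List<String> stuck = new ArrayList<>();
        countRejections(logLines).forEach((peer, n) -> {
            if (n >= threshold) {
                stuck.add(peer);
            }
        });
        Collections.sort(stuck);
        return stuck;
    }

    public static void main(String[] args) {
        String rejected = "EthProtocolManager: Disconnect - active Connection? false - Inbound"
            + " - 0x05 ALREADY_CONNECTED Already connected - 0xbfae3d...";
        List<String> log = List.of(
            "AbstractPeerConnection: New PeerConnection established with peer 0xbfae3d...",
            "EthPeers: Disconnected EthPeer 0xbfae3d...",
            rejected, rejected, rejected);
        System.out.println(suspectedStuckPeers(log, 3)); // [0xbfae3d...]
    }
}
```

Since the rejection repeats every 60 seconds, a threshold of a few occurrences within a short window already separates this loop from a one-off duplicate connection.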

Root cause:

EthPeers.gatePeerConnection() calls alreadyConnectedOrConnecting(), which checks activeConnections.get(id) != null && !ethPeer.isDisconnected(). The isDisconnected() flag is only set when a clean TCP disconnect is processed. In a restart race, the TCP connection may still appear live on one side (for example, because the RST never arrives, or because of timing), leaving a stale EthPeer in activeConnections with isDisconnected() == false. As a result, every subsequent reconnection attempt is rejected.
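A stripped-down model makes the failure mode concrete. This is not Besu's code: `StaleEntryModel`, `PeerEntry` and `onCleanDisconnect` are hypothetical stand-ins that only mirror the condition quoted above, where the disconnected flag changes solely when a clean disconnect is processed.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of the current gating check (names are illustrative, not Besu's). */
public class StaleEntryModel {

    /** Minimal stand-in for an EthPeer: only tracks the disconnected flag. */
    static final class PeerEntry {
        private boolean disconnected;

        boolean isDisconnected() {
            return disconnected;
        }

        void markDisconnected() {
            disconnected = true;
        }
    }

    private final Map<String, PeerEntry> activeConnections = new HashMap<>();

    /** Same shape as: activeConnections.get(id) != null && !ethPeer.isDisconnected(). */
    boolean alreadyConnectedOrConnecting(String id) {
        PeerEntry peer = activeConnections.get(id);
        return peer != null && !peer.isDisconnected();
    }

    /** Current behaviour: reject while a live-looking entry exists, otherwise register. */
    boolean gatePeerConnection(String id) {
        if (alreadyConnectedOrConnecting(id)) {
            return false; // would send DISCONNECT(ALREADY_CONNECTED)
        }
        activeConnections.put(id, new PeerEntry());
        return true;
    }

    /** The only path that clears the entry in this model: a processed clean disconnect. */
    void onCleanDisconnect(String id) {
        PeerEntry peer = activeConnections.get(id);
        if (peer != null) {
            peer.markDisconnected();
        }
    }

    public static void main(String[] args) {
        StaleEntryModel nodeA = new StaleEntryModel();
        System.out.println(nodeA.gatePeerConnection("B")); // true: first connection
        // B restarts, but no clean disconnect reaches A, so the entry stays "live".
        for (int attempt = 1; attempt <= 3; attempt++) {
            System.out.println(nodeA.gatePeerConnection("B")); // false on every retry
        }
        nodeA.onCleanDisconnect("B");
        System.out.println(nodeA.gatePeerConnection("B")); // true once the flag is set
    }
}
```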

Fix:

When gatePeerConnection() finds an active (non-disconnected) peer in activeConnections with the same ID, it should disconnect the stale entry and let the new connection proceed, rather than rejecting it. Connections that are still mid-handshake continue to be rejected as before.
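The proposed gate logic could look roughly like the following sketch. It assumes an established entry can be told apart from a mid-handshake one through a `handshakeComplete` flag, and that `disconnect()` stands in for tearing down the stale connection; both, along with the `Decision` enum and class names, are hypothetical and do not describe Besu's actual classes.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the proposed fix (illustrative names, not Besu's API). */
public class GateWithStaleReplacement {

    /** Stand-in for a tracked connection. */
    static final class PeerEntry {
        final boolean handshakeComplete;
        private boolean disconnected;

        PeerEntry(boolean handshakeComplete) {
            this.handshakeComplete = handshakeComplete;
        }

        boolean isDisconnected() {
            return disconnected;
        }

        /** Stands in for closing the stale connection. */
        void disconnect() {
            disconnected = true;
        }
    }

    enum Decision { ACCEPT, ACCEPT_REPLACING_STALE, REJECT_ALREADY_CONNECTED }

    private final Map<String, PeerEntry> activeConnections = new HashMap<>();

    Decision gatePeerConnection(String id, PeerEntry incoming) {
        PeerEntry existing = activeConnections.get(id);
        if (existing == null || existing.isDisconnected()) {
            activeConnections.put(id, incoming);
            return Decision.ACCEPT;
        }
        if (!existing.handshakeComplete) {
            // Mid-handshake: keep rejecting, as before the fix.
            return Decision.REJECT_ALREADY_CONNECTED;
        }
        // Established entry, yet the remote is dialing again: treat the old entry as stale.
        existing.disconnect();
        activeConnections.put(id, incoming);
        return Decision.ACCEPT_REPLACING_STALE;
    }

    public static void main(String[] args) {
        GateWithStaleReplacement nodeA = new GateWithStaleReplacement();
        System.out.println(nodeA.gatePeerConnection("B", new PeerEntry(true))); // ACCEPT
        System.out.println(nodeA.gatePeerConnection("B", new PeerEntry(true))); // ACCEPT_REPLACING_STALE
        nodeA.activeConnections.put("C", new PeerEntry(false)); // C is mid-handshake
        System.out.println(nodeA.gatePeerConnection("C", new PeerEntry(true))); // REJECT_ALREADY_CONNECTED
    }
}
```

One point worth checking in a real patch: if both nodes dial each other at nearly the same moment, unconditionally replacing the existing entry could drop a connection that is actually fresh, so the replacement rule may need a deterministic tie-breaker.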

Workaround:

On the node that incorrectly shows the peer as connected, call:
admin_removePeer()
admin_addPeer()
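Both admin methods take the peer's enode URL as their parameter and are served over JSON-RPC, which normally requires the ADMIN API to be enabled (for example via `--rpc-http-api`). This sketch sends the two calls with Java's built-in `java.net.http` client. The enode string is a placeholder, the RPC URL (for example `http://127.0.0.1:8545`) is taken from the first program argument as an assumption of this example, and with no argument it only prints the payloads.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Sketch of the workaround as two JSON-RPC calls (class and helper names are hypothetical). */
public class StaticPeerWorkaround {

    /** Builds a JSON-RPC 2.0 body for a method taking one enode URL (no JSON escaping needed for enodes). */
    static String requestBody(String method, String enodeUrl, int id) {
        return String.format(
            "{\"jsonrpc\":\"2.0\",\"method\":\"%s\",\"params\":[\"%s\"],\"id\":%d}",
            method, enodeUrl, id);
    }

    /** POSTs the body to the node's HTTP JSON-RPC endpoint and returns the raw response. */
    static String call(HttpClient client, String rpcUrl, String body) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(rpcUrl))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder: replace with the full enode URL of the peer wrongly shown as connected.
        String enode = "enode://<node-id>@<host>:30303";
        String remove = requestBody("admin_removePeer", enode, 1);
        String add = requestBody("admin_addPeer", enode, 2);
        if (args.length == 0) {
            // Dry run: print the payloads only.
            System.out.println(remove);
            System.out.println(add);
            return;
        }
        HttpClient client = HttpClient.newHttpClient();
        System.out.println(call(client, args[0], remove));
        System.out.println(call(client, args[0], add));
    }
}
```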

Environment:

  • Besu v26.3-develop
  • Static peer configuration
  • Reproduced on Hoodi testnet (two-node setup)

Labels:

  • P3 (Medium)
  • bug
  • peering

Type

No type

Projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions