Static peers get stuck in ALREADY_CONNECTED loop after restart race condition #10040
Description:
When two Besu nodes are configured as each other's static peers and restart at nearly the same time, one node can retain a stale peer registration while the other node has zero peers. The node with the stale registration rejects every subsequent reconnection
attempt from the other node with ALREADY_CONNECTED, leaving both nodes permanently disconnected until someone intervenes manually (admin_removePeer followed by admin_addPeer).
Steps to reproduce:
- Configure two Besu nodes (A and B) with each other as static peers
- Restart both nodes in close succession (within a few seconds)
- Observe: one node (e.g. A) shows the other node (B) in admin_peers; B shows zero peers
- B continuously attempts to reconnect every 60 seconds and is rejected each time
Expected behavior:
When a peer that is already registered attempts a new connection, Besu should recognise that the peer believes the old connection is dead, disconnect the stale entry, and accept the fresh connection.
Actual behavior:
Besu unconditionally rejects the new connection with DISCONNECT(ALREADY_CONNECTED) without checking whether the existing registration is actually live. The peer is stuck until manual intervention.
Observed log on the node being rejected (B):
AbstractPeerConnection: New PeerConnection established with peer 0xbfae3d...
EthPeers: Disconnected EthPeer 0xbfae3d...
EthProtocolManager: Disconnect - active Connection? false - Inbound - 0x05 ALREADY_CONNECTED Already connected - 0xbfae3d...
(Repeats every 60 seconds indefinitely)
Root cause:
EthPeers.gatePeerConnection() calls alreadyConnectedOrConnecting(), which checks activeConnections.get(id) != null && !ethPeer.isDisconnected(). The isDisconnected() flag is only set when a clean TCP disconnect is processed. In a restart race, the TCP connection may still appear live on one side (for example, because the RST was not forwarded, or because of timing), leaving a stale EthPeer in activeConnections with isDisconnected() == false. Because the check keeps evaluating to true for that entry, all subsequent reconnection attempts are rejected.
Fix:
When gatePeerConnection finds an active (non-disconnected) peer in activeConnections for the same ID, disconnect the stale entry and allow the new connection to proceed, rather than rejecting it. In-progress incomplete connections (mid-handshake) continue to be rejected as before.
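The proposed gating behaviour can be sketched roughly as below. This is a minimal, self-contained model and not Besu's actual code: StalePeerGateSketch, Conn, Decision, gate(), registered() and the handshakeComplete flag are hypothetical names invented for illustration. Only activeConnections and isDisconnected() echo names from the report.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-ins for illustration only; these are NOT Besu's real classes.
final class StalePeerGateSketch {

  enum Decision { ACCEPT, ACCEPT_AFTER_EVICTING_STALE, REJECT_ALREADY_CONNECTED }

  /** Minimal model of a registered peer connection. */
  static final class Conn {
    final String peerId;
    final boolean handshakeComplete; // assumed flag: false while mid-handshake
    private boolean disconnected;

    Conn(String peerId, boolean handshakeComplete) {
      this.peerId = peerId;
      this.handshakeComplete = handshakeComplete;
    }

    boolean isDisconnected() { return disconnected; }

    void disconnect() { disconnected = true; }
  }

  private final Map<String, Conn> activeConnections = new ConcurrentHashMap<>();

  /**
   * Decides whether an incoming connection may replace what is registered.
   * Note: the get/put pair is not atomic; a real implementation would need
   * to guard against concurrent callers for the same peer ID.
   */
  Decision gate(Conn incoming) {
    Conn existing = activeConnections.get(incoming.peerId);

    // Nothing registered, or the old entry is already marked disconnected.
    if (existing == null || existing.isDisconnected()) {
      activeConnections.put(incoming.peerId, incoming);
      return Decision.ACCEPT;
    }

    // An in-progress (mid-handshake) connection is still rejected, as before.
    if (!existing.handshakeComplete) {
      return Decision.REJECT_ALREADY_CONNECTED;
    }

    // The entry looks live, but the remote side is dialling again, so it
    // evidently considers the old connection dead: evict it and accept.
    existing.disconnect();
    activeConnections.put(incoming.peerId, incoming);
    return Decision.ACCEPT_AFTER_EVICTING_STALE;
  }

  Conn registered(String peerId) { return activeConnections.get(peerId); }
}
```

With this model, a second connection from the same peer ID replaces a completed-but-stale entry instead of being rejected, while a peer that is still mid-handshake continues to get ALREADY_CONNECTED.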
Workaround:
On the node that incorrectly shows the peer as connected, call:
admin_removePeer(<enode URL of the stuck peer>)
admin_addPeer(<enode URL of the stuck peer>)
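As a rough illustration, the two workaround calls could be issued over JSON-RPC as sketched below. This assumes the ADMIN API is enabled on the node's HTTP JSON-RPC endpoint. The class name AdminPeerWorkaround, its methods, and the URL and enode values passed in are hypothetical. The payload builder does no JSON escaping, on the assumption that enode URLs contain no quotes or backslashes.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper for illustration; not part of Besu.
final class AdminPeerWorkaround {

  /** Builds a JSON-RPC 2.0 request body with the enode URL as the only param. */
  static String payload(String method, String enodeUrl, int id) {
    return String.format(
        "{\"jsonrpc\":\"2.0\",\"method\":\"%s\",\"params\":[\"%s\"],\"id\":%d}",
        method, enodeUrl, id);
  }

  /** POSTs one request body to the node's JSON-RPC endpoint, returns the raw response. */
  static String send(HttpClient client, String rpcUrl, String body)
      throws IOException, InterruptedException {
    HttpRequest request = HttpRequest.newBuilder(URI.create(rpcUrl))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build();
    return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
  }

  /** Removes the stale registration, then re-adds the peer. */
  static void resetPeer(String rpcUrl, String enodeUrl)
      throws IOException, InterruptedException {
    HttpClient client = HttpClient.newHttpClient();
    send(client, rpcUrl, payload("admin_removePeer", enodeUrl, 1));
    send(client, rpcUrl, payload("admin_addPeer", enodeUrl, 2));
  }
}
```

resetPeer would be called against the node that still lists the peer in admin_peers, passing that node's RPC URL and the stuck peer's enode URL.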
Environment:
- Besu v26.3-develop
- Static peer configuration
- Reproduced on Hoodi testnet (two-node setup)