Statement-store: defer statement protocol connections during major sync #11487

DenzelPenzel wants to merge 38 commits into master from denzelpenzel/statement-store-loss-during-major-sync
Conversation
/cmd prdoc --audience node_dev --bump patch
s0me0ne-unkn0wn left a comment:

Looks good (modulo my poor knowledge of Substrate networking internals)
/cmd fmt
AndreiEres left a comment:

Overall OK, but I feel uneasy that we lose the peer set on error in `reconnect_statement_peers`. And the zombienet test doesn't catch whether the fix works.
/cmd fmt
```rust
/// Scenario:
/// 1. Spawn charlie only and let the relay chain advance ~10 blocks
/// 2. Submit multiple statements to charlie
/// 3. Add dave as a late joiner; dave will enter major sync because the chain has already
```
Do you have anything to prove that step 3 happens?

Yes, this condition confirms it.

But that only confirms a major sync happened, not that you received statements while you were in major sync.
bkchr left a comment:

The current implementation tries to achieve the goal in a very hacky way. We should simply not add the peers to the reserved set as long as we are in major sync mode.
```rust
SyncEvent::PeerConnected(remote) => {
	let addr = iter::once(multiaddr::Protocol::P2p(remote.into()))
		.collect::<multiaddr::Multiaddr>();
	let result = self.network.add_peers_to_reserved_set(
```
You should not do this while you are in major sync mode. There is no need to connect to the other side as long as you are doing the major sync.

When the major sync has finished, you can start adding them to the reserved set and the substream should be opened.
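A minimal sketch of the deferral bkchr suggests, using a hypothetical `MockNetwork` in place of the real `sc-network` service (whose `add_peers_to_reserved_set` takes a protocol name and multiaddresses): while major sync is in progress, newly connected peers are only buffered; when sync ends they are flushed into the reserved set, which is what opens the substreams.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the network service; only the reserved-set
/// call used by the statement protocol is modelled.
#[derive(Default)]
struct MockNetwork {
	reserved: HashSet<u32>,
}

impl MockNetwork {
	fn add_peer_to_reserved_set(&mut self, peer: u32) {
		self.reserved.insert(peer);
	}
}

/// Sketch of the suggested behaviour: buffer peers during major sync,
/// flush them into the reserved set once sync finishes.
struct StatementPeers {
	major_syncing: bool,
	pending: Vec<u32>,
}

impl StatementPeers {
	fn on_peer_connected(&mut self, peer: u32, net: &mut MockNetwork) {
		if self.major_syncing {
			// Defer: no need to open a statement substream mid-sync.
			self.pending.push(peer);
		} else {
			net.add_peer_to_reserved_set(peer);
		}
	}

	fn on_major_sync_finished(&mut self, net: &mut MockNetwork) {
		self.major_syncing = false;
		for peer in self.pending.drain(..) {
			net.add_peer_to_reserved_set(peer);
		}
	}
}

fn main() {
	let mut net = MockNetwork::default();
	let mut peers = StatementPeers { major_syncing: true, pending: Vec::new() };
	peers.on_peer_connected(1, &mut net);
	peers.on_peer_connected(2, &mut net);
	assert!(net.reserved.is_empty()); // nothing added while syncing
	peers.on_major_sync_finished(&mut net);
	assert_eq!(net.reserved.len(), 2); // flushed once sync ends
	println!("reserved after sync: {}", net.reserved.len());
}
```

This only models late joiners; the thread below discusses peers whose substream was already open before major sync started.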
Agreed that deferring new peers from the reserved set during major sync is the right approach for late joiners: we can buffer them and add them cleanly when sync ends, avoiding any substream opening during the sync window.

However, there is a gap for already-connected peers (nodes whose substream was open before major sync started): we don't have a way to re-trigger the initial sync without forcing a close+reopen. So even with the reserved-set approach we still need something equivalent to the remove+add for already-connected peers to recover missed statements. What do you think? @bkchr
> However, there is a gap for already-connected peers (nodes whose substream was open before major sync started)

Major sync should start immediately, the moment the first peer is connected.
Looking at this:

There still seems to be a gap: if we fall 5 blocks behind, we enter major sync and will drop statements here, from nodes that are already connected.

So maybe the mitigation for this situation is to also accept statements while we are in major sync; I think that should be good enough.

Or, if we want to be bulletproof, we could note that we dropped statements because a major sync was in progress, and run a timer (every minute) that removes a random peer from the reserved set if we dropped statements (forcing a disconnect) and re-adds it to the reserved set the next minute. I would only disconnect a few peers, because we don't want to re-sync with all of them.
/cmd fmt
Description
Implements #11411.

Reconnect all statement protocol peers when major sync ends. Removing and re-adding peers to the statement protocol's reserved set closes and reopens the notification substreams on both sides, triggering a fresh bidirectional initial sync. This causes peers to re-deliver any statements that were dropped during the sync window.
Changes
- `reconnect_statement_peers`: fires when major sync ends and reconnects all statement peers to trigger a fresh initial sync.
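The remove-and-re-add cycle described above can be sketched against a minimal mock of the reserved-set API. The `MockNetwork` type and its single-peer methods are illustrative, not the real `sc-network` interface; the point is that cycling a peer out of and back into the reserved set forces a substream close+reopen, and each fresh substream triggers a new initial statement sync.

```rust
use std::collections::HashSet;

/// Illustrative stand-in for the network service's reserved-set API.
#[derive(Default)]
struct MockNetwork {
	reserved: HashSet<u32>,
	/// Counts substream (re)opens: removing and re-adding a peer
	/// closes and reopens its notification substream.
	opens: u32,
}

impl MockNetwork {
	fn add_peer_to_reserved_set(&mut self, peer: u32) {
		if self.reserved.insert(peer) {
			self.opens += 1; // fresh substream => fresh initial statement sync
		}
	}

	fn remove_peer_from_reserved_set(&mut self, peer: u32) {
		self.reserved.remove(&peer);
	}
}

/// Sketch of the reconnect step: cycle every known statement peer out of
/// and back into the reserved set when major sync ends.
fn reconnect_statement_peers(peers: &[u32], net: &mut MockNetwork) {
	for &peer in peers {
		net.remove_peer_from_reserved_set(peer);
		net.add_peer_to_reserved_set(peer);
	}
}

fn main() {
	let mut net = MockNetwork::default();
	for p in [1, 2, 3] {
		net.add_peer_to_reserved_set(p);
	}
	assert_eq!(net.opens, 3);
	// Major sync ends: force a close+reopen for every connected peer.
	reconnect_statement_peers(&[1, 2, 3], &mut net);
	assert_eq!(net.opens, 6); // each peer got a fresh substream
	println!("substream opens: {}", net.opens);
}
```

Note the caveat raised in review: if `add_peers_to_reserved_set` fails partway through in the real implementation, the removed peers are lost from the set, which is the "we lose the peer set on error" concern above.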