Conversation
|
No issues found. The refactoring correctly removes redundant per-parachain tracking in ImplicitView and simplifies the prospective-parachains subsystem as described. |
|
Claude review: ✅ Behavioral Equivalence: CONFIRMED The review confirms that end-to-end behavior is entirely unchanged, with only internal implementation simplified. Key Validations:
Minor Notes:
Confidence: 95% (High)The agent recommends:
|
b0f2e65 to
bddd175
Compare
|
All GitHub workflows were cancelled due to failure one of the required jobs. |
For simplicity, reasoning and efficiency.
+ Refactor/cleanup + some docs.
parents Remove per-parachain tracking of allowed relay parents in backing implicit view. Allowed relay parents are now computed per relay parent (globally) based on scheduling lookahead, rather than per parachain. Changes: - Remove `collating_for` parameter and para_id tracking from View - Update all callers to remove para_id arguments - Refactor tests with helper functions to reduce ~150 lines of duplication - Add comprehensive documentation to tests explaining expected behavior - Clarify `paths_via_relay_parent` returns full paths from oldest block to leaf This simplification aligns with the reality that all parachains share the same allowed relay parent windows at a given relay chain block.
364e7fc to
d80f3f6
Compare
Resolved merge conflicts after rebasing on PR #10650: - Fixed malformed conflict markers in lib.rs - Updated CandidateEntry::new() calls with v3_enabled parameter - Merged imports in tests.rs (BlockNumber + NodeFeatures) - Fixed Scope method calls to use relay_chain_scope - Removed unused BlockNumber import - Added #[allow(dead_code)] to scheduling_parent() method All tests passing (35/35). Ready for cumulus-side integration.
|
One big difference when relying on prospective-parachains for getting the ancestry in the implicit view (used by the other backing subsystems) is that it took into account core scheduling. So the minimum relay parents of a paraid that is not scheduled would return nothing. Just pointing this out, I've yet to finish reviewing this PR |
| active_leaves, | ||
| peer_state.para_id, | ||
| ) | ||
| is_relay_parent_in_implicit_view(hash, implicit_view, active_leaves) |
There was a problem hiding this comment.
I believe that this code now no longer prunes advertisements for paraids that are no longer scheduled on any of the active leaves.
This used to be different, since prospective-parachains was previously taking into account the core assignments as well
There was a problem hiding this comment.
I am pretty sure the behavior is completely unchanged. An unscheduled para would not have been added to begin with.
There was a problem hiding this comment.
An unscheduled para would not have been added to begin with.
it used to be scheduled on some relay parents in the implicit view. but now it no longer is under any active leaf.
If that was the case, the previous code would have taken into account if it's still scheduled under any active leaf (if I'm reading the code correctly)
There was a problem hiding this comment.
btw, I managed to get claude to prove this with tests
There was a problem hiding this comment.
any updates on this? Also if proven with unit tests, make sure it is not assuming scenarios that are ensured elsewhere to never happen.
There was a problem hiding this comment.
I don't understand how either would cause this. What am I missing?
There was a problem hiding this comment.
I don't understand how either would cause this. What am I missing?
core-sharing paras:
you have at RP X the following assignments for the assigned core: A B B
a new leaf comes in, where now you have the following assignments: B B B
Your implicit view will tell you that it's fine to accept advertisements for paraid A.
Previously, you would not accept them.
Same for a group rotation. at a new leaf, you may get completely different assignments if your backing group rotated to a different core
There was a problem hiding this comment.
you have at RP X the following assignments for the assigned core: A B B
a new leaf comes in, where now you have the following assignments: B B B
This should be handled perfectly fine as the relay parent for the initial A moved out of scope -no?
For group rotation: The claim queue does not change at all, just the core we are responsible for.
There was a problem hiding this comment.
you have at RP X the following assignments for the assigned core: A B B
a new leaf comes in, where now you have the following assignments: B B BThis should be handled perfectly fine as the relay parent for the initial A moved out of scope -no?
The relay parent where A is still part of the assignments, as it's still within the allowed ancestry.
For group rotation: The claim queue does not change at all, just the core we are responsible for.
Indeed, but the collator protocol does not care about this. It just looks at the assignments at the core assigned to the validator at that relay parent
There was a problem hiding this comment.
Fixed the collator protocol. Please have a look.
|
lgtm! |
| // Create the relay chain scope once for this relay parent. | ||
| // All paras share the same relay chain ancestry. | ||
| // The ancestry is already limited by session boundaries and scheduling lookahead. | ||
| let relay_chain_scope = match fragment_chain::RelayChainScope::with_ancestors( |
There was a problem hiding this comment.
ultra nit: Can we import this at the top of the file?
| //! | ||
| //! ### Layer 2: AssetHub Hold-off (in `handle_advertisement`) | ||
| //! - **Purpose:** Rate-limit permissionless AssetHub collators to prefer invulnerable collators | ||
| //! - **Behavior:** Non-invulnerable collators' advertisements are delayed by HOLD_OFF_DURATION |
There was a problem hiding this comment.
Can we call them vulnerable?
Found a bug when reviewing a PR, that wasn't hit, because of code rot: The implicit view is barely used/only abused by prospective parachains - most of the functionality is not used, essentially it was only used as a HashMap, a redundant one even, as all the contained data was kept separately already. ## Summary Removes redundant tracking between prospective-parachains and backing implicit view, and simplifies both subsystems to prepare for the scheduling_parent design where execution context (relay_parent) and scheduling context (scheduling_parent) will diverge. ## What Changed ### 1. Remove ImplicitView from prospective-parachains (7f085d9) **Problem**: Prospective-parachains maintained its own relay chain ancestry in fragment chain scopes, but also fed the same data to ImplicitView and queried it back—pure redundancy. **Fixed**: - Removed ImplicitView field from prospective-parachains - Implement relay parent retention directly using fragment chain scopes - Removed buggy `activate_leaf_from_prospective_parachains()` method from ImplicitView (had a bug where leaf hash wasn't included in allowed_relay_parents_contiguous, violating documented invariants) ImplicitView tracks *scheduling* context (when to back), prospective-parachains tracks *execution* context (what can build). With scheduling_parent, these become completely different. ### 2. Separate relay chain scope from para scope (dea725b) **Changed**: Split prospective-parachains `Scope` into two: - `RelayChainScope`: relay parent + ancestors (shared across all paras) - `Scope`: pending availability + base constraints (para-specific) Relay chain ancestry is shared data; separating it improves clarity and efficiency. ### 3. Remove GetMinimumRelayParents message (b8369d4) **Removed**: `ProspectiveParachainsMessage::GetMinimumRelayParents` This message was used by ImplicitView to query prospective-parachains for minimum relay parents. With ImplicitView now computing allowed relay parents directly from `scheduling_lookahead`, this query is no longer needed. This was another example of code rot: We queried and kept state of this per parachain, while it is the same for all parachains at a given relay parent. Code is simpler now, with less dependencies. ### 4. Simplify ImplicitView to per-relay-parent model (5d3b368) **Simplified**: Removed per-para tracking from ImplicitView - No more `collating_for` parameter, now single code path - No more `para_id` in `known_allowed_relay_parents_under()` - No more `minimum_relay_parents` HashMap Allowed relay parents for *scheduling* are determined by `scheduling_lookahead` (a global relay chain parameter). All paras share the same allowed relay parents at any relay block. **Test improvements**: Added helpers reducing ~150 lines duplication, comprehensive docs explaining what each test verifies, positive assertions for retained blocks. (cherry picked from commit 67b4794)
|
Successfully created backport PR for |
| /// **Usage:** | ||
| /// - `is_slot_available()`: validates advertisements using offset-based position checks | ||
| /// - `unfulfilled_claim_queue_entries()`: determines fetch priority based on CQ order | ||
| leaf_claim_queues: HashMap<Hash, BTreeMap<CoreIndex, VecDeque<ParaId>>>, |
There was a problem hiding this comment.
IMO this should have been added in the implicit view, preserving the existing functionality.
This way, we would ensure that any other subsystems that were relying on this are not broken.
I think they may be now: See here as an example of the same issue
There was a problem hiding this comment.
the code in here got even more complicated now, and we'd need more duplication.
We cleaned up prospective parachains a bit but complicated everything else that was using the implicit view 😅
…de feature (#10472) ## Overview This PR introduces **V3 candidate descriptors** with an explicit `scheduling_parent` field, separating the scheduling context (which determines validator group assignment) from the execution context (which determines parachain state). This is a critical foundation for enabling lookahead scheduling and improving parachain block production flexibility in async backing. **Key Innovation:** V3 candidates can be scheduled based on a different relay chain block than the one they execute against, enabling validators to assign backing responsibilities ahead of time while maintaining correct execution semantics. ## Problem In V1 and V2 candidate descriptors, the `relay_parent` field serves a dual purpose: 1. **Execution context**: Determines which relay chain state the parachain block executes against 2. **Scheduling context**: Determines which validator group is assigned to back the candidate This tight coupling limits flexibility: - Candidates must be scheduled on the exact block they execute against - No lookahead scheduling possible - Reduces predictability for parachains about when they can produce blocks ## Solution V3 candidates introduce an explicit `scheduling_parent` field that decouples these concerns: - **`relay_parent`**: Determines execution context (parachain state root, claim queue state for core assignment) - **`scheduling_parent`**: Determines validator group assignment (which validators back this candidate) **For backward compatibility:** - V1/V2 candidates: `scheduling_parent == relay_parent` (implicit, behavior unchanged) - V3 candidates: `scheduling_parent` can differ from `relay_parent` (explicit field in descriptor) This separation enables lookahead scheduling where parachains can be assigned to validator groups on future relay chain blocks while still executing against older state. ## Key Changes ### Primitives (`polkadot/primitives/src/v9/mod.rs`) - **`CandidateDescriptorVersion::V3`**: New enum variant for version detection - **`CandidateDescriptorV2::new_v3()`**: Constructor accepting explicit `scheduling_parent` parameter - **`scheduling_parent(v3_enabled: bool) -> Hash`**: Accessor returning scheduling_parent for V3, relay_parent for V1/V2 - **`scheduling_session(v3_enabled: bool) -> Option<SessionIndex>`**: Accessor for scheduling session (offset-based for V3) - **Version detection**: Feature-gated logic (`CandidateReceiptV3` node feature) to distinguish V1 from V3 using reserved fields ### PVF Extension (`polkadot/parachain/src/primitives.rs`) - **`ValidationParamsExtension::V3`**: New extension type containing both `relay_parent` and `scheduling_parent` hashes - **`TrailingOption<T>`**: Backward-compatible wrapper that decodes `T` from trailing bytes if present, or `None` if at EOF - V3 candidates: PVF receives extension with both hashes - V1/V2 candidates: PVF receives no extension bytes (TrailingOption decodes as None) - Old PVFs: Gracefully ignore trailing extension bytes (don't fail to decode) ### Subsystem Messages (`polkadot/node/subsystem-types/src/messages.rs`) - **`BackableCandidateRef`**: New struct containing `candidate_hash` and `scheduling_parent` (replaces bare `CandidateHash`) - **`CandidateBackingMessage::Second`**: Now includes explicit `scheduling_parent: Hash` parameter - **`CandidateBackingMessage::Statement`**: Now includes explicit `scheduling_parent: Hash` parameter - **`CandidateBackingMessage::GetBackableCandidates`**: Uses `Vec<BackableCandidateRef>` instead of `Vec<CandidateHash>` - **`CanSecondRequest`**: Includes `candidate_scheduling_parent` field for validator group lookup ### Core Subsystems #### **Candidate Backing** (`polkadot/node/core/backing/`) - Use `scheduling_parent` (not `relay_parent`) for validator group lookups - Track per-scheduling-parent state instead of per-relay-parent - Validate candidates against scheduling_parent context (session, group assignment) #### **Candidate Validation** (`polkadot/node/core/candidate-validation/`) - For V3: Append `ValidationParamsExtension::V3` bytes to PVF validation input - For V1/V2: No extension bytes appended (backward compatible) - PVF workers decode extension using `TrailingOption` pattern #### **Prospective Parachains** (`polkadot/node/core/prospective-parachains/`) - Track candidates with their `scheduling_parent` hash - Validate `scheduling_parent` is in active leaves before accepting candidates - Support fragment chains with mixed V1/V2/V3 candidates ### Network Protocol #### **Collator Protocol** (`polkadot/node/network/collator-protocol/`) - Track collations **per-scheduling-parent** instead of per-relay-parent - `PendingCollation` and `FetchedCollation` now include `scheduling_parent` field - Validate advertised `scheduling_parent` matches fetched descriptor's actual `scheduling_parent(v3_enabled)` - Ensure `scheduling_parent` is an active leaf before accepting collations #### **Statement Distribution** (`polkadot/node/network/statement-distribution/src/v2/`) - Rename `PerRelayParentState` → `PerSchedulingParentState` - Rename `per_relay_parent` map → `per_scheduling_parent` - Use `scheduling_parent` as key for state lookups (validator groups, candidate tracking) - Add clarifying comments in tests explaining `relay_parent` serves dual role for V1/V2 ### Test Infrastructure - **Remove 114 lines** of obsolete `CandidateReceiptV2` node feature checks (V2 now enabled everywhere) - Add V3-specific tests: - `v3_descriptors_are_accepted_when_enabled`: V3 with UMP signals accepted - `v3_descriptors_without_ump_signals_are_rejected`: V3 without UMP signals rejected - `v3_descriptors_rejected_as_v1_when_disabled`: V3 rejected as V1 when feature disabled - Update all test call sites to use new descriptor version enum ## Backward Compatibility Multiple layers of protection ensure safe gradual rollout: 1. **Node Feature Gating**: V3 only recognized when `CandidateReceiptV3` node feature enabled (requires 2/3+ validator upgrade) 2. **Mandatory UMP Signals**: V3 candidates MUST include UMP signals (`SelectCore` at minimum) - Prevents old nodes from mistakenly backing V3 candidates (they'd see them as invalid V1) - No slashing risk for validators during transition period 3. **TrailingOption Pattern**: PVF extension gracefully handled - Old PVFs: Don't decode extension, behave as before - New PVFs: Decode extension if present, use both hashes 4. **Version Detection**: Backwards compatible logic distinguishes V1 from V3 - V3 uses `version == 1` (vs V2's `version == 0`) - Reserved fields checked to prevent misidentification - Old nodes see V3 as invalid V1 (missing UMP signals) 5. **Runtime Protection**: Runtime drops candidates violating version-specific rules - V3 without UMP signals: Rejected - V3 with invalid scheduling_parent: Rejected ## Review Focus ### High Priority - Correctness 1. **Version detection logic** (`polkadot/primitives/src/v9/mod.rs:CandidateDescriptorV2::version()`) - Ensures V1 and V3 are correctly distinguished - Verify old nodes cannot misinterpret V3 as V1 2. **TrailingOption safety** (`polkadot/parachain/src/primitives.rs`, `polkadot/node/core/pvf/execute-worker/`) - Confirm it only works as final field (documented with safety warnings) - Verify old PVFs don't fail when extension bytes absent 3. **Scheduling_parent validation** (`polkadot/node/network/collator-protocol/`, `polkadot/node/core/backing/`) - Must be active leaf before accepting candidates - Used correctly for validator group lookups 4. **UMP signal enforcement** (`polkadot/runtime/parachains/src/paras_inherent/mod.rs`) - Runtime rejects V3 without mandatory UMP signals - Prevents security issues with old nodes ### Medium Priority - Architecture 5. **Message flow** (`polkadot/node/subsystem-types/src/messages.rs`) - `scheduling_parent` correctly threaded through subsystem messages - `BackableCandidateRef` used consistently 6. **State tracking** (`polkadot/node/core/backing/`, `polkadot/node/network/statement-distribution/`) - Per-scheduling-parent state management (not per-relay-parent) - Correct hashmap key usage 7. **PVF extension encoding/decoding** (`polkadot/node/core/candidate-validation/`) - Extension appended correctly for V3 - No extension for V1/V2 (backward compatible) ### Lower Priority - Cleanup 8. **Obsolete V2 feature checks removed** (114 lines in `paras_inherent/tests.rs`) 9. **Naming consistency** (`per_relay_parent` → `per_scheduling_parent`) 10. **Test infrastructure refactoring** (descriptor version enum in `builder.rs`) ## CI Coverage CI verifies: - All affected packages compile successfully - All existing tests pass (runtime, subsystems, network protocols) - New V3-specific tests validate: - V3 descriptors accepted when feature enabled - V3 descriptors rejected without mandatory UMP signals - V3 descriptors rejected as V1 when feature disabled - Scheduling_parent validation in collator-protocol and backing ## Critical Invariants 1. **No slashing risk**: Old validators cannot mistakenly back V3 candidates (UMP signal requirement prevents this) 2. **Correct validator grouping**: Always determined by `scheduling_parent`, never `relay_parent` (for V3) 3. **Active leaf requirement**: `scheduling_parent` must be in validator's active leaves 4. **Core assignment correctness**: UMP `SelectCore` signal matches claim queue assignment 5. **PVF safety**: V1/V2 PVFs don't fail when `TrailingOption` decodes as None ## Related - **Base PR**: #10650 (`rk-prospective-parachains-cleanup`) - used for separate review --- **Scope**: ~4,500 lines added, ~2,350 lines removed across 58 files --------- Signed-off-by: Iulian Barbu <iulian.barbu@parity.io> Co-authored-by: eskimor <eskimor@noreply.com> Co-authored-by: Iulian Barbu <iulian.barbu@parity.io> Co-authored-by: Iulian Barbu <14218860+iulianbarbu@users.noreply.github.com>
Found a bug when reviewing a PR, that wasn't hit, because of code rot: The implicit view is barely used/only abused by prospective parachains - most of the functionality is not used, essentially it was only used as a HashMap, a redundant one even, as all the contained data was kept separately already.
Summary
Removes redundant tracking between prospective-parachains and backing implicit view, and simplifies both subsystems to prepare for the scheduling_parent design where execution context (relay_parent) and scheduling context (scheduling_parent) will diverge.
What Changed
1. Remove ImplicitView from prospective-parachains (7f085d9)
Problem: Prospective-parachains maintained its own relay chain ancestry in fragment chain scopes, but also fed the same data to ImplicitView and queried it back—pure redundancy.
Fixed:
activate_leaf_from_prospective_parachains()method from ImplicitView (had a bug where leaf hash wasn't included in allowed_relay_parents_contiguous, violating documented invariants)ImplicitView tracks scheduling context (when to back), prospective-parachains tracks execution context (what can build). With scheduling_parent, these become completely different.
2. Separate relay chain scope from para scope (dea725b)
Changed: Split prospective-parachains
Scopeinto two:RelayChainScope: relay parent + ancestors (shared across all paras)Scope: pending availability + base constraints (para-specific)Relay chain ancestry is shared data; separating it improves clarity and efficiency.
3. Remove GetMinimumRelayParents message (b8369d4)
Removed:
ProspectiveParachainsMessage::GetMinimumRelayParentsThis message was used by ImplicitView to query prospective-parachains for minimum relay parents. With ImplicitView now computing allowed relay parents directly from
scheduling_lookahead, this query is no longer needed. This was another example of code rot: We queried and kept state of this per parachain, while it is the same for all parachains at a given relay parent. Code is simpler now, with less dependencies.4. Simplify ImplicitView to per-relay-parent model (5d3b368)
Simplified: Removed per-para tracking from ImplicitView
collating_forparameter, now single code pathpara_idinknown_allowed_relay_parents_under()minimum_relay_parentsHashMapAllowed relay parents for scheduling are determined by
scheduling_lookahead(a global relay chain parameter). All paras share the same allowed relay parents at any relay block.Test improvements: Added helpers reducing ~150 lines duplication, comprehensive docs explaining what each test verifies, positive assertions for retained blocks.