This document captures load-bearing invariants the codebase relies
on. It is NOT a tour of the code — for that, follow the call graph
from cmd/controller/main.go (controller) and cmd/satellite/main.go
(satellite). The pieces below are the rules whose violation would
quietly corrupt cluster state.
A blockstor CRD has two halves:
- Spec is desired state. Operators, REST handlers, and the controller's own placement logic write here. The satellite never writes Spec.
- Status is observed state. The satellite (and the controller's allocators that derive values from Spec) write here. The user-facing REST API never writes Status — it returns it read-only.
The rule:
Anything the satellite reads from the kernel (drbdsetup events2, drbdadm status, lvs, …) lives on Status. Anything the operator or controller asks the satellite to do lives on Spec.
A naive whole-object Update is unsafe whenever both halves are
written by different actors — a Spec mutation in flight would
clobber a concurrent Status write and vice versa. Status writes
must go through the Status subresource (Status().Update());
Spec writes use the regular Update() path.
This rule is enforced by code review for now; once Phase 10.2 lands the satellite-side reconciler entirely (its writes will naturally route through Status only), it becomes mechanical.
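The clobbering hazard described above can be illustrated with a toy in-memory model. This is only a sketch: Store, UpdateWhole, UpdateSpec and UpdateStatus are hypothetical names, not the Kubernetes client API, and the model deliberately omits the apiserver's resourceVersion conflict detection so that the lost write is visible.

```go
package main

import "fmt"

// Resource is a toy stand-in for a blockstor CRD (fields are illustrative).
type Resource struct {
	Spec   Spec
	Status Status
}

type Spec struct{ NodeName string }
type Status struct{ DiskState string }

// Store is a hypothetical in-memory stand-in for the apiserver.
// It only models which half of the object a write replaces.
type Store struct{ obj Resource }

func (s *Store) Get() Resource { return s.obj }

// UpdateWhole replaces both halves, like a naive whole-object write.
func (s *Store) UpdateWhole(r Resource) { s.obj = r }

// UpdateSpec replaces only Spec and leaves Status untouched.
func (s *Store) UpdateSpec(r Resource) { s.obj.Spec = r.Spec }

// UpdateStatus replaces only Status and leaves Spec untouched.
func (s *Store) UpdateStatus(r Resource) { s.obj.Status = r.Status }

// race interleaves a Spec writer and a Status writer that both start
// from the same read, and returns what ends up stored.
func race(split bool) Resource {
	s := &Store{obj: Resource{
		Spec:   Spec{NodeName: "node-a"},
		Status: Status{DiskState: "Inconsistent"},
	}}
	ctrl := s.Get() // controller reads
	sat := s.Get()  // satellite reads the same version
	ctrl.Spec.NodeName = "node-b"
	sat.Status.DiskState = "UpToDate"
	if split {
		s.UpdateSpec(ctrl)
		s.UpdateStatus(sat)
	} else {
		s.UpdateWhole(ctrl)
		s.UpdateWhole(sat) // carries the stale Spec and undoes the controller's write
	}
	return s.Get()
}

func main() {
	fmt.Printf("whole-object: %+v\n", race(false))
	fmt.Printf("split writes: %+v\n", race(true))
}
```

With whole-object writes the controller's NodeName change is lost; with split writes both changes survive.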
| Field | Half | Rationale |
|---|---|---|
| Resource.Spec.NodeName | Spec | Operator-chosen placement target. |
| Resource.Spec.Flags | Spec | DISKLESS / TIE_BREAKER are operator-controlled. |
| Resource.Spec.StoragePool | Spec | Allocator output written by controller. |
| Resource.Spec.Volumes[i].SeedFromGi | Spec | Controller-picked DRBD GI to stamp on first activation. |
| Resource.Status.InUse | Status | Reflects drbdsetup status role. |
| Resource.Status.DrbdState | Status | Reflects events2 connection state. |
| Resource.Status.DRBDPort / DRBDMinor / DRBDNodeID | Status | Allocator-derived; immutable per replica once set. |
| Resource.Status.Volumes[i].DiskState | Status | Per-volume kernel state. |
| Resource.Status.Volumes[i].CurrentGi | Status | DRBD generation identifier observed by the satellite. |
| Node.Status.ConnectionStatus | Status | Set on Hello; reflects whether the satellite has dialled in. |
A field that fits neither half (rare — typically a transient debounce hint) lives in an annotation, not Spec or Status.
Some Status fields are written by both the controller (e.g.
allocator output → DRBDPort) and the satellite (e.g.
DiskState, CurrentGi). A regular .Status().Update() from
either side rewrites the whole Status subresource, which can
clobber the other side's writes that landed between Get and
Update.
Phase 10.2 routes those writes through Kubernetes Server-Side
Apply with distinct field managers (blockstor-controller,
blockstor-satellite) so each side only touches the fields it
owns. See pkg/store/k8s/resources.go SetState for the
satellite-side writer; the controller-side writer lives in
internal/controller.
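The per-manager ownership idea can be sketched with a toy model. This is an assumption-laden simplification, not Kubernetes Server-Side Apply itself: Object, Apply and the Owner map are hypothetical, and real SSA has richer conflict rules (for example, forced applies) than this sketch.

```go
package main

import "fmt"

// Status holds a subset of Resource.Status. A nil pointer means
// "this field is not part of the apply".
type Status struct {
	DRBDPort  *int32
	DiskState *string
}

// Object tracks the live status plus which manager owns each field.
type Object struct {
	Status Status
	Owner  map[string]string // field name -> manager name
}

// Apply merges only the fields present in patch and refuses to
// overwrite a field that a different manager already owns.
func (o *Object) Apply(manager string, patch Status) error {
	if o.Owner == nil {
		o.Owner = map[string]string{}
	}
	// Check ownership of every field in the patch before mutating anything.
	present := map[string]bool{
		"DRBDPort":  patch.DRBDPort != nil,
		"DiskState": patch.DiskState != nil,
	}
	for field, set := range present {
		if owner, taken := o.Owner[field]; set && taken && owner != manager {
			return fmt.Errorf("conflict: %s is owned by %s", field, owner)
		}
	}
	if patch.DRBDPort != nil {
		o.Status.DRBDPort, o.Owner["DRBDPort"] = patch.DRBDPort, manager
	}
	if patch.DiskState != nil {
		o.Status.DiskState, o.Owner["DiskState"] = patch.DiskState, manager
	}
	return nil
}

func main() {
	port, disk := int32(7000), "UpToDate"
	var o Object
	_ = o.Apply("blockstor-controller", Status{DRBDPort: &port})
	_ = o.Apply("blockstor-satellite", Status{DiskState: &disk})
	fmt.Println(*o.Status.DRBDPort, *o.Status.DiskState)

	other := int32(7001)
	fmt.Println(o.Apply("blockstor-satellite", Status{DRBDPort: &other}))
}
```

Because each apply names only the fields its manager owns, neither side needs a read-modify-write cycle over the whole Status.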
DRBD configuration follows an upstream-LINSTOR-shaped override chain:
Controller (broadest) → ResourceGroup → ResourceDefinition → Resource (narrowest)
Narrower scopes override broader ones field by field: a non-nil value at a narrower scope wins. The typed
implementation lives in pkg/drbd/typed_resolver.go
(ResolveDRBDOptions); the legacy string-keyed implementation
(ResolveOptions) is still used as a fallback for any
Spec.Props data not yet migrated to the typed DRBDOptions
struct. See internal/controller/resource_controller.go's
resolveEffectiveProps for the merge.
*int32 and *bool use nil-vs-set discipline:
- nil means "not overridden at this scope, inherit from parent".
- Any non-nil value (including the zero value) means "explicitly set, do not inherit".
A regression that did if *src.X { out.X = src.X } would silently
drop explicit-false overrides, e.g. an RD that intentionally
sets AllowTwoPrimaries=false would inherit a parent RG's true.
The pinning tests for this are in pkg/drbd/typed_resolver_test.go.
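A minimal sketch of the nil-vs-set merge, assuming a trimmed options struct. The names overlay, resolve and ptr, and the MaxBuffers field, are hypothetical illustrations and not the contents of pkg/drbd/typed_resolver.go.

```go
package main

import "fmt"

// DRBDOptions is a trimmed, hypothetical stand-in for the typed options struct.
type DRBDOptions struct {
	AllowTwoPrimaries *bool
	MaxBuffers        *int32
}

// overlay copies every non-nil field of src onto dst. The test is on the
// pointer itself, never on the pointed-to value, so an explicit false or 0
// still overrides the parent.
func overlay(dst *DRBDOptions, src DRBDOptions) {
	if src.AllowTwoPrimaries != nil {
		dst.AllowTwoPrimaries = src.AllowTwoPrimaries
	}
	if src.MaxBuffers != nil {
		dst.MaxBuffers = src.MaxBuffers
	}
}

// resolve folds scopes from broadest to narrowest, so later scopes win.
func resolve(scopes ...DRBDOptions) DRBDOptions {
	var out DRBDOptions
	for _, s := range scopes {
		overlay(&out, s)
	}
	return out
}

// ptr returns a pointer to v (requires Go 1.18+ generics).
func ptr[T any](v T) *T { return &v }

func main() {
	controller := DRBDOptions{}
	rg := DRBDOptions{AllowTwoPrimaries: ptr(true), MaxBuffers: ptr(int32(2048))}
	rd := DRBDOptions{AllowTwoPrimaries: ptr(false)} // explicit false must win
	resource := DRBDOptions{}

	eff := resolve(controller, rg, rd, resource)
	fmt.Println(*eff.AllowTwoPrimaries, *eff.MaxBuffers)
}
```

Here the RD's explicit false overrides the RG's true, while MaxBuffers is inherited from the RG because no narrower scope sets it.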
Two shapes coexist in the codebase:
- Wire shape: pkg/api/v1 types, identical to upstream LINSTOR's REST API. golinstor and external callers see this verbatim. Property bags live as props map[string]string.
- CRD shape: api/v1alpha1 types, the typed structures blockstor persists in Kubernetes. DRBD configuration lives in Spec.DRBDOptions (typed) + Spec.ExtraProps (forward-compat for keys we haven't typed yet).
The k8s store (pkg/store/k8s/) is the boundary. Its
drbd_transcode.go parses the wire props bag into typed CRD
fields on Create/Update; the inverse direction re-emits typed
fields back into props on GET so golinstor sees the unchanged
shape. Unknown DrbdOptions/* keys round-trip through ExtraProps
without loss.
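The round trip can be sketched as follows. This is an illustration under assumptions, not the code in drbd_transcode.go: fromProps and toProps are hypothetical names, only one key is typed, and the key string and its yes/no encoding are assumed for the example.

```go
package main

import "fmt"

// DRBDOptions and Spec are trimmed, hypothetical stand-ins for the CRD shape.
type DRBDOptions struct{ AllowTwoPrimaries *bool }

type Spec struct {
	DRBDOptions DRBDOptions
	ExtraProps  map[string]string
}

// Assumed wire key and yes/no encoding, used for illustration only.
const keyAllowTwoPrimaries = "DrbdOptions/Net/allow-two-primaries"

// fromProps parses the wire props bag into typed fields. Keys that are not
// typed, or whose value does not parse, are kept verbatim in ExtraProps.
func fromProps(props map[string]string) Spec {
	s := Spec{ExtraProps: map[string]string{}}
	for k, v := range props {
		if k == keyAllowTwoPrimaries && (v == "yes" || v == "no") {
			b := v == "yes"
			s.DRBDOptions.AllowTwoPrimaries = &b
			continue
		}
		s.ExtraProps[k] = v
	}
	return s
}

// toProps re-emits typed fields next to ExtraProps so the caller sees the
// same bag it originally wrote.
func toProps(s Spec) map[string]string {
	out := make(map[string]string, len(s.ExtraProps)+1)
	for k, v := range s.ExtraProps {
		out[k] = v
	}
	if p := s.DRBDOptions.AllowTwoPrimaries; p != nil {
		if *p {
			out[keyAllowTwoPrimaries] = "yes"
		} else {
			out[keyAllowTwoPrimaries] = "no"
		}
	}
	return out
}

func main() {
	in := map[string]string{
		keyAllowTwoPrimaries:              "no",
		"DrbdOptions/Net/some-future-key": "42", // not typed yet
	}
	fmt.Println(toProps(fromProps(in)))
}
```

Sending unparseable values to ExtraProps rather than dropping them is one way to keep the round trip lossless.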
Adding a third replica to a 2-replica resource without intervention would trigger a full resync of the entire backing device — hours on multi-TiB volumes. The skip pipeline (Phase 8.1):
- Satellite's events2 observer parses current-uuid from each device frame and surfaces it in Status as Resource.Status.Volumes[i].CurrentGi.
- Controller's ensureSeedFromGi picks the lowest-named UpToDate peer's CurrentGi when allocating a new replica and stamps it on the new replica's Spec.Volumes[i].SeedFromGi.
- Dispatcher threads SeedFromGi through the satellite gRPC contract (DesiredVolume.seed_from_gi).
- Satellite reconciler's applyDRBD runs drbdmeta --force <res>/<vol> v09 <device> internal set-gi <gi>:<gi>:0:0 between drbdadm create-md and drbdadm adjust on first activation. With matching current_uuid + bitmap_uuid the GI handshake on first connect sees the new peer as already-in-sync and skips the full sync.
The pipeline is end-to-end gated by
tests/e2e/replica-add-no-resync.sh.
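The seed selection and the drbdmeta invocation from the steps above can be sketched as below. The Peer type and the pickSeedGi and setGiArgs helpers are hypothetical; only the selection rule (lowest-named UpToDate peer) and the argument layout come from the text, and the device path and GI value in main are made up.

```go
package main

import (
	"fmt"
	"sort"
)

// Peer is a hypothetical view of one existing replica's status.
type Peer struct {
	NodeName  string
	DiskState string
	CurrentGi string
}

// pickSeedGi returns the CurrentGi of the lowest-named UpToDate peer.
// It reports false when no peer qualifies, in which case no seed is stamped.
func pickSeedGi(peers []Peer) (string, bool) {
	var candidates []Peer
	for _, p := range peers {
		if p.DiskState == "UpToDate" && p.CurrentGi != "" {
			candidates = append(candidates, p)
		}
	}
	if len(candidates) == 0 {
		return "", false
	}
	sort.Slice(candidates, func(i, j int) bool {
		return candidates[i].NodeName < candidates[j].NodeName
	})
	return candidates[0].CurrentGi, true
}

// setGiArgs builds the drbdmeta argument vector in the layout quoted above:
// --force <res>/<vol> v09 <device> internal set-gi <gi>:<gi>:0:0
func setGiArgs(res string, vol int, device, gi string) []string {
	return []string{
		"--force", fmt.Sprintf("%s/%d", res, vol), "v09", device,
		"internal", "set-gi", fmt.Sprintf("%s:%s:0:0", gi, gi),
	}
}

func main() {
	peers := []Peer{
		{NodeName: "node-c", DiskState: "UpToDate", CurrentGi: "CCCC"},
		{NodeName: "node-a", DiskState: "Inconsistent", CurrentGi: "AAAA"},
		{NodeName: "node-b", DiskState: "UpToDate", CurrentGi: "BBBB"},
	}
	if gi, ok := pickSeedGi(peers); ok {
		fmt.Println(setGiArgs("pvc-1", 0, "/dev/example/pvc-1", gi))
	}
}
```

Sorting by node name makes the choice deterministic across reconcile passes, so repeated reconciles would not flip the stamped value.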
- internal/controller.ResourceReconciler: Resource CRDs. Allocates DRBD node-id / port / minor; picks SeedFromGi; promotes DISKLESS to diskful when actively used; dispatches the desired state to the satellite via gRPC.
- internal/controller.ResourceDefinitionReconciler: RD CRDs. Auto-creates DISKLESS+TIE_BREAKER witnesses when an RD has an even number of diskful replicas; sets the resource-level quorum policy.
- internal/controller.ResourceGroupReconciler / NodeReconciler / StoragePoolReconciler / SnapshotReconciler: currently scaffolded but largely no-op past CRD persistence; reconcile logic lives in the dispatcher + satellite.
Phase 10.1 lifts the satellite's gRPC-driven reconciler logic
into pkg/satellite/controllers/ controller-runtime reconcilers
that watch the apiserver directly; that change retires the
pkg/dispatcher/ + pkg/satellitecontroller/ layers entirely.