
Commit cab19c1

fix(commands): capture upgrade target image BEFORE original RunE
talosctl's upgrade handler can overwrite the --image flag with the node's currently running install.image on the no-op-upgrade path (when the target image already matches what is installed). If Phase 2C reads the flag after RunE, it sees the running image as the target and passes silently, even when the apply was a real upgrade.

Capture targetImage BEFORE the original RunE runs, so Phase 2C verifies against the operator's intended target rather than whatever state talosctl left in the flag.

Verified on dev17: a cross-vendor upgrade with the target image correctly set in the node body (machine.install.image: ghcr.io/siderolabs/installer:v1.13.0) now reaches Phase 2C with target=v1.13.0 instead of the running v1.12.6. On a successful boot the gate passes silently (target == running); on a failed boot with auto-rollback, the gate would correctly block.

Refs: #172, #175

While here, fold in the K1-pre test-plan entry that documents the new gate's expected output on the cross-vendor mismatch path.

Signed-off-by: Aleksei Sviridkin <f@lex.la>
1 parent: d6314bb

2 files changed

Lines changed: 40 additions & 5 deletions


docs/manual-test-plan.md

Lines changed: 29 additions & 0 deletions
@@ -554,6 +554,35 @@ newer than the node's running Talos v1.12.x` plus a hint about
 rebooting into a matching maintenance image or lowering the
 contract. Drift preview still runs.
 
+### K1-pre. Phase 2C version-verify catches silent rollback
+
+⚠️ Same destructive setup as K2, but the gate now does the work
+automatically. Run an intentionally-bad cross-vendor upgrade and
+expect a hint-bearing blocker:
+
+```bash
+# values.yaml carries the wrong vendor / version pair:
+sed -i 's|cozystack/talos:v1.12|siderolabs/installer:v1.13|' values.yaml
+/tmp/talm-safety upgrade -f nodes/node0.yaml
+```
+
+Expected: talosctl upgrade RPC returns success → 90s reconcile
+window → `verifyPostUpgradeVersion` reads `runtime.Version`
+detects mismatch → blocker:
+
+```
+post-upgrade: requested upgrade to v1.13.0 but running version
+is v1.12.6 — Talos auto-rolled back the new install
+hint: Talos auto-rolled back the new install. Cross-vendor
+upgrades (e.g. cozystack-bundled image -> vanilla siderolabs
+installer) drop bundled extensions and fail the boot readiness
+check; use the cozystack-built image at the target version, or
+pass --skip-post-upgrade-verify to bypass.
+```
+
+`talm upgrade` exits non-zero — the operator sees the failure
+instead of a false "success".
+
 ### K2-pre. Silent-rollback after cross-vendor upgrade (regression pin for #175)
 
 ⚠️ **Operator footgun caught by this matrix**: a `talm upgrade` that

pkg/commands/upgrade_handler.go

Lines changed: 11 additions & 5 deletions
@@ -151,6 +151,13 @@ func wrapUpgradeCommand(wrappedCmd *cobra.Command, originalRunE func(*cobra.Comm
 }
 }
 
+// Capture the upgrade target image BEFORE original RunE runs.
+// talosctl's own upgrade handler can overwrite the --image
+// flag with the node's currently-running install.image (the
+// no-op-upgrade path), which would mask the version mismatch
+// Phase 2C exists to catch.
+targetImage, _ := cmd.Flags().GetString("image")
+
 // Execute original command
 var execErr error
 
@@ -170,18 +177,17 @@ func wrapUpgradeCommand(wrappedCmd *cobra.Command, originalRunE func(*cobra.Comm
 // Talos pulls + writes the new install, A/B boot fails its
 // readiness check, Talos rolls back to the prior partition,
 // and the operator's "successful" upgrade silently no-ops.
-// Skipped on dry-run-equivalent flows and when the operator
-// opts out via --skip-post-upgrade-verify.
+// Skipped when the operator opts out via
+// --skip-post-upgrade-verify.
 if upgradeCmdFlags.skipPostUpgradeVerify {
 return nil
 }
 
-image, _ := cmd.Flags().GetString("image")
-if image == "" {
+if targetImage == "" {
 return nil
 }
 
-return runPostUpgradeVersionVerify(cmd.Context(), image)
+return runPostUpgradeVersionVerify(cmd.Context(), targetImage)
 }
 }
 
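`runPostUpgradeVersionVerify` receives the captured `targetImage`, an image reference, while the node reports a version string, so some tag-to-version comparison has to happen behind that call. Its body is not part of this diff. The following is a hypothetical sketch that assumes the image tag and the reported version share the same `vX.Y.Z` form (as with `ghcr.io/siderolabs/installer:v1.13.0` versus `v1.12.6` in the test plan). `imageTag` and `checkPostUpgradeVersion` are illustrative names, not talm's API, and the error text only loosely follows the K1-pre expected output.

```go
package main

import (
	"fmt"
	"strings"
)

// imageTag returns the tag of an image reference, for example
// "ghcr.io/siderolabs/installer:v1.13.0" -> "v1.13.0". A trailing
// "@sha256:..." digest is dropped first, and only the last path segment is
// searched for ":" so a registry port ("localhost:5000/installer") is not
// mistaken for a tag. Returns "" when the reference has no tag.
func imageTag(ref string) string {
	if i := strings.Index(ref, "@"); i >= 0 {
		ref = ref[:i]
	}
	last := ref[strings.LastIndex(ref, "/")+1:]
	if i := strings.LastIndex(last, ":"); i >= 0 {
		return last[i+1:]
	}
	return ""
}

// checkPostUpgradeVersion is a hypothetical stand-in for the comparison a
// post-upgrade gate performs. It assumes the node reports its version in the
// same "vX.Y.Z" form used as the installer image tag.
func checkPostUpgradeVersion(targetImage, running string) error {
	want := imageTag(targetImage)
	if want == "" {
		// Target version unknown: skip rather than block on a guess.
		return nil
	}
	if want != running {
		return fmt.Errorf("post-upgrade: requested upgrade to %s but running version is %s", want, running)
	}
	return nil
}

func main() {
	target := "ghcr.io/siderolabs/installer:v1.13.0"
	for _, running := range []string{"v1.13.0", "v1.12.6"} {
		if err := checkPostUpgradeVersion(target, running); err != nil {
			fmt.Println("blocker:", err)
			continue
		}
		fmt.Println("ok: running", running)
	}
	// Output:
	// ok: running v1.13.0
	// blocker: post-upgrade: requested upgrade to v1.13.0 but running version is v1.12.6
}
```

Returning nil for an untagged reference mirrors the diff's own `targetImage == ""` early return: the gate blocks only on a mismatch it can demonstrate. A real implementation would also have to decide how to treat digest-only references, which this sketch lets through.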