Skip to content

talm reset default --wipe-mode=all destroys META; selective wipe is the friendlier default #185

@lexfrei

Description

@lexfrei

Problem

talm reset without explicit --wipe-mode / --system-labels-to-wipe runs with the upstream default --wipe-mode=all, which wipes the META partition along with STATE and EPHEMERAL. With META gone the node loses its bootstrap config + machine-config persistence, and cannot self-recover on the next boot — it comes up in maintenance mode and needs a full re-apply.

Selective wipe via --system-labels-to-wipe=STATE,EPHEMERAL leaves META intact; the node re-joins automatically on reboot.

Neither path is wrong — but the default is the destructive one, and the README/help text doesn't make the distinction clear.

Reproduction

# Destructive default: META wiped, node requires fresh apply.
talm reset --graceful=true --reboot --nodes $NODE --endpoints $OTHER_NODE

# Selective: META preserved, node self-recovers.
talm reset --graceful=true --reboot \
  --system-labels-to-wipe=STATE \
  --system-labels-to-wipe=EPHEMERAL \
  --nodes $NODE --endpoints $OTHER_NODE

Verified on dev17 node2: selective wipe → node rebooted, came back as a fresh etcd member from META within ~90s, with the placeholder hostname talos-<id> until the next apply refreshed it.

Expected

Either:

  1. Change the help text for --wipe-mode and --system-labels-to-wipe to make the selective-vs-all trade-off explicit, plus a one-paragraph README note in the reset section.
  2. Change the default to selective (STATE,EPHEMERAL) so the friendly path is the default. Operators who actually want full wipe pass --wipe-mode=all. Riskier change — upstream defaults differ — but the safer default matches operator expectations.

Option 1 is the minimum; Option 2 is the safer one if cozystack is willing to diverge from upstream defaults here.

Why this matters

The pattern "I reset the node, why doesn't it come back?" is one of the most common talm support questions. Selective wipe is the operator-friendly default that lets the node self-heal. The current default surprises operators with a full wipe.

Surfaced during the dev17 manual test plan exercise (section H).

Metadata

Metadata

Assignees

No one assigned

    Labels

    area/commandsIssues or PRs related to pkg/commands (CLI subcommands, flag parsing, root detection)area/docsDocumentation / README / inline help / hint copykind/documentationCategorizes issue or PR as related to documentation, README, hint copy, error-message UXpriority/important-soonMust be staffed and worked on either currently, or very soon, ideally in time for the next releasetriage/acceptedIndicates an issue is ready to be actively worked on

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions