fix(charts): pin Layer2VIPConfig to subnet-matching link, not default route#163
Conversation
Reproduction for the v1.12 multi-doc Layer2VIPConfig.link bug: when a
controlplane has a public NIC carrying the IPv4 default route AND a
VLAN sub-interface carrying the private cluster network, the VIP
must be pinned to the VLAN child (where the cluster subnet lives),
not the default-route NIC.
The current renderer hardcodes Layer2VIPConfig.link to the IPv4-
default-route link
(charts/cozystack/templates/_helpers.tpl:363 and the symmetric
charts/generic mirror site). On Hetzner-style topology the VIP
lands on the public NIC; Talos installs it there and the cluster
never sees the leader on the private subnet.
Adds:
- hetznerPublicNICWithPrivateVLANLookup: discovery fixture with
enp0s31f6 (physical, public address, default route) and
enp0s31f6.4000 (VLAN child, private address, no default route).
- TestContract_NetworkMultidoc_HetznerTopology_VIPOnPrivateVLAN:
asserts the rendered output contains the correct VLANConfig
(already green today — pinning the regression boundary), the
Layer2VIPConfig pinned to enp0s31f6.4000 (red today), no
Layer2VIPConfig pointing at enp0s31f6 (red today), and the
public NIC retains its address and default route (green today).
Test fails on the current commit; the next commit's helper
addition + chart wiring turn the failing assertions green.
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Mirrors the existing cidrNetwork helper. Both wrap net/netip so chart templates do not have to handle IPv4 vs IPv6 bit math. cidrContains takes a CIDR and an IP literal and returns whether the IP falls inside the prefix. Used by the multi-doc Layer2VIPConfig discovery path (next commit) to pin the VIP on the link whose subnet hosts the floatingIP, rather than always hardcoding it to the IPv4 default-route link. Also hoist cidrNetwork and the new cidrContains out of the inline FuncMap closures into top-level functions so initFunMap stays under the funlen gate. Pinned by TestCidrContainsTemplateFunc with 16 cases covering ipv4 host inside / outside / on boundary / broadcast, /26 first quarter membership, /32 self-match, ipv6 inside / outside, the exact Hetzner topology case (192.168.100.10 in 192.168.100.4/24 and not in 88.99.210.37/26), plus malformed and empty inputs. Signed-off-by: Aleksei Sviridkin <f@lex.la>
Returns the link name whose discovered addresses contain a CIDR encompassing the supplied IP literal. Used by the multi-doc Layer2VIPConfig path to pin the VIP onto the link carrying the relevant subnet, rather than the IPv4 default-route link (wrong whenever the floatingIP lives on a non-default-route NIC, e.g. a private VLAN child on a Hetzner-style topology). Returns the empty string when no link's addresses contain the IP — the caller is responsible for falling back to default_link_name_by_gateway. CIDR membership is computed by the engine-registered cidrContains helper added in the previous commit, so this helper does not have to do per-family bit math in template land. A small wrapper define talm.cidrContainsTarget converts the bool return of cidrContains into the "true"/"" string form include yields, so the caller can `eq ... "true"` and short- circuit on the first match. Signed-off-by: Aleksei Sviridkin <f@lex.la>
The multi-doc Layer2VIPConfig discovery path used to hardcode
link to the IPv4-default-route link ($defaultLinkName), which
puts the VIP on the wrong NIC whenever the floatingIP lives on
a private subnet hosted by a non-default-route link — typical
Hetzner topology with a public NIC carrying the default route
and a VLAN child carrying the private cluster network.
Switch to a two-stage lookup:
1. talm.discovered.link_name_for_address: pick the link whose
discovered addresses contain a CIDR encompassing the
floatingIP. Native cidrContains (net/netip) handles IPv4
and IPv6 uniformly.
2. Fall back to $defaultLinkName when no discovered subnet
contains the floatingIP — preserves backward compatibility
for upstream-routable VIPs that arrive via the default
route.
3. Skip emission entirely when neither resolves a link, the
same as before.
Mirror change in charts/generic/templates/_helpers.tpl. Add the
fallback contract test
TestContract_NetworkMultidoc_FloatingIPNotInDiscoveredSubnetFallsBackToGateway
that pins step 2 against simpleNicLookup with a VIP outside the
discovered eth0 subnet — must still emit Layer2VIPConfig with
link=eth0 (current behaviour, mustn't regress).
Resolves the Hetzner-topology bug pinned by
TestContract_NetworkMultidoc_HetznerTopology_VIPOnPrivateVLAN
in the previous red-pin commit.
Signed-off-by: Aleksei Sviridkin <f@lex.la>
cidrContains is family-agnostic via net/netip.Prefix.Contains, so
the chart-level link-selection helper handles IPv6 the same as
IPv4. The chart side has no per-family branches today, but
without a contract test, a future swap to per-family bit math
could silently regress only the IPv6 path. Pin it.
Adds:
- hetznerPublicNICWithPrivateIPv6VLANLookup: same physical /
VLAN topology as the IPv4 fixture, but the private subnet
is an RFC 3849 documentation /64 (2001:db8:cafe::/64) and
the VLAN child carries 2001:db8:cafe::4/64 in inet6 family.
IPv4 default route stays on the public NIC — mirrors a real
dual-stack Hetzner deployment.
- TestContract_NetworkMultidoc_HetznerTopology_IPv6VIPOnPrivateVLAN:
asserts VLANConfig emits the IPv6 ULA, Layer2VIPConfig
pins to enp0s31f6.4000, no document points at the public
NIC, and the IPv4 uplink survives.
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Originally cidrContains returned an error on any unparseable input. The chart-side helper that drives the multi-doc Layer2VIPConfig selection iterates over EVERY entry in the addresses COSI resource; a single corrupt or future-format entry would have crashed the entire chart render. Likewise an operator-typoed floatingIP would have hard-failed instead of falling through to the default-route fallback. Switch to set-membership semantics: parse failure on either input returns (false, nil). An undefined CIDR cannot contain anything; an undefined IP is not in any defined set. Talos itself rejects malformed IP literals with a clear error on apply, so the chart layer does not need to duplicate that check. Update TestCidrContainsTemplateFunc to pin the lenient behaviour: malformed CIDR, malformed IP, empty CIDR, empty IP, both empty — all return "false" with no error. Signed-off-by: Aleksei Sviridkin <f@lex.la>
Three correctness gaps in the first cut of
talm.discovered.link_name_for_address:
1. Unfiltered against configurable_link_names. An address on a CNI
bridge (cni0 carrying 10.244.0.0/16), kernel-managed virtual
interface, or Wireguard could win selection if its CIDR happened
to contain the floatingIP. The multi-doc loop only emits
LinkConfig for configurable links, so a VIP pinned to a
non-configurable link would have no surrounding network document
and would race CNI's address management on apply.
2. No scope filter. host/link/nowhere-scoped addresses (loopback,
IPv6 link-local, kernel artifacts) could match. addresses_by_link
already strips these before LinkConfig.addresses; the VIP-link
helper now applies the same filter for consistency.
3. Iteration-order picks the winner when multiple links match. A
/16 on a physical NIC and a /24 on a VLAN child both
encompassing the floatingIP would silently land on whichever
discovery happened to list first. Switch to longest-prefix match
— the kernel's rule for route decisions — so the more specific
subnet always wins.
The cross-iteration "best so far" state uses the dict + sprig set
mutation pattern (Go-template range introduces a new scope per
iteration so plain $var = ... reassignment does not propagate).
Also drops the talm.cidrContainsTarget wrapper: cidrContains is
callable directly inside the template body now that the result is
used as a bool inside `if`, not as a string inside `eq`.
Pinned by three new contract tests:
- TestContract_NetworkMultidoc_VIPSkipsNonConfigurableLink:
cni0 carrying 10.244.0.0/16, floatingIP 10.244.0.5 inside it,
expected link = enp0s31f6 (default-route fallback) NOT cni0.
- TestContract_NetworkMultidoc_VIPLinkLongestPrefixMatch:
enp0s31f6 with /16 listed first, enp0s31f6.4000 with /24
listed second, both contain 192.168.100.10 — /24 wins.
- TestContract_NetworkMultidoc_VIPLinkSurvivesMalformedAddressEntry:
one "definitely-not-a-cidr" entry sandwiched in addresses,
render still succeeds (validates the lenient cidrContains
contract from the previous commit at the chart layer).
Signed-off-by: Aleksei Sviridkin <f@lex.la>
The vipLink override docstring in cozystack/values.yaml,
generic/values.yaml, and the corresponding paragraph in README.md
still described the old "default-gateway link is what discovery
would otherwise pick" behaviour. After the helper rework the
selection is:
1. Longest-prefix match across configurable links — when
floatingIP falls inside a configurable link's address CIDR,
the most specific subnet wins. Handles the Hetzner-style
topology this PR fixes.
2. Fallback to the IPv4-default-gateway-bearing link, for
upstream-routable VIPs that arrive via the default route.
3. Addresses on non-configurable links (CNI bridges, kernel-
managed loopbacks, Wireguard) are skipped — the chart does
not emit LinkConfig for them, so a VIP pinned there would
have no surrounding network document.
Update both values.yaml docstrings (verbatim sync — they shipped
identical and stay identical) plus the README's vipLink paragraph
to spell out the new precedence and the Hetzner-topology
motivation.
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Three correctness gaps and one coverage gap from review:
1. String-splitting "/" for prefix length had a /0 default-route
tie-at-zero failure mode that masked iteration-order behaviour.
Replace with a new engine helper cidrPrefixLen
(net/netip.Prefix.Bits) that returns -1 on parse failure — loses
to any valid prefix length under `gt`, so tie-at-zero becomes
impossible and the longest-prefix comparator no longer relies on
string parsing in template land.
2. cidrContains parse-leniency on operator-typoed floatingIP
silently fell through to the default-link fallback, surfacing
only at apply time as a Talos rejection. Add ipIsValid engine
helper and call it at the top of the multi-doc Layer2VIPConfig
block in both cozystack and generic charts: a malformed
floatingIP fails the render with a clear hint that names the bad
value. Render-time fail is much cheaper to debug than apply-time
rejection.
3. Filter 2 in link_name_for_address (skip host/link/nowhere-scoped
addresses) had no contract test pinning it. Adds
hetznerWithLinkScopedAddressLookup fixture and
TestContract_NetworkMultidoc_VIPSkipsLinkScopedAddress: a VLAN
child carries both a global /24 (cluster subnet) and a
link-local /16 (RFC 3927). With Filter 2 the link-local entry
is skipped; without it, longest-prefix would pick the /16 and
land the VIP on link-local space.
4. Generic chart carried a verbatim copy of the cozystack VIP-
selection block but every existing contract test routed through
renderCozystackWith. Adds four
TestContract_NetworkMultidoc_Generic_* mirrors so a regression
in only the generic chart's hunk surfaces immediately.
Auxiliary test infrastructure:
- renderCozystackExpectError helper for tests that pin
error-message contracts (e.g. fail-fast on malformed
floatingIP).
- templateBoolTrue const since "true" now appears in multiple
helpers (ipIsValid, talm.cidrContainsTarget historically — now
only ipIsValid).
Direct unit tests:
- TestCidrPrefixLenTemplateFunc: 9 cases (ipv4/ipv6 prefixes,
/0, /32, /128, malformed → -1, missing prefix → -1, empty → -1).
- TestIPIsValidTemplateFunc: 8 cases (ipv4/ipv6 valid, octet
> 255, host-CIDR rejected, hostname rejected, empty → false).
Signed-off-by: Aleksei Sviridkin <f@lex.la>
…on order
Four review nits, none individually blocking but cheap to clean up:
1. Doc inaccuracy. Comments and values.yaml previously claimed
"CNI bridges cannot win selection" — but real CNI bridges
(cni0, cilium_host, weave) typically report kind="bridge" to
Talos COSI, and bridge IS in configurable_link_names today.
The actual exclusion mechanism is "links the chart does not
emit a per-link document for" — Wireguard, kernel-managed
loopback, slave NICs of a bond, anything outside the
{physical NIC, bond, vlan, bridge} set. Tighten the wording
in cozystack/values.yaml, generic/values.yaml, README.md, and
the talm helper docstring.
2. Fixture renamed cni0 → wg0. Wireguard is the canonical
real-world non-configurable link (no busPath, kernel-managed,
kind="ether" reported on Linux). Keeps the test honest about
what the filter actually catches; CNI-bridge mention removed
from the fixture comment.
3. Annotation on the configurable_link_names call. The talm
helper passed $target (the floatingIP string) into a function
that ignores its dot. The previous shape was confusing on
first read. Comment now explains the call shape matches sister
callers in cozystack/generic charts.
4. Pin the validation order. Add
TestContract_NetworkMultidoc_VIPFailsOnInvalidFloatingIPEvenWithVipLinkOverride
to confirm the malformed-floatingIP fail-fast runs BEFORE the
vipLink override branch, so an operator who set vipLink still
gets a render-time error rather than a Layer2VIPConfig document
with a nonsense `name:` value. Pinning this catches a future
refactor that moves the validation below the override.
Signed-off-by: Aleksei Sviridkin <f@lex.la>
…o-default-route case
Five blocking nits + two cosmetics from review:
1. Strip private-context leak from contract test: reporter name
and report-date chronology are private workflow context. Replace
with content-only description of the topology.
2. Strip workflow-chronology language from the same docstring.
Comments like "the test will fail on main today" describe the
activity that produced the artifact, not the artifact at rest;
they go stale the moment both commits land. Rewrite to describe
the contract being pinned.
3. Stale inline comment in the Wireguard fixture (renamed cni0 →
wg0 earlier but missed the address-table comment). Fix.
4. Tighten Filter 2 in link_name_for_address to match
addresses_by_link exactly: truthy check on the raw .spec.scope
field AND non-empty toString check AND not in skip-list.
Previously only the toString form was checked, which lets a
nil-scope entry through as the literal string "<nil>". Real
Talos COSI always sets scope, so this was latent rather than
active, but the docstring claim "same filter" is now accurate.
5. Pin the deliberate behavior change: when discovery has a
subnet-matching configurable link but no IPv4 default route,
Layer2VIPConfig must still emit. The outer $defaultLinkName
guard moved inside the body and only gates the fallback path —
a successful link_name_for_address resolution emits regardless.
Add noDefaultRouteWithSubnetMatchLookup fixture and
TestContract_NetworkMultidoc_VIPEmitsWithMatchingSubnetEvenWithoutDefaultRoute
so a future refactor that reintroduces the outer guard
surfaces immediately.
Cosmetics:
- Drop "consistency" claim from configurable_link_names call
annotation in talm helper. Sister callers pass full chart
context, this one passes the floatingIP string — calls are
syntactically similar but semantically different.
- Drop redundant `int (...)` cast on cidrPrefixLen — the helper
already returns int.
- Document tie-breaker behavior: ties at the same prefix length
resolve by COSI's emission order. Rare in practice.
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Previous wording claimed an operator-typoed floatingIP would fall through cidrContains to the default-route fallback link "rather than failing the render". That stopped being true once the chart layer added ipIsValid fail-fast for floatingIP — a typo now aborts the render at template time with a clear hint, never reaching cidrContains. Rewrite the paragraph so a reader looking at cidrContains understands the actual flow: lenient on COSI iteration (a corrupt entry mustn't crash the render) but the operator's floatingIP is validated up-front by ipIsValid, so a typo is caught before cidrContains is invoked. Signed-off-by: Aleksei Sviridkin <f@lex.la>
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughImplements package-level CIDR/IP helpers and talm.discovered link-resolution helpers, hoists and validates ChangesVIP Link Selection via Longest-Prefix-Match
🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related issues:
Possibly related PRs:
Suggested reviewers:
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request enhances the floatingIP link selection logic in Layer2VIPConfig by implementing a longest-prefix-match discovery rule across configurable network links, with a fallback to the default gateway. It introduces new Helm template functions (cidrContains, cidrPrefixLen, ipIsValid) and comprehensive integration tests covering various network topologies. Review feedback suggested improving the consistency of the ipIsValid helper by returning a boolean instead of a string for more idiomatic template usage and optimizing the efficiency of a loop by moving a static list definition outside of it.
| func ipIsValid(addrStr string) (string, error) { | ||
| _, err := netip.ParseAddr(addrStr) | ||
| if err != nil { | ||
| //nolint:nilerr // parse-failure is the "false" outcome of this predicate | ||
| return "", nil | ||
| } | ||
|
|
||
| return templateBoolTrue, nil | ||
| } |
There was a problem hiding this comment.
The ipIsValid helper returns a string ("true" or ""), which is inconsistent with cidrContains (line 340) which returns a bool. For predicates used in template conditions, returning a bool is more idiomatic and allows for cleaner usage like (not (ipIsValid .Values.floatingIP)) instead of (ne (ipIsValid .Values.floatingIP) "true").
Additionally, the comment for templateBoolTrue (line 124) mentions "and similar predicates", but cidrContains does not follow this convention. It would be better to have both return bool and remove the templateBoolTrue constant.
| func ipIsValid(addrStr string) (string, error) { | |
| _, err := netip.ParseAddr(addrStr) | |
| if err != nil { | |
| //nolint:nilerr // parse-failure is the "false" outcome of this predicate | |
| return "", nil | |
| } | |
| return templateBoolTrue, nil | |
| } | |
| func ipIsValid(addrStr string) (bool, error) { | |
| _, err := netip.ParseAddr(addrStr) | |
| if err != nil { | |
| //nolint:nilerr // parse-failure is the "false" outcome of this predicate | |
| return false, nil | |
| } | |
| return true, nil | |
| } |
There was a problem hiding this comment.
Applied in d814077. ipIsValid now returns Go bool; chart wiring simplifies to if not (ipIsValid .Values.floatingIP). templateBoolTrue is gone (no more literal "true" strings in engine.go). Tests adjusted to expect "false" instead of "" because Helm's renderer stringifies a bool result.
| nonsense `name:` value, and surface only at apply time. | ||
| Render-time `fail` with the bad value is much cheaper to | ||
| debug. */}} | ||
| {{- if and .Values.floatingIP (ne (ipIsValid .Values.floatingIP) "true") (eq .MachineType "controlplane") }} |
There was a problem hiding this comment.
Applied in d814077. ipIsValid now returns Go bool; chart wiring simplifies to if not (ipIsValid .Values.floatingIP). templateBoolTrue is gone (no more literal "true" strings in engine.go). Tests adjusted to expect "false" instead of "" because Helm's renderer stringifies a bool result.
| nonsense `name:` value, and surface only at apply time. | ||
| Render-time `fail` with the bad value is much cheaper to | ||
| debug. */}} | ||
| {{- if and .Values.floatingIP (ne (ipIsValid .Values.floatingIP) "true") (eq .MachineType "controlplane") }} |
There was a problem hiding this comment.
Applied in d814077. ipIsValid now returns Go bool; chart wiring simplifies to if not (ipIsValid .Values.floatingIP). templateBoolTrue is gone (no more literal "true" strings in engine.go). Tests adjusted to expect "false" instead of "" because Helm's renderer stringifies a bool result.
| {"ipv4 valid", "192.168.100.10", "true"}, | ||
| {"ipv4 host CIDR not a bare ip", "192.168.100.10/24", ""}, | ||
| {"ipv4 octet > 255", "10.0.0.300", ""}, | ||
| {"ipv4 with junk suffix", "192.168.100.10x", ""}, | ||
| {"ipv6 valid", "2001:db8::1", "true"}, | ||
| {"ipv6 with zone", "fe80::1%eth0", "true"}, | ||
| {"empty returns false", "", ""}, | ||
| {"hostname not an ip", "controlplane.example.com", ""}, | ||
| } |
There was a problem hiding this comment.
If ipIsValid is updated to return a bool, the expected values in these tests should be updated. Note that renderExpr renders the boolean result to a string ("true" or "false").
| {"ipv4 valid", "192.168.100.10", "true"}, | |
| {"ipv4 host CIDR not a bare ip", "192.168.100.10/24", ""}, | |
| {"ipv4 octet > 255", "10.0.0.300", ""}, | |
| {"ipv4 with junk suffix", "192.168.100.10x", ""}, | |
| {"ipv6 valid", "2001:db8::1", "true"}, | |
| {"ipv6 with zone", "fe80::1%eth0", "true"}, | |
| {"empty returns false", "", ""}, | |
| {"hostname not an ip", "controlplane.example.com", ""}, | |
| } | |
| {"ipv4 valid", "192.168.100.10", "true"}, | |
| {"ipv4 host CIDR not a bare ip", "192.168.100.10/24", "false"}, | |
| {"ipv4 octet > 255", "10.0.0.300", "false"}, | |
| {"ipv4 with junk suffix", "192.168.100.10x", "false"}, | |
| {"ipv6 valid", "2001:db8::1", "true"}, | |
| {"ipv6 with zone", "fe80::1%eth0", "true"}, | |
| {"empty returns false", "", "false"}, | |
| {"hostname not an ip", "controlplane.example.com", "false"}, | |
| } |
There was a problem hiding this comment.
Applied in d814077. ipIsValid now returns Go bool; chart wiring simplifies to if not (ipIsValid .Values.floatingIP). templateBoolTrue is gone (no more literal "true" strings in engine.go). Tests adjusted to expect "false" instead of "" because Helm's renderer stringifies a bool result.
| "<nil>" string, which is neither "" nor in the skip set. | ||
| Real Talos COSI always sets scope, so this is a latent | ||
| guardrail rather than a hot path. */ -}} | ||
| {{- if and .spec.scope (ne (.spec.scope | toString) "") (not (has (.spec.scope | toString) (list "host" "link" "nowhere"))) -}} |
There was a problem hiding this comment.
Applied in d814077. The skip-list {host, link, nowhere} is now hoisted to $skipScopes before the range block in talm.discovered.link_name_for_address, built once per call rather than once per address-table entry.
…p-list
Two improvements from Gemini review:
1. ipIsValid now returns Go bool instead of "true"/"" string.
Callers compose via `if not (ipIsValid x)` rather than
`if ne (ipIsValid x) "true"` — idiomatic Helm predicate
composition. Removes the templateBoolTrue constant entirely
since "true" no longer appears as a literal in engine.go;
goconst is satisfied because the predicate's bool return
stringifies to "true"/"false" only at render-output time, not
at funcMap registration.
2. Hoist the skip-list literal {host, link, nowhere} out of the
range body in talm.discovered.link_name_for_address. The list
was rebuilt on every address-table entry; one build per call
is enough.
No behavior change. Both Gemini suggestions are pure clarity /
micro-perf improvements; chart-side wiring in cozystack and
generic charts simplified accordingly. Unit-test expectations
updated to match the bool stringification ("false" instead of
"").
Signed-off-by: Aleksei Sviridkin <f@lex.la>
vipLink docstring referenced talm.discovered.configurable_link_names
by name. That is a chart-internal template define name; operators
reading values.yaml should not need to know template-defines to
understand selection rules. Replace with "the {physical NIC, bond,
vlan, bridge} set above" — the same set already enumerated in step
1 a few lines above in the same block. No behavior change.
Signed-off-by: Aleksei Sviridkin <f@lex.la>
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (2)
README.md (1)
64-64: ⚡ Quick winRefactor the vipLink documentation for readability.
Line 64 is an ~850-character single-sentence paragraph with multiple nested parenthetical clauses. While the content is technically accurate and comprehensive, the structure makes it hard for operators to parse on first read. Critical information—when to override
vipLink, what links are skipped, operator responsibilities—is buried in nested clauses.📖 Suggested refactor: break into bullets or short paragraphs
Consider restructuring as:
Edit `values.yaml` to set your cluster's control-plane endpoint. This is the URL every node's kubelet and kube-proxy will dial. The chart leaves it empty on purpose so a missed override fails loudly instead of silently embedding a placeholder. **Endpoint and VIP setup:** - For cozystack VIP setups: set `endpoint` and `floatingIP` together (same IP, single shared VIP). - For single-node clusters: use that node's routable IP as `endpoint`; leave `floatingIP` blank. - For multi-node with an external load balancer: use the LB URL as `endpoint`; leave `floatingIP` blank. **Automatic `vipLink` selection (when `vipLink` is empty):** The chart picks the link automatically using a two-step rule: 1. **Longest-prefix match across configurable links:** If the `floatingIP` falls inside the CIDR of any address on a configurable link (physical NIC, bond, VLAN, bridge), the most specific subnet wins. This handles the Hetzner-style topology where a public NIC carries the default route and a VLAN child carries the private cluster subnet—the VIP lands on the VLAN child. 2. **Fallback to IPv4-default-gateway-bearing link:** For upstream-routable VIPs that arrive via the default route. Addresses on links the chart does not emit a per-link document for (Wireguard, kernel-managed loopback, slave NICs of a bond, anything outside the configurable set) are skipped—a VIP pinned there would have no surrounding network document. **Explicit `vipLink` override:** Set `vipLink` explicitly when the target link does not yet exist on the live system at first apply (typically a VLAN sub-interface). The chart pins `Layer2VIPConfig.link` to it directly and emits the document even on a totally fresh node where discovery has not yet populated the addresses table. The chart does not auto-emit a `LinkConfig` or `VLANConfig` for the override link; the operator is responsible for ensuring the link comes up, typically by adding a `LinkConfig` or `VLANConfig` for that link to the per-node body overlay alongside `vipLink`. **Subnet selectors:** Subnet-selector fields (`kubelet.validSubnets`, `etcd.advertisedSubnets`) are derived automatically from the node's default-gateway-bearing link, so no override is needed unless you have a multi-homed node that requires a specific subnet pinned.This preserves all the information but chunks it into scannable sections with visual hierarchy.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@README.md` at line 64, The long single-paragraph `vipLink` documentation in README.md is hard to parse; split it into short paragraphs and/or bullet sections covering (1) endpoint and VIP setup rules, (2) automatic `vipLink` selection (two-step rule: longest-prefix match across configurable links, then fallback to IPv4-default-gateway-bearing link), (3) links that are skipped (Wireguard, loopback, slave NICs, etc.), (4) explicit `vipLink` override behavior (mention that `Layer2VIPConfig.link` is pinned and a per-node `LinkConfig`/`VLANConfig` must be provided by the operator), and (5) subnet-selector behavior (`kubelet.validSubnets`, `etcd.advertisedSubnets` derived from default-gateway-bearing link); preserve all original details but reflow into headings/bullets for readability.pkg/engine/contract_network_multidoc_test.go (1)
414-414: 💤 Low valueMinor: Assertion 4 checks
"name: enp0s31f6\n"with trailing newline.Line 414 searches for
"name: enp0s31f6\n"with an explicit newline, while other name assertions (lines 393, 441) omit it. The trailing\ntightens the match (avoids partial hits likename: enp0s31f6.4000), which is good for this specific check, but inconsistent with the pattern elsewhere in the test.🎨 Optional: normalize to match the pattern on lines 393/441
If you want consistency, either drop the
\nhere or add it to lines 393/441. The current form works correctly, so this is purely style.- assertContains(t, out, "name: enp0s31f6\n") + assertContains(t, out, "name: enp0s31f6")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/engine/contract_network_multidoc_test.go` at line 414, Assertion uses a trailing newline in assertContains(t, out, "name: enp0s31f6\n") which is inconsistent with the other name assertions that omit the newline; update the test to be consistent by removing the trailing "\n" from this assertContains call (or alternatively add "\n" to the other name assertions) so all name checks follow the same pattern while preserving the specificity for "name: enp0s31f6".
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@charts/cozystack/templates/_helpers.tpl`:
- Around line 378-389: The template falls back to $defaultLinkName even if that
link is unmanaged/unaligned with the renderable-link filter used by include
"talm.discovered.link_name_for_address", which can emit Layer2VIPConfig for
links like WireGuard/CNI; change the logic in the block that sets $vipLink
(around the include "talm.discovered.link_name_for_address" and
$defaultLinkName) to validate that the chosen fallback is also a
renderable/managed link (call the same discovery/validation helper or add an
explicit test) and only emit the Layer2VIPConfig when $vipLink passes that
renderable/managed check (preserving the existing guards .Values.floatingIP,
.Values.vipLink and .MachineType == "controlplane").
In `@charts/generic/templates/_helpers.tpl`:
- Around line 302-313: The Layer2VIPConfig block can assign the fallback
$defaultLinkName even when that link is non-renderable; update the fallback so
you only assign $vipLink = $defaultLinkName when that default link is itself
renderable by invoking the existing renderable-link guard (e.g. call the helper
that checks a link is renderable — include
"talm.discovered.link_name_is_renderable" or the equivalent helper used
elsewhere) before setting $vipLink, so the template (the block using
.Values.floatingIP, .Values.vipLink, .MachineType, include
"talm.discovered.link_name_for_address", $vipLink and $defaultLinkName) only
emits Layer2VIPConfig for links the chart will actually render.
In `@charts/talm/templates/_helpers.tpl`:
- Around line 336-379: The helper currently allows links with scope "bridge" to
be considered; update the scope skip-list so bridges are excluded by adding
"bridge" to the $skipScopes list (change $skipScopes := list "host" "link"
"nowhere" to include "bridge") in the helper that iterates addresses (the block
initializing $skipScopes and using has (.spec.scope | toString) $skipScopes), so
cidrContains/$best selection logic never chooses a bridge until the chart emits
BridgeConfig.
---
Nitpick comments:
In `@pkg/engine/contract_network_multidoc_test.go`:
- Line 414: Assertion uses a trailing newline in assertContains(t, out, "name:
enp0s31f6\n") which is inconsistent with the other name assertions that omit the
newline; update the test to be consistent by removing the trailing "\n" from
this assertContains call (or alternatively add "\n" to the other name
assertions) so all name checks follow the same pattern while preserving the
specificity for "name: enp0s31f6".
In `@README.md`:
- Line 64: The long single-paragraph `vipLink` documentation in README.md is
hard to parse; split it into short paragraphs and/or bullet sections covering
(1) endpoint and VIP setup rules, (2) automatic `vipLink` selection (two-step
rule: longest-prefix match across configurable links, then fallback to
IPv4-default-gateway-bearing link), (3) links that are skipped (Wireguard,
loopback, slave NICs, etc.), (4) explicit `vipLink` override behavior (mention
that `Layer2VIPConfig.link` is pinned and a per-node `LinkConfig`/`VLANConfig`
must be provided by the operator), and (5) subnet-selector behavior
(`kubelet.validSubnets`, `etcd.advertisedSubnets` derived from
default-gateway-bearing link); preserve all original details but reflow into
headings/bullets for readability.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 43864524-276a-4019-b0e8-20c4126126f3
📒 Files selected for processing (10)
README.mdcharts/cozystack/templates/_helpers.tplcharts/cozystack/values.yamlcharts/generic/templates/_helpers.tplcharts/generic/values.yamlcharts/talm/templates/_helpers.tplpkg/engine/contract_network_multidoc_test.gopkg/engine/helm/engine.gopkg/engine/helm/engine_test.gopkg/engine/render_test.go
…igurable set
Three CodeRabbit findings on the latest review, all of the same
class: VIP-link selection paths can pick links the chart does not
emit a per-link document for, leaving Layer2VIPConfig dangling.
Resolved by implementing BridgeConfig emission properly rather
than excluding bridges as a workaround:
1. bridge_slaves discovery helper: walks lookup "links" by
spec.slaveKind=="bridge" + spec.masterIndex, symmetric to
bond_slaves.
2. cozystack and generic charts: replace the prior "bridge as
gateway = fail, non-gateway bridge = silently skipped" branch
with a typed BridgeConfig emission carrying links (ports),
optional stp / vlan blocks from spec.bridgeMaster, plus the
shared addresses / routes / mtu shape. Bridges now behave like
bonds at render time.
3. Both charts: guard the default-route-link fallback in the
discovery-derived Layer2VIPConfig block on the same
configurable_link_names set the matched-link path uses.
Previously the fallback would happily promote a Wireguard /
slave / other unmanaged link if that was where the IPv4
default route sat.
Tests:
- TestContract_NetworkMultidoc_BridgeConfigEmitted: non-gateway
bridge yields a BridgeConfig document.
- TestMultiDocEmitsBridgeConfigWhenBridgeCarriesDefaultRoute:
gateway bridge now emits BridgeConfig + routes.gateway (was
a hard fail before).
- TestContract_Errors_MultidocBridgeAsGateway_NoLongerFails:
pins the negation of the prior fail-fast.
- TestContract_NetworkMultidoc_VIPOnBridge: VIP inside a bridge
subnet pins to that bridge (BridgeConfig + Layer2VIPConfig
co-emit, STP + ports verified).
- TestContract_NetworkMultidoc_VIPSkipsNonConfigurableDefaultRouteLink
plus generic mirror: default-route fallback honours the
configurable gate, no Layer2VIPConfig when the only resolvable
link is Wireguard.
The unimplemented-BridgeConfig migration hint at engine.go:181 and
its generic mirror are now obsolete — the gateway-bridge fail
above them no longer fires, the typed branch handles both
gateway and non-gateway cases uniformly.
Signed-off-by: Aleksei Sviridkin <f@lex.la>
…E bridge prose
Two blockers from review:
1. ipIsValid is a Go-template function with a string parameter,
and Go text/template raises "wrong type for value; expected
string; got int" if the operator writes an unquoted numeric
YAML scalar (floatingIP: 192168). The entire fail-fast
validation block exists to surface bad values clearly at
render time; passing the int through unguarded defeats the
point. Coerce through `toString` BEFORE the predicate in both
cozystack and generic charts. The same trap is already flagged
in CLAUDE.md ("Helm/Go template numeric scalar").
Pinned by
TestContract_NetworkMultidoc_VIPFailsOnNumericFloatingIP: the
numeric input must produce the friendly fail and must NOT
surface the Go-template type-mismatch panic.
2. README.md line 129 still described the pre-PR bridge
behavior ("not auto-emitted", "non-gateway bridge skipped",
"gateway bridge fails the render"). After this PR every
clause is false — BridgeConfig is emitted symmetrically to
BondConfig. Rewrite the sentence to match the actual current
behavior (ports via spec.slaveKind, STP / vlan from
spec.bridgeMaster, gateway-bearing bridges get
routes.gateway).
Signed-off-by: Aleksei Sviridkin <f@lex.la>
…ge docs
Two more correctness gaps + comment drift from review:
1. BridgeConfig vlan.filtering was dead. The chart read
$bridgeMaster.vlan.filtering, but COSI's BridgeVLANSpec
serialises FilteringEnabled as yaml:"filteringEnabled"
(verified against siderolabs/talos
pkg/machinery/resources/network/link.go:280). The hasKey
check was always false, so VLAN filtering on a discovered
bridge never reached the rendered BridgeConfig regardless
of kernel state.
Output-side BridgeConfig schema still uses the shorter
yaml:"filtering,omitempty" key, so we read filteringEnabled
from discovery and emit filtering to the rendered document.
Pinned by an addition to TestContract_NetworkMultidoc_VIPOnBridge:
bridgeWithClusterSubnetLookup now carries
spec.bridgeMaster.vlan.filteringEnabled: true, and the test
asserts `filtering: true` reaches the BridgeConfig output.
2. Stale comments across render_test.go pretending bridge
support is unimplemented. Touch-ups:
- TestMultiDocBridgeSkipsLinkConfigBranch renamed to
TestMultiDocBridgeDoesNotFallThroughToLinkConfig; the
test still pins the right contract (no LinkConfig for
bridges) but its rationale no longer mentions "until
BridgeConfig support lands".
- bridgeLookup docstring rewritten to acknowledge
BridgeConfig emission for the no-address bridge case.
- bridgeWithGatewayLookup docstring rewritten — the
renderer no longer fails for gateway-bridges.
- cozystack helpers.tpl drops "CNI bridge before
BridgeConfig support" from the default-route fallback
comment (BridgeConfig support IS in this PR; bridges
are now first-class configurable links).
Signed-off-by: Aleksei Sviridkin <f@lex.la>
…p/vlan independence Two more correctness gaps from review: 1. addresses_by_link propagated corrupt entries verbatim into LinkConfig / VLANConfig / BridgeConfig `addresses` blocks. link_name_for_address was deliberately lenient on cidrContains parse failures so a single bad COSI entry could not crash the render, but the same lenience did not extend to the address-emission path: a "definitely-not-a-cidr" entry would surface verbatim into the rendered YAML and fail on apply with a less-informative error. Filter on cidrPrefixLen >= 0 — symmetric to the same parse helper link_name_for_address uses — so malformed entries are dropped at the same boundary. Also hoist the scope skip-list out of the iteration body to match link_name_for_address. Pinned by TestContract_NetworkMultidoc_LinkAddressesFilterMalformedCidr: the malformedAddressEntryLookup fixture's "definitely-not-a-cidr" entry must NOT reach any `- address:` line of the rendered output, and the valid sibling 192.168.100.4/24 MUST still be present. 2. BridgeConfig.stp / BridgeConfig.vlan independence was only pinned for the both-set case; a refactor that accidentally nested one inside the other (e.g. de-duplicating the wrapping `if` checks) would silently regress one path. Add two single-block fixtures (bridgeWithVLANOnlyLookup, bridgeWithSTPOnlyLookup) sharing a common bridgeOnlyLookup helper, plus contract tests asserting each sub-block emits in isolation AND the absent sub-block does NOT emit. Signed-off-by: Aleksei Sviridkin <f@lex.la>
…ment fragment Two findings from review: 1. nil .Values.floatingIP (operator unset, Helm-coalesced value-table miss, --set floatingIP=null) tripped the fail-fast block. Sprig's `nil | toString` returns the literal string "<nil>", which is truthy AND not a valid IP — so feeding the toString'd value through ipIsValid below would fail-fast on every controlplane render where the operator legitimately left floatingIP unset (single-node clusters, LB-fronted multi-node, anything Helm strips out of the values table). Gate on the RAW .Values.floatingIP truthiness first; coerce through toString INSIDE the gated body, where the numeric-scalar safety net still fires. Both charts. Pinned by TestContract_NetworkMultidoc_VIPGracefulWhenFloatingIPNil plus its Generic_ mirror: nil floatingIP must render cleanly with no Layer2VIPConfig and no "<nil>" leakage in the rendered YAML. 2. One inline test comment contained the Russian verb "pin'ит" mid-sentence. Replace with the English "pins" — committed-text English-only rule per public-content convention. Signed-off-by: Aleksei Sviridkin <f@lex.la>
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (2)
pkg/engine/render_test.go (2)
6413-6417: ⚡ Quick winKeep the error-render helper defaults aligned with the main helper.
renderCozystackExpectErrorinjectsendpointbut not theadvertisedSubnetsfallback thatrenderCozystackWithapplies. That makes future error-path tests prone to tripping the unrelated advertised-subnets/default-route guard instead of the validation they intended to pin.Suggested alignment
values := cloneValues(chrt.Values) if v, _ := values["endpoint"].(string); v == "" { values["endpoint"] = testEndpoint } + if arr, ok := values["advertisedSubnets"].([]any); !ok || len(arr) == 0 { + values["advertisedSubnets"] = []any{testAdvertisedSubnet} + } maps.Copy(values, overrides)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/engine/render_test.go` around lines 6413 - 6417, The error-render helper renderCozystackExpectError currently injects the endpoint default but omits the advertisedSubnets fallback used by renderCozystackWith, which lets unrelated advertised-subnets/default-route guards trip tests; update renderCozystackExpectError (the code that clones chrt.Values via cloneValues and sets values["endpoint"] = testEndpoint) to also set values["advertisedSubnets"] to the same default value used in renderCozystackWith when that key is empty/missing (and then apply maps.Copy(values, overrides) as before) so both helpers have the same defaults.
1746-1747: ⚡ Quick winAssert the concrete bridge gateway value.
Checking only for
gateway:can still pass if a future change emits the wrong value or puts the gateway on the wrong document. This regression test is stronger if it pins the expected route value from the fixture.Suggested tightening
- if !strings.Contains(output, "gateway:") { + if !strings.Contains(output, "gateway: 192.168.201.1") { t.Errorf("BridgeConfig for the IPv4-default-route bridge must carry the gateway entry, got:\n%s", output) }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/engine/render_test.go` around lines 1746 - 1747, Replace the weak substring assertion that only checks for "gateway:" with an exact assertion that the IPv4-default-route BridgeConfig contains the concrete gateway value from the test fixture: instead of if !strings.Contains(output, "gateway:") use a comparison like if !strings.Contains(output, "gateway: <expected-value>") or assert equality against the rendered BridgeConfig fragment; reference the test variables (output) and the BridgeConfig for the "IPv4-default-route" bridge and obtain the expected gateway string from the fixture or a test constant to pin the exact value.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@charts/cozystack/templates/_helpers.tpl`:
- Around line 227-232: The guard currently checks .Values.floatingIP for
truthiness so falsy but user-set values like 0 or false are treated as "unset"
and skip ipIsValid; change the conditional in the controlplane block to
explicitly test for presence/non-empty-string before converting to string (e.g.,
check that .Values.floatingIP is not nil and its stringified length > 0) so that
the ipIsValid($fipStr) validation always runs for user-supplied values; update
the block surrounding .Values.floatingIP (used with ipIsValid and the fail
message) to perform this explicit presence check so invalid falsy inputs surface
a fail and the Layer2VIPConfig emission logic is not accidentally skipped.
In `@README.md`:
- Line 64: Split the long README paragraph into clear sections: add a short
intro describing how to set endpoint and floatingIP in values.yaml (mention
single-node vs cozystack VIP vs external LB cases), then a bulleted "Automatic
vipLink selection" section explaining the two-step rule (longest-prefix-match
across configurable links, otherwise fallback to IPv4-default-gateway-bearing
link) including the Hetzner VLAN example, a separate "When to set vipLink
explicitly" callout that references Layer2VIPConfig.link and explains that the
chart will emit the per-link document only if the link exists (and that
operators should add a LinkConfig or VLANConfig in the per-node overlay for VLAN
sub-interfaces), and a final short "Subnet selectors" note stating
kubelet.validSubnets and etcd.advertisedSubnets are derived from the node's
default-gateway-bearing link; ensure each section uses the exact symbols
endpoint, floatingIP, vipLink, Layer2VIPConfig.link, LinkConfig, VLANConfig,
kubelet.validSubnets, and etcd.advertisedSubnets so users can scan for those
terms.
- Line 129: The Note is too dense—split the long paragraph into a scannable
bulleted list: add a short lead sentence about Talos version behavior (single
YAML for <v1.12, multi-document for >=v1.12), then one bullet per document type
(HostnameConfig, ResolverConfig, LinkConfig/BondConfig/VLANConfig, BridgeConfig,
Layer2VIPConfig, RegistryMirrorConfig) stating when it is emitted and the key
behaviors to look for (e.g., per-NIC one document, bond slaves filtered, gateway
present only on link/bridge carrying IPv4 default route, IPv4/IPv6 addresses
surfaced, floatingIP stripped from per-link addresses). Ensure bullets mention
LinkConfig vs BondConfig vs VLANConfig mapping, bridge detection via
spec.slaveKind=="bridge" + spec.masterIndex and spec.bridgeMaster for STP/VLAN
settings, and that Layer2VIPConfig appears on controlplane nodes when floatingIP
is set.
---
Nitpick comments:
In `@pkg/engine/render_test.go`:
- Around line 6413-6417: The error-render helper renderCozystackExpectError
currently injects the endpoint default but omits the advertisedSubnets fallback
used by renderCozystackWith, which lets unrelated
advertised-subnets/default-route guards trip tests; update
renderCozystackExpectError (the code that clones chrt.Values via cloneValues and
sets values["endpoint"] = testEndpoint) to also set values["advertisedSubnets"]
to the same default value used in renderCozystackWith when that key is
empty/missing (and then apply maps.Copy(values, overrides) as before) so both
helpers have the same defaults.
- Around line 1746-1747: Replace the weak substring assertion that only checks
for "gateway:" with an exact assertion that the IPv4-default-route BridgeConfig
contains the concrete gateway value from the test fixture: instead of if
!strings.Contains(output, "gateway:") use a comparison like if
!strings.Contains(output, "gateway: <expected-value>") or assert equality
against the rendered BridgeConfig fragment; reference the test variables
(output) and the BridgeConfig for the "IPv4-default-route" bridge and obtain the
expected gateway string from the fixture or a test constant to pin the exact
value.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f4ab1416-30ba-4234-ac15-cc907625951a
📒 Files selected for processing (7)
README.mdcharts/cozystack/templates/_helpers.tplcharts/generic/templates/_helpers.tplcharts/talm/templates/_helpers.tplpkg/engine/contract_errors_test.gopkg/engine/contract_network_multidoc_test.gopkg/engine/render_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- charts/generic/templates/_helpers.tpl
…-break; restructure README
Three findings from review, all in-scope:
1. floatingIP truthy gate vs falsy non-string values (CR).
The previous `if and .Values.floatingIP ...` raw-truthy check
skipped the ipIsValid fail-fast when the operator wrote
`floatingIP: 0` or `floatingIP: false` — the value is falsy
on the raw side but represents a "set but invalid" intent.
Hoist a single $fipStr (toString) at the top of the multi-
doc body and a derived $fipIsSet predicate that's true when
$fipStr is non-empty AND not "<nil>" (the Sprig serialisation
of nil). Numeric 0, bool false, and any other shape now
stringify and reach ipIsValid; nil and unset still skip.
2. .Values.floatingIP unstringified in addresses_by_link strip
(branch-review).
The per-link strip used `printf "%s/" $.Values.floatingIP`
directly. A numeric YAML scalar would emit `%!s(int=N)/`
that never matches a real CIDR — the strip silently dies
on worker renders. Reuse the hoisted $fipStr / $fipIsSet so
strip and validation honour the same coerced value.
Both charts (cozystack + generic) updated symmetrically.
3. Tests for Filter 2 (nil-scope) and tie-break behaviour.
/branch-review previously flagged the nil-scope guardrail
as latent without a pin and the longest-prefix tie-break as
documented-but-untested. Adds:
- hetznerWithNilScopeAddressLookup + scope-filter test:
a nil-scope CIDR on a configurable link must NOT leak into
rendered LinkConfig.addresses.
- twoConfigurableLinksSamePrefixLookup + tie-break test:
when two equal-prefix links both contain the floatingIP,
iteration order picks the first match (strict-gt
comparator).
4. README readability (CR).
Two ~500-word paragraphs (vipLink/floatingIP rules at L64
and multi-doc format Note at L129) restructured into bullet
lists per the readability flag. Same content, scannable
shape: endpoint/floatingIP combinations as a list, two-step
selection rule numbered, multi-doc emission rules grouped
by document kind.
Signed-off-by: Aleksei Sviridkin <f@lex.la>
/branch-review flagged "partial symmetry suggests a contract that doesn't actually exist": the cozystack and generic charts share the multi-doc body byte-for-byte, but the new contracts added in this branch (Filter-2 nil-scope, longest-prefix tie-break, malformed-CIDR filter, BridgeConfig stp/vlan sub-block independence) only ran against the cozystack chart. Other contracts in the same branch (Hetzner v4/v6, longest-prefix, fallback, nil floatingIP) have explicit Generic_ mirrors, so the asymmetry was noticeable. Add the six missing Generic_ mirrors so a regression in either chart's hunk surfaces immediately, matching the established mirror pattern. Signed-off-by: Aleksei Sviridkin <f@lex.la>
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
pkg/engine/contract_network_multidoc_test.go (1)
986-1009: ⚡ Quick winTighten generic mirror parity for singleton VIP emission
The generic mirror tests at Line 986+ and Line 1001+ don’t consistently assert that only one
Layer2VIPConfigis emitted. This can let duplicate-emission regressions pass in generic while cozystack is protected.Suggested patch
func TestContract_NetworkMultidoc_Generic_HetznerTopology_VIPOnPrivateVLAN(t *testing.T) { @@ if strings.Contains(out, "link: enp0s31f6\n") { t.Errorf("generic chart: Layer2VIPConfig points at the public NIC instead of the VLAN child:\n%s", out) } + if got := strings.Count(out, "kind: Layer2VIPConfig"); got != 1 { + t.Errorf("generic chart: expected exactly 1 Layer2VIPConfig, got %d:\n%s", got, out) + } } @@ func TestContract_NetworkMultidoc_Generic_HetznerTopology_IPv6VIPOnPrivateVLAN(t *testing.T) { @@ assertContains(t, out, "kind: Layer2VIPConfig") assertContains(t, out, `name: "2001:db8:cafe::10"`) assertContains(t, out, "link: enp0s31f6.4000") + if strings.Contains(out, "link: enp0s31f6\n") { + t.Errorf("generic chart: Layer2VIPConfig points at the public NIC instead of the VLAN child:\n%s", out) + } + if got := strings.Count(out, "kind: Layer2VIPConfig"); got != 1 { + t.Errorf("generic chart: expected exactly 1 Layer2VIPConfig, got %d:\n%s", got, out) + } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/engine/contract_network_multidoc_test.go` around lines 986 - 1009, Both generic mirror tests can miss duplicate Layer2VIPConfig emissions; update TestContract_NetworkMultidoc_Generic_HetznerTopology_VIPOnPrivateVLAN and TestContract_NetworkMultidoc_Generic_HetznerTopology_IPv6VIPOnPrivateVLAN to assert exactly one Layer2VIPConfig is present by checking strings.Count(out, "kind: Layer2VIPConfig") == 1 and failing the test with a clear message if the count != 1 (do this in each test right after the existing assertContains for the Layer2VIPConfig).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/engine/contract_network_multidoc_test.go`:
- Around line 274-301: In TestContract_NetworkMultidoc_VIPOnBridge add
assertions to guard against duplicate or mis-linked VIPs: after the existing
Layer2VIPConfig checks, assert there is exactly one "kind: Layer2VIPConfig"
emitted (fail if more than one occurrence) and assert the Layer2VIPConfig
references the correct link "link: br0" and does not contain any alternative
link (e.g., "link: eth0" or other). Update the test function
TestContract_NetworkMultidoc_VIPOnBridge to include these singleton and
wrong-link assertions immediately after the current VIP/link assertions so the
test fails if a second Layer2VIPConfig is produced or the VIP is attached to the
wrong interface.
In `@pkg/engine/render_test.go`:
- Around line 1740-1748: The test currently checks for any BridgeConfig and any
gateway: anywhere in output; change it to split the rendered YAML output into
documents and locate the document whose "name: br0" is present, then assert that
that specific document has "kind: BridgeConfig" and contains "gateway:";
additionally assert there is no separate document with "name: br0" and "kind:
LinkConfig" (i.e., reject a LinkConfig for br0). Use the existing output
variable and document-splitting (e.g., on "\n---") or YAML-unmarshal into maps
to find documents by name and verify kind and gateway only within the br0
document.
- Around line 6632-6636: The helper that clones chart values (using cloneValues
and the local values map) must preload advertisedSubnets to avoid triggering
empty-discovery required() guards; set values["advertisedSubnets"] =
testAdvertisedSubnet in the same helper (alongside the existing
values["endpoint"] = testEndpoint logic) before calling maps.Copy(values,
overrides) so callers can still override it when needed; mirror the behavior
from renderChartTemplateWithLookup by seeding advertisedSubnets with
testAdvertisedSubnet here.
---
Nitpick comments:
In `@pkg/engine/contract_network_multidoc_test.go`:
- Around line 986-1009: Both generic mirror tests can miss duplicate
Layer2VIPConfig emissions; update
TestContract_NetworkMultidoc_Generic_HetznerTopology_VIPOnPrivateVLAN and
TestContract_NetworkMultidoc_Generic_HetznerTopology_IPv6VIPOnPrivateVLAN to
assert exactly one Layer2VIPConfig is present by checking strings.Count(out,
"kind: Layer2VIPConfig") == 1 and failing the test with a clear message if the
count != 1 (do this in each test right after the existing assertContains for the
Layer2VIPConfig).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2d0907c7-8348-45d5-abfc-7d5e36f0e2df
📒 Files selected for processing (5)
README.mdcharts/cozystack/templates/_helpers.tplcharts/generic/templates/_helpers.tplpkg/engine/contract_network_multidoc_test.gopkg/engine/render_test.go
✅ Files skipped from review due to trivial changes (1)
- README.md
🚧 Files skipped from review as they are similar to previous changes (2)
- charts/generic/templates/_helpers.tpl
- charts/cozystack/templates/_helpers.tpl
Address review feedback from coderabbitai on pkg/engine/contract_network_multidoc_test.go:301: TestContract_NetworkMultidoc_VIPOnBridge proved the expected VIP/link pair existed but did not assert against (a) a second Layer2VIPConfig emitted on a wrong link or (b) the VIP landing on the bridge port (eth0) instead of the bridge itself. Both VIP tests in the same file already carry these guards; align the bridge case. Signed-off-by: Aleksei Sviridkin <f@lex.la>
…ubnets in error helper Address two review findings from coderabbitai: 1. render_test.go:1748 — TestMultiDocEmitsBridgeConfigWhenBridgeCarriesDefaultRoute previously asserted `kind: BridgeConfig` and `gateway:` anywhere in the render. Would have passed if br0 had been emitted as both BridgeConfig AND a stray LinkConfig, or if the gateway had landed on the wrong document. Switch to a per-document assertion: find the br0 document, verify it is the BridgeConfig kind (no LinkConfig with the same name), and that the routes.gateway entry sits on br0 itself. 2. render_test.go:6636 — renderCozystackExpectError did not seed advertisedSubnets the way renderChartTemplateWithLookup does, so error tests targeting deeper validations (e.g. malformed floatingIP) could trip the empty-discovery required() guard on advertisedSubnets first and surface the wrong error. Seed testAdvertisedSubnet by default; callers that want to exercise the required() guard override explicitly. Signed-off-by: Aleksei Sviridkin <f@lex.la>
… v1.11 path Closes #164. The v1.12 fail-fast block validated floatingIP at render time, but the v1.11 legacy define still consumed .Values.floatingIP raw — operators on Talos < v1.12 hit the same numeric-YAML-scalar trap and the same apply-time-rejection-instead-of-render-time path that the multi-doc rewrite was written to prevent. Lift the validation into talm.validate_floatingIP, a shared partial in the talm library chart. Both talos.config.network.legacy and talos.config.network.multidoc include it so fail-fast on a malformed value happens regardless of the rendered Talos version. Coerce floatingIP through toString at the top of each legacy define (same $fipStr / $fipIsSet hoist the multi-doc path already does), then replace every direct .Values.floatingIP reference inside the define with the coerced variables — so an unquoted numeric YAML scalar no longer reaches the inline vip.ip: field as `192168` and surface only at apply time. talm.discovered.default_addresses_by_gateway also stringifies floatingIP before the prefix compare so a numeric scalar does not emit `%!s(int=192168)/` that never matches a real CIDR. Tests: 7 new contract pins under TestContract_NetworkLegacy_* covering malformed string, numeric scalar, nil-graceful, falsy non-string (0, false) on cozystack + three Generic_ mirrors. Signed-off-by: Aleksei Sviridkin <f@lex.la>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@charts/talm/templates/_helpers.tpl`:
- Around line 145-149: When iterating addresses in the template range that
matches .spec.linkName/$linkName and .spec.family/$family (the block using
lookup and hasPrefix with $fipStr), add the same CIDR validation guard used
elsewhere: compute or obtain cidrPrefixLen for .spec.address and only append to
$addresses (append $addresses .spec.address) when cidrPrefixLen >= 0, so
malformed or future-format addresses are filtered on this legacy/default-route
path as well.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 1e7d8862-1ec3-489d-bfea-f46298f7622b
📒 Files selected for processing (5)
charts/cozystack/templates/_helpers.tplcharts/generic/templates/_helpers.tplcharts/talm/templates/_helpers.tplpkg/engine/contract_network_multidoc_test.gopkg/engine/render_test.go
🚧 Files skipped from review as they are similar to previous changes (4)
- charts/cozystack/templates/_helpers.tpl
- charts/generic/templates/_helpers.tpl
- pkg/engine/render_test.go
- pkg/engine/contract_network_multidoc_test.go
Address review feedback from coderabbitai on charts/talm/templates/_helpers.tpl:149: addresses_by_link gained a `cidrPrefixLen >= 0` guard to keep corrupt or future-format .spec.address entries out of LinkConfig/VLANConfig/BridgeConfig emission, but default_addresses_by_gateway (the legacy v1.11 counterpart) still appended raw .spec.address. Apply the same filter so the legacy machine.network.interfaces[].addresses block does not leak a malformed CIDR either. Pinned by TestContract_NetworkLegacy_DefaultAddressesFilterMalformedCidr. Signed-off-by: Aleksei Sviridkin <f@lex.la>
Closes #164.
What
The v1.12 multi-doc renderer hardcoded
Layer2VIPConfig.linkto the IPv4-default-route link, which puts the VIP on the wrong NIC whenever thefloatingIPlives on a private subnet hosted by a non-default-route link — typical Hetzner-style topology with a public NIC carrying the default route and a VLAN child carrying the private cluster network.Symptom (Hetzner topology,
talosVersion: v1.12, controlplane node):enp0s31f6— public NIC, IPv4 default route0.0.0.0/0via88.99.210.1enp0s31f6.4000— VLAN sub-interface, private cluster address192.168.100.4/24, no default routefloatingIP: 192.168.100.10— must land on the VLAN child where the cluster subnet livesPre-fix output:
Talos installed the VIP on the public NIC; the cluster never saw the leader on the private subnet.
How
Longest-prefix match across configurable links. A new
talm.discovered.link_name_for_addresshelper iterates theaddressesCOSI resource, applies four filters in order — configurable-link gate, scope (skiphost/link/nowhere), CIDR membership via the newcidrContainsengine helper, longest-prefix tie-break viacidrPrefixLen— and returns the link with the most-specific match. Iteration order no longer decides the winner.Default-gateway-link fallback, also configurable-gated. When step 1 finds no match, the chart falls back to
default_link_name_by_gateway, but only if that link is itself in the configurable set. A Wireguard / slave / kernel-managed default-route NIC cannot steal the VIP.BridgeConfig emission. The chart now renders
kind: BridgeConfigfor discovered bridges, symmetric toBondConfig. Bridge ports discovered viaspec.slaveKind == "bridge"+spec.masterIndex(newtalm.discovered.bridge_slaveshelper). STP and VLAN filtering surface when the bridge controller reports them onspec.bridgeMaster. The previous "bridge as gateway = fail-fast" branch is removed; both gateway and non-gateway bridges now produce a typed document. Bridges are first-class configurable links, so they can legitimately host VIPs.Fail-fast on malformed
floatingIP. Render-time validation via the newipIsValidengine helper. Numeric YAML scalars (floatingIP: 192168), nil/unset values (Helm-coalesced from the values table), and string typos (10.0.0.300) are all handled: nil short-circuits before the predicate, integers and other shapes coerce throughtoString, and only then go throughipIsValidfor the friendly fail with the offending value.Malformed-CIDR filter in
addresses_by_link. A corrupt or future-format.spec.addressno longer leaks intoLinkConfig/VLANConfig/BridgeConfigaddressesblocks.cidrPrefixLen >= 0gates emission at the same pointlink_name_for_addressfilters its own iteration.Lenient parse semantics.
cidrContainsandcidrPrefixLenreturn(false, nil)/(-1, nil)on parse failure rather than an error. A single corrupt entry in the COSI addresses table cannot crash the entire chart render.Cross-version validation
The
floatingIPfail-fast block landed in the v1.12 multi-doc define is also wired into the v1.11 legacy define via a sharedtalm.validate_floatingIPpartial in the talm library chart. So a malformedfloatingIP(string typo, numeric YAML scalar, falsy non-string0/false) fails at render time regardless of the rendered Talos version, andtalm.discovered.default_addresses_by_gatewaynowtoStringsfloatingIPbefore its prefix compare so a numeric scalar no longer emits%!s(int=192168)/. Closes #164.New engine helpers
Four template functions in
pkg/engine/helm/engine.go, all wrappingnet/netipso chart templates handle IPv4 and IPv6 uniformly without per-family bit math:cidrNetwork(cidr) (string, error)— pre-existing, hoisted out of the inline closure.cidrContains(cidr, ip) (bool, error)— set-membership test, lenient on parse failure.cidrPrefixLen(cidr) (int, error)— prefix length, -1 on parse failure (load-bearing sentinel for longest-prefix-match).ipIsValid(addrStr) (bool, error)— predicate for fail-fast at render time.New chart helpers
talm.discovered.link_name_for_address(charts/talm/templates/_helpers.tpl) — main selector.talm.discovered.bridge_slaves— symmetric tobond_slavesfor bridge ports.Verification
go build ./...clean.go test ./... -count=1green across all 8 packages.golangci-lint runclean on host andGOOS=windows.TestContract_NetworkMultidoc_HetznerTopology_VIPOnPrivateVLANfailed onmainbefore the fix, now green.Test coverage
Hetzner v4, Hetzner v6, longest-prefix, longest-prefix tie-break by iteration order, fallback-to-gateway, non-configurable-link skip on both match and fallback paths, link-scoped address skip, nil-scope address skip, no-default-route fresh-boot, vipLink-override-still-validates ordering, falsy non-string
floatingIP(numeric 0, bool false), malformedfloatingIP(string typo, numeric scalar, nil), malformed CIDR survival + filter, malformed CIDR not leaking into emittedaddresses:, BridgeConfig non-gateway, BridgeConfig as gateway, VIP-on-bridge, BridgeConfig STP-only / VLAN-only sub-block independence. Ten*_Generic_*mirror tests pin parity between cozystack and generic charts on every contract.Backward compatibility
Behavioral change for operators whose floatingIP previously landed on the wrong link: it now lands on the right link. No flag flip, no migration step. Operators with
vipLinkalready set explicitly are unaffected. Operators whosefloatingIPdoesn't fall inside any discovered subnet keep the previous default-route-link behavior via the fallback (now configurable-gated).The discovery-derived
Layer2VIPConfigno longer requires a non-empty IPv4 default route to emit — whenlink_name_for_addressresolves a configurable link by subnet match, the chart emits regardless of$defaultLinkName. This unblocks fresh-boot nodes (gateway not yet reachable) and policy-routing-only topologies (no "default" route at all). Pinned byTestContract_NetworkMultidoc_VIPEmitsWithMatchingSubnetEvenWithoutDefaultRoute.Operators previously seeing the "bridge as gateway, BridgeConfig unimplemented" fail-fast: the chart now emits a proper
BridgeConfigdocument. The per-node-body migration path documented in the old error message is no longer needed.Summary by CodeRabbit
New Features
Bug Fixes
Documentation