
feat: per-service ingress port filtering #11

Merged
kvaps merged 11 commits into cozystack:main from mattia-eleuteri:feat/port-filtering
Apr 27, 2026

Conversation

Contributor

@mattia-eleuteri mattia-eleuteri commented Apr 27, 2026

What this PR does

Adds an opt-in ingress port filter to cozy-proxy: a Service annotated with networking.cozystack.io/wholeIP: "false" accepts TCP/UDP traffic only on the ports listed in Service.spec.ports. Traffic to any other port on that LB IP is dropped before the NAT rewrite. Egress source-IP preservation is maintained in both modes.

The existing wholeIP semantics (annotation "true" or absent → passthrough) are unchanged, so existing deployments are not affected.
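As a sketch, a Service opting into filtering might look like this (the name and port are illustrative, not taken from the PR):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: vm-ssh                                  # illustrative name
  annotations:
    networking.cozystack.io/wholeIP: "false"    # opt in to port filtering
spec:
  type: LoadBalancer
  ports:
    - name: ssh
      protocol: TCP
      port: 22    # the only port reachable on the LB IP; all others are dropped
```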

Why

The companion chart vm-instance in cozystack/cozystack ships an externalMethod: PortList mode that's documented as filtering ingress to declared ports. That mode is currently non-functional because cozy-proxy installs an IP-only DNAT (no port awareness) regardless of Service.spec.ports. Verified empirically on Cozystack v1.3.0 — see https://github.com/cozystack/cozystack/issues/ for the user-facing report.

How

  • Datapath redesign — split the rewrite into two chains:
    • egress_snat at priority raw (-300): saddr rewrite via pod_svc map, runs before conntrack so the recorded tuple has saddr=LB_IP.
    • ingress_dnat at priority mangle (-150): daddr rewrite via svc_pod map, runs after conntrack so reply packets of egress flows are matched correctly. This was previously combined with egress_snat in a single chain at raw, which broke conntrack matching for return packets.
  • New port_filter chain at priority filter (0):
    • ct state established,related accept for stateful pass-through of return packets.
    • ip daddr @filtered_pods ip daddr . meta l4proto . th dport != @allowed_ports drop to enforce the port whitelist for new connections.
  • Two new objects in table cozy_proxy:
    • filtered_pods (set of pod IPs, post-DNAT) opted into filtering.
    • allowed_ports (concat set: ipv4_addr . inet_proto . inet_service) listing allowed (podIP, proto, port) tuples.
  • New methods on proxy.ProxyProcessor: EnsurePortFilter(svcIP, podIP, ports), DeletePortFilter(svcIP, podIP), CleanupPortFilters(map[string]PortFilterEntry).
  • Controller helpers: existing hasWholeIPAnnotation is broadened to "any value triggers management"; new wholeIPPassthrough returns false only for "false". Backward-compatible with existing deployments.
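Putting the pieces together, the resulting ruleset should look roughly like the following hand-written sketch, built from the chain and set names above (the pod_svc/svc_pod map bodies are elided, and the exact statements may differ from what cozy-proxy emits):

```
table ip cozy_proxy {
	# pod_svc and svc_pod maps (ipv4_addr : ipv4_addr) elided

	set filtered_pods {
		type ipv4_addr
	}
	set allowed_ports {
		type ipv4_addr . inet_proto . inet_service
	}
	chain egress_snat {
		type filter hook prerouting priority raw; policy accept;
		ip saddr set ip saddr map @pod_svc
	}
	chain ingress_dnat {
		type filter hook prerouting priority mangle; policy accept;
		ip daddr set ip daddr map @svc_pod
	}
	chain port_filter {
		type filter hook prerouting priority filter; policy accept;
		ct state established,related accept
		ip daddr @filtered_pods ip daddr . meta l4proto . th dport != @allowed_ports drop
	}
}
```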

Test plan

  • Unit tests for wholeIPPassthrough and hasWholeIPAnnotation (added)
  • Built cozy-proxy:portfilter-dev (linux/amd64) and deployed on a 3-node Talos lab (Cozystack v1.3.0, Cilium 1.19)
  • Patched a Service to wholeIP: "false" with spec.ports: [22]: only port 22 reachable from outside, ports 80/443/8080/9999 filtered (verified with nmap + bash /dev/tcp)
  • WholeIP-annotated Service unchanged: all listening ports reachable
  • Egress IP preservation works in both modes (TCP curl + UDP DNS): VM sources its own LB IP via SNAT, conntrack tracks return packets correctly
  • nft list table ip cozy_proxy confirms the expected ruleset (egress_snat raw, ingress_dnat mangle, port_filter filter with stateful accept + per-port drop)

Release note

[cozy-proxy] Add per-service ingress port filtering via the wholeIP annotation: with `wholeIP: "false"`, cozy-proxy drops ingress traffic to the LB IP on ports not listed in `Service.spec.ports`. The default `wholeIP: "true"` (or absent) keeps the previous all-ports passthrough behavior. The existing IP rewrite chain `early_snat` is split into `egress_snat` (raw) and `ingress_dnat` (mangle) to make the datapath conntrack-aware. Required to enable a working `externalMethod: PortList` in the cozystack `vm-instance` chart.

Summary by CodeRabbit

  • New Features

    • Services now support value-driven networking.cozystack.io/wholeIP annotation modes ("true", "false", or absent)
    • Port-based filtering for services with conntrack-aware traffic handling
  • Documentation

    • Updated documentation describing the wholeIP annotation configuration modes and corresponding datapath behavior

…face methods

Adds three new methods to the ProxyProcessor interface to support per-service
ingress port filtering. The DummyProxyProcessor implementation logs invocations
without performing any datapath changes — production logic lands in the
NFTProxyProcessor in a follow-up commit.

Signed-off-by: mattia-eleuteri <mattia@hidora.io>
The wholeIP annotation now triggers cozy-proxy management for any value
(including "false"). The new wholeIPPassthrough helper interprets the value:
"true" or absent (backward-compat) means forward all ports; "false" enables
per-service port filtering set up via EnsurePortFilter using Service.spec.ports.

cleanupRemovedServices is extended to reconcile port filters at restart so
the live datapath state matches the cluster snapshot.

Signed-off-by: mattia-eleuteri <mattia@hidora.io>
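The two predicates described above can be sketched as plain functions; a string map stands in for the Service's annotations here, and the exact helper signatures in the PR may differ:

```go
package main

import "fmt"

const wholeIPAnnotation = "networking.cozystack.io/wholeIP"

// hasWholeIPAnnotation: any value (including "false") means cozy-proxy
// manages this Service.
func hasWholeIPAnnotation(annotations map[string]string) bool {
	_, ok := annotations[wholeIPAnnotation]
	return ok
}

// wholeIPPassthrough: only the literal "false" enables port filtering;
// "true", any other value, or an absent annotation keeps all-ports passthrough.
func wholeIPPassthrough(annotations map[string]string) bool {
	v, ok := annotations[wholeIPAnnotation]
	if !ok {
		return true
	}
	return v != "false"
}

func main() {
	fmt.Println(wholeIPPassthrough(map[string]string{wholeIPAnnotation: "false"})) // false
	fmt.Println(wholeIPPassthrough(nil))                                           // true
}
```

Note the comparison is case-sensitive: "FALSE" or an empty value falls back to passthrough, a behavior the review below suggests pinning down in tests.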
Adds three new objects to the cozy_proxy nftables table:
- filtered_svcs (set of svc IPs subject to port filtering)
- allowed_ports (concat set: ipv4_addr . inet_proto . inet_service)
- port_filter chain at priority -350, runs before early_snat (-300)

A packet whose ip daddr is in filtered_svcs but whose
(daddr, l4proto, dport) tuple is not in allowed_ports is dropped before
any DNAT rewrite. Supports TCP and UDP. SNAT egress is unchanged.

The 12-byte concat key is laid out across NFT_REG32_00..02 (3 contiguous
4-byte slots): ipv4 daddr, then meta l4proto, then th dport. A single
inverted Lookup against allowed_ports drives the drop verdict.

Implements EnsurePortFilter / DeletePortFilter / CleanupPortFilters for
NFTProxyProcessor. Adds a unit test for the concat key packing.

Signed-off-by: mattia-eleuteri <mattia@hidora.io>
The port_filter chain previously dropped any packet to a filtered svc IP
whose dport was not in allowed_ports. This broke egress from VMs in
PortList mode: their return packets arrive with daddr=LB IP and dport=
ephemeral source port, which the drop rule would match.

Add an idiomatic stateful firewall rule before the drop: any packet that
belongs to an established or related connection bypasses the filter.
Conntrack state is populated by the nf_conntrack subsystem independently
of the chain priority, so this works correctly at priority -350.

Validated on hikube-lab: ingress filter still drops 80/443/8080/9999 on
a VM with externalPorts: [22], while curl https://api.ipify.org from
the same VM now returns its own LB IP (egress preserved).

Signed-off-by: mattia-eleuteri <mattia@hidora.io>
…ist mode

The previous design placed the port_filter chain at priority -350 (before
nftables raw at -300) and matched on the LB IP via filtered_svcs. This
broke egress: at priority -350 the conntrack subsystem (priority -200)
has not yet labelled packets, so the "ct state established,related accept"
bypass never matched, and reply packets to outgoing flows were dropped.

Move port_filter to priority filter (0), which runs after conntrack and
after early_snat (-300). At that point daddr has been rewritten to the
pod IP, so the filter matches on filtered_pods (pod IPs) instead of
filtered_svcs. Ingress filtering semantics are preserved; egress
preservation now works because ct state is correctly populated.

Update the ProxyProcessor interface to pass podIP alongside svcIP for
EnsurePortFilter and DeletePortFilter, and switch CleanupPortFilters to
a map[string]PortFilterEntry keyed by svcIP.

Signed-off-by: mattia-eleuteri <mattia@hidora.io>
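The reshaped interface might look like the following sketch. A proxy-local PortSpec stands in for corev1.ServicePort here (the PR itself keeps the Kubernetes type), and the dummy implementation mirrors the logging stub mentioned earlier:

```go
package main

import "fmt"

// PortSpec is a stand-in for corev1.ServicePort in this sketch.
type PortSpec struct {
	Protocol string
	Port     uint16
}

// PortFilterEntry snapshots the desired filter state for one service;
// CleanupPortFilters receives a map keyed by svcIP.
type PortFilterEntry struct {
	PodIP string
	Ports []PortSpec
}

type ProxyProcessor interface {
	EnsurePortFilter(svcIP, podIP string, ports []PortSpec) error
	DeletePortFilter(svcIP, podIP string) error
	CleanupPortFilters(keep map[string]PortFilterEntry) error
}

// DummyProxyProcessor logs invocations without touching the datapath.
type DummyProxyProcessor struct{}

func (d *DummyProxyProcessor) EnsurePortFilter(svcIP, podIP string, ports []PortSpec) error {
	fmt.Println("EnsurePortFilter", svcIP, podIP, len(ports))
	return nil
}

func (d *DummyProxyProcessor) DeletePortFilter(svcIP, podIP string) error {
	fmt.Println("DeletePortFilter", svcIP, podIP)
	return nil
}

func (d *DummyProxyProcessor) CleanupPortFilters(keep map[string]PortFilterEntry) error {
	fmt.Println("CleanupPortFilters", len(keep))
	return nil
}

var _ ProxyProcessor = (*DummyProxyProcessor)(nil)

func main() {
	var p ProxyProcessor = &DummyProxyProcessor{}
	_ = p.EnsurePortFilter("203.0.113.10", "10.244.1.5", []PortSpec{{Protocol: "TCP", Port: 22}})
}
```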
…mangle)

The previous design did both saddr and daddr rewrite in a single chain
at priority raw (-300), before conntrack at -200. This made conntrack
unable to match reply packets of egress flows: the daddr rewrite happened
before conntrack saw the packet, so the recorded reply tuple did not
match what conntrack actually observed for return packets, marking them
NEW instead of ESTABLISHED. With port_filter at priority filter, return
packets of egress flows from filtered VMs were dropped.

Split the rewrite into two chains:
- egress_snat at raw (-300): saddr rewrite via pod_svc map. Runs before
  conntrack so the recorded tuple has saddr=LB_IP.
- ingress_dnat at mangle (-150): daddr rewrite via svc_pod map. Runs
  after conntrack so the conntrack lookup observes the LB_IP destination
  and matches reply packets correctly. Also runs before port_filter at
  filter (0), so port_filter sees the post-DNAT pod IP for filtering.

Behavior is unchanged for unfiltered services (no port_filter), and
egress source-IP preservation is maintained in both modes.

Signed-off-by: mattia-eleuteri <mattia@hidora.io>
Signed-off-by: mattia-eleuteri <mattia@hidora.io>
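The hook ordering this split relies on can be summarized as follows (standard netfilter prerouting priorities):

```
raw       (-300)  egress_snat:  saddr pod IP -> LB IP (before conntrack)
conntrack (-200)  tuple recorded with the LB IP in the flow
mangle    (-150)  ingress_dnat: daddr LB IP -> pod IP (after conntrack)
filter    (0)     port_filter:  ct-state accept, then whitelist drop
```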

coderabbitai Bot commented Apr 27, 2026


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a93ac64a-623c-41b5-b5c9-037e0550b7a8

📥 Commits

Reviewing files that changed from the base of the PR and between 460c757 and 29ea0d7.

📒 Files selected for processing (2)
  • pkg/proxy/interface.go
  • pkg/proxy/nft.go
📝 Walkthrough

Walkthrough

The changes implement port-filtering support for the wholeIP annotation across the networking stack. Documentation is updated to reflect value-driven annotation behavior. The service controller manages port-filter rules alongside NAT mappings based on passthrough mode. The proxy interface gains port-filter lifecycle methods, implemented in nftables with a new port_filter chain and supporting data structures, while dummy proxy provides stub implementations.

Changes

Cohort / File(s) Summary
Documentation
README.md
Updated to document networking.cozystack.io/wholeIP as a value-driven annotation with modes ("true", "false", absent) and corresponding datapath behavior via pod_svc SNAT, svc_pod DNAT, and port_filter rules.
Service Controller
pkg/controllers/services_controller.go, pkg/controllers/services_controller_test.go
Controller now manages port-filter state alongside NAT rules. On service/endpoint add-update, it calls EnsureRules then ensures/deletes port filters based on wholeIPPassthrough result. Annotation is value-tolerant (any non-"false" value enables passthrough). Tests verify annotation parsing and passthrough behavior.
Proxy Interface
pkg/proxy/interface.go
Extended ProxyProcessor interface with three port-filter methods: EnsurePortFilter, DeletePortFilter, CleanupPortFilters. Added PortFilterEntry type for reconciliation snapshots.
Proxy Implementations
pkg/proxy/dummy.go, pkg/proxy/nft.go, pkg/proxy/nft_test.go
Dummy proxy adds stub port-filter methods. NFT proxy implements port filtering via new filtered_pods and allowed_ports sets plus port_filter chain; splits prior early_snat into egress_snat/ingress_dnat chains; adds methods for port-filter lifecycle. Tests validate port key concatenation logic.

Sequence Diagram

sequenceDiagram
    actor K8s as Kubernetes
    participant SC as Service<br/>Controller
    participant PP as Proxy<br/>Processor
    participant NFT as nftables

    K8s->>SC: Service/Endpoint Add or Update
    activate SC
    SC->>PP: EnsureRules(svcIP, podIP)
    activate PP
    PP->>NFT: Apply NAT rules (egress_snat, ingress_dnat)
    deactivate PP
    
    alt wholeIPPassthrough == true
        SC->>PP: DeletePortFilter(svcIP, podIP)
        activate PP
        PP->>NFT: Remove from filtered_pods, allowed_ports
        deactivate PP
    else wholeIPPassthrough == false
        SC->>PP: EnsurePortFilter(svcIP, podIP, ports)
        activate PP
        PP->>NFT: Add to filtered_pods, populate allowed_ports
        deactivate PP
    end
    deactivate SC

    K8s->>SC: Service Deletion
    activate SC
    SC->>PP: DeletePortFilter(svcIP, podIP)
    activate PP
    PP->>NFT: Clean up filter state
    deactivate PP
    SC->>PP: DeleteRules(svcIP, podIP)
    deactivate SC

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 Port filters bloom where wholeIP doth play,
Value-driven annotations light the way,
nftables dance with SNAT and DNAT,
Conntrack respects the filtering mat,
Service egress finds its proper path!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 50.00%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.

✅ Passed checks (4 passed)

  • Description Check — skipped; CodeRabbit's high-level summary is enabled.
  • Title check — the title accurately summarizes the main change: per-service ingress port filtering based on the wholeIP annotation.
  • Linked Issues check — skipped; no linked issues were found for this pull request.
  • Out of Scope Changes check — skipped; no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces per-port filtering for services managed by cozy-proxy via the networking.cozystack.io/wholeIP annotation. When set to "false", only traffic to ports defined in the service spec is allowed, while other traffic is dropped. The implementation includes new nftables chains and sets, updates to the service controller to manage these filters, and corresponding documentation and tests. Feedback highlights the need to update endpoint handlers to maintain consistency, as well as opportunities to optimize nftables operations by batching set updates and reducing redundant flushes during cleanup.

Comment on lines 231 to 248
		if err == nil && ep != nil && hasValidEndpointIP(ep) && hasValidServiceIP(svc) {
			se.Endpoint = ep
			c.Services.Set(svc.Namespace, svc.Name, se)
			svcIP := svc.Status.LoadBalancer.Ingress[0].IP
			podIP := ep.Subsets[0].Addresses[0].IP
			// Ensure NAT mapping rules are set.
-			c.Proxy.EnsureRules(svc.Status.LoadBalancer.Ingress[0].IP, ep.Subsets[0].Addresses[0].IP)
+			c.Proxy.EnsureRules(svcIP, podIP)
			// Apply (or remove) port filtering depending on annotation value.
			if wholeIPPassthrough(svc) {
				if err := c.Proxy.DeletePortFilter(svcIP, podIP); err != nil {
					log.Error(err, "failed to delete port filter", "svcIP", svcIP, "podIP", podIP)
				}
			} else {
				if err := c.Proxy.EnsurePortFilter(svcIP, podIP, svc.Spec.Ports); err != nil {
					log.Error(err, "failed to ensure port filter", "svcIP", svcIP, "podIP", podIP)
				}
			}
		}


high

The port filtering logic introduced in this PR is currently only applied during Service events. However, changes to Endpoints also affect the NAT and filtering state. The addEndpointFunc, deleteEndpointFunc, and updateEndpointFunc handlers (which are not modified in this diff but are part of the controller) should be updated to manage port filters similarly to how they manage NAT rules. Without these updates, the firewall state may become inconsistent when pods are added or removed until a Service update event is triggered.

Contributor Author


Good catch — fixed in 0b1a052. Mirrored the EnsureRules / DeleteRules calls in addEndpointFunc, updateEndpointFunc, and deleteEndpointFunc with EnsurePortFilter / DeletePortFilter so the firewall tracks endpoint changes in real time, not only on Service events.

Comment thread pkg/proxy/nft.go
Comment on lines +598 to +615
	for _, sp := range ports {
		var protoByte byte
		switch sp.Protocol {
		case corev1.ProtocolTCP, "":
			protoByte = unix.IPPROTO_TCP
		case corev1.ProtocolUDP:
			protoByte = unix.IPPROTO_UDP
		default:
			log.Info("Skipping unsupported protocol for port filter",
				"podIP", podIP, "protocol", sp.Protocol)
			continue
		}
		key := concatPortKey(parsedPodIP, protoByte, uint16(sp.Port))
		if err := p.conn.SetAddElements(p.allowedPorts, []nftables.SetElement{{Key: key}}); err != nil {
			return fmt.Errorf("failed to add port tuple to allowed_ports (pod=%s proto=%v port=%d): %v",
				podIP, sp.Protocol, sp.Port, err)
		}
	}


medium

Calling SetAddElements inside a loop for each port is inefficient as it creates multiple netlink messages. Since SetAddElements accepts a slice of elements, it is better to collect all port keys first and perform a single call to the library.

	var elements []nftables.SetElement
	for _, sp := range ports {
		var protoByte byte
		switch sp.Protocol {
		case corev1.ProtocolTCP, "":
			protoByte = unix.IPPROTO_TCP
		case corev1.ProtocolUDP:
			protoByte = unix.IPPROTO_UDP
		default:
			log.Info("Skipping unsupported protocol for port filter",
				"podIP", podIP, "protocol", sp.Protocol)
			continue
		}
		elements = append(elements, nftables.SetElement{Key: concatPortKey(parsedPodIP, protoByte, uint16(sp.Port))})
	}
	if len(elements) > 0 {
		if err := p.conn.SetAddElements(p.allowedPorts, elements); err != nil {
			return fmt.Errorf("failed to add port tuples to allowed_ports for pod %s: %v", podIP, err)
		}
	}

Contributor Author


Fixed in 460c757. Accumulating elements into a slice and issuing a single SetAddElements call.

Comment thread pkg/proxy/nft.go
Comment on lines +656 to +684
	func (p *NFTProxyProcessor) CleanupPortFilters(keep map[string]PortFilterEntry) error {
		log.Info("Starting CleanupPortFilters", "keepCount", len(keep))

		desired := make(map[string]bool, len(keep))
		for _, entry := range keep {
			desired[entry.PodIP] = true
		}

		current, err := p.conn.GetSetElements(p.filteredPods)
		if err != nil {
			return fmt.Errorf("failed to list filtered_pods: %v", err)
		}
		for _, el := range current {
			ipStr := net.IP(el.Key).String()
			if !desired[ipStr] {
				log.Info("Removing stale filtered_pod", "podIP", ipStr)
				// We don't have the original svcIP in current state; pass empty.
				if err := p.DeletePortFilter("", ipStr); err != nil {
					return fmt.Errorf("cleanup delete pod %s: %v", ipStr, err)
				}
			}
		}
		for svcIP, entry := range keep {
			if err := p.EnsurePortFilter(svcIP, entry.PodIP, entry.Ports); err != nil {
				return fmt.Errorf("cleanup ensure %s: %v", svcIP, err)
			}
		}
		return nil
	}


medium

The CleanupPortFilters method is inefficient because it performs multiple ruleset flushes and redundant netlink queries. It calls DeletePortFilter and EnsurePortFilter within loops, each of which triggers a Flush(). Additionally, removeAllowedPortsForPod (called by both) retrieves all elements of the allowed_ports set every time it is invoked. For a large number of services, this results in O(N^2) complexity relative to the number of services and ports.

Consider optimizing this by:

  1. Fetching all elements of filtered_pods and allowed_ports once at the start of the cleanup.
  2. Calculating the set of elements to be added or removed in memory.
  3. Batching all SetAddElements and SetDeleteElements calls.
  4. Performing a single Flush() at the end of the reconciliation process.

Contributor Author


Fixed in 460c757. Rewrote CleanupPortFilters as a single-pass diff: fetch each set's elements once, compute additions and removals in memory (using a hex string key for the concat-set tuples), batch SetAddElements / SetDeleteElements per set, and Flush exactly once at the end. The previous loop ran GetSetElements + Flush for every entry which was O(N^2) in services × ports.

Port-filter state was previously only updated on Service events. When
an Endpoint changed (pod added / removed / IP changed) without a
concurrent Service event, the firewall could remain stale until the
next Service update. Mirror the EnsureRules / DeleteRules invocations
in addEndpointFunc, updateEndpointFunc, and deleteEndpointFunc so that
the port-filter set tracks endpoint changes in real time.

Addresses gemini-code-assist review feedback on PR cozystack#11.

Signed-off-by: mattia-eleuteri <mattia@hidora.io>
EnsurePortFilter previously called SetAddElements once per port;
accumulate elements and issue a single call. CleanupPortFilters
previously delegated to DeletePortFilter / EnsurePortFilter in a loop,
each performing GetSetElements and Flush — O(N^2) in number of
services * ports. Replace with a single-pass diff: fetch current state
once per set, compute additions and removals in memory, batch
SetAddElements / SetDeleteElements per set, and Flush exactly once.

Addresses gemini-code-assist review feedback on PR cozystack#11.

Signed-off-by: mattia-eleuteri <mattia@hidora.io>
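The single-pass diff in this commit can be sketched independently of nftables. Here raw byte slices stand in for fetched set elements, hex strings serve as comparable keys (concat-set keys are not valid map keys directly), and the function name is illustrative:

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// diffElements compares current vs desired set contents and returns the
// raw keys to add and delete, suitable for one batched SetAddElements /
// SetDeleteElements call each, followed by a single Flush.
func diffElements(current, desired [][]byte) (add, del [][]byte) {
	cur := make(map[string][]byte, len(current))
	for _, k := range current {
		cur[hex.EncodeToString(k)] = k
	}
	des := make(map[string][]byte, len(desired))
	for _, k := range desired {
		des[hex.EncodeToString(k)] = k
	}
	for s, k := range des {
		if _, ok := cur[s]; !ok {
			add = append(add, k) // desired but not present: add
		}
	}
	for s, k := range cur {
		if _, ok := des[s]; !ok {
			del = append(del, k) // present but no longer desired: delete
		}
	}
	return add, del
}

func main() {
	add, del := diffElements([][]byte{{1}, {2}}, [][]byte{{2}, {3}})
	fmt.Println(len(add), len(del)) // 1 1
}
```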
@mattia-eleuteri mattia-eleuteri marked this pull request as ready for review April 27, 2026 16:19

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (4)
pkg/controllers/services_controller_test.go (1)

14-53: Consider pinning down edge-case annotation values.

The current implementation treats "" (empty value), "FALSE" (uppercase), and any non-"false" literal as passthrough-enabled and managed. None are exercised here. Locking these behaviors in tests prevents accidental regressions if the predicate is ever rewritten (e.g., to a strings.EqualFold comparison).

♻️ Proposed additional cases
   cases := []struct {
       name   string
       svc    *v1.Service
       expect bool
   }{
       {"true value", svcWith(map[string]string{"networking.cozystack.io/wholeIP": "true"}), true},
       {"false value", svcWith(map[string]string{"networking.cozystack.io/wholeIP": "false"}), false},
       {"absent annotation defaults to passthrough", svcWith(map[string]string{}), true},
       {"nil annotations defaults to passthrough", &v1.Service{}, true},
+      {"empty value defaults to passthrough", svcWith(map[string]string{"networking.cozystack.io/wholeIP": ""}), true},
+      {"uppercase FALSE defaults to passthrough (case-sensitive)", svcWith(map[string]string{"networking.cozystack.io/wholeIP": "FALSE"}), true},
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controllers/services_controller_test.go` around lines 14 - 53, Add
edge-case tests for annotation parsing in TestHasWholeIPAnnotation and
TestWholeIPPassthrough: exercise an empty-string value (""), an uppercase
"FALSE" (e.g., "FALSE"), and a non-"false" arbitrary literal (e.g., "nope") to
lock current behavior of hasWholeIPAnnotation and wholeIPPassthrough; update the
test case tables to include entries asserting hasWholeIPAnnotation treats "" and
"FALSE" and "nope" the same as the existing logic, and assert wholeIPPassthrough
returns the expected booleans for these values so future changes to the
predicate implementations are caught.
pkg/proxy/nft.go (1)

599-612: Optional: defend against out-of-range port values.

sp.Port is int32; the cast to uint16 truncates silently. Kubernetes API validation rejects ports outside 1–65535 on admission, so this is safe in practice — but a guard avoids a subtle silent corruption if the validator is ever bypassed (e.g., when the controller is fed objects from a non-API source in tests).

🛡️ Proposed guard
 	for _, sp := range ports {
+		if sp.Port < 1 || sp.Port > 65535 {
+			log.Info("Skipping out-of-range port for port filter",
+				"podIP", podIP, "port", sp.Port)
+			continue
+		}
 		var protoByte byte
 		switch sp.Protocol {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/proxy/nft.go` around lines 599 - 612, Loop currently casts sp.Port
(int32) to uint16 which can truncate invalid values; add a guard in the ports
loop (before using concatPortKey) that verifies sp.Port is within 1..65535, skip
and log an informative message if out of range, and only then convert to uint16
for concatPortKey and nftables.SetElement creation; reference the ports
iteration, sp.Port, protoByte, parsedPodIP, concatPortKey, and
nftables.SetElement when adding the check.
pkg/controllers/services_controller.go (1)

234-247: Inconsistent error handling between NAT and port-filter calls.

The new port-filter operations log their errors (lines 240–246), but the adjacent c.Proxy.EnsureRules(svcIP, podIP) call drops its return value silently. golangci-lint's errcheck flags this on lines 237, 273, 295, 311, 350, 388. While this is a pre-existing pattern, the diff already touches these blocks, so it's a low-cost moment to make the handling uniform — a failed EnsureRules followed by a "successful" EnsurePortFilter log line is misleading during incident triage.

♻️ Proposed fix (apply at all six errcheck sites)
-		c.Proxy.EnsureRules(svcIP, podIP)
+		if err := c.Proxy.EnsureRules(svcIP, podIP); err != nil {
+			log.Error(err, "failed to ensure NAT rules", "svcIP", svcIP, "podIP", podIP)
+		}
 		// Apply (or remove) port filtering depending on annotation value.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controllers/services_controller.go` around lines 234 - 247, The call to
c.Proxy.EnsureRules(svcIP, podIP) currently ignores its error; capture its
returned error and handle it the same way as the port-filter calls by checking
if err != nil and calling log.Error(err, "failed to ensure NAT mapping rules",
"svcIP", svcIP, "podIP", podIP); apply this same change for all occurrences of
EnsureRules flagged by errcheck (e.g., the other blocks that mirror this
pattern) so error reporting is consistent with DeletePortFilter/EnsurePortFilter
and wholeIPPassthrough logic.
pkg/proxy/interface.go (1)

1-9: Optional: avoid leaking corev1.ServicePort into the proxy abstraction.

The proxy interface previously was IP-only and Kubernetes-agnostic. Importing corev1 here couples the NAT abstraction to the k8s API just to convey (protocol, port) tuples. A small proxy-local type (e.g., type PortSpec struct { Protocol string; Port uint16 }) would keep the layering clean and let the controller do the conversion. Defer if intentional.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/proxy/interface.go` around lines 1 - 9, Replace the Kubernetes API type
leak in the proxy abstraction by removing the corev1.ServicePort usage and
introducing a proxy-local PortSpec type (e.g., type PortSpec struct { Protocol
string; Port uint16 }) in the same file; update the ProxyProcessor interface
methods that currently carry service port info (EnsureRules, DeleteRules and/or
CleanupRules) to accept []PortSpec (or the minimal collection you need) instead
of corev1.ServicePort, remove the corev1 import, and make the
controller/convertor layer translate corev1.ServicePort -> PortSpec before
calling ProxyProcessor methods.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/proxy/interface.go`:
- Around line 11-21: The docstring for EnsurePortFilter claims an empty Ports
disables filtering, but the EnsurePortFilter implementation adds the pod to
filtered_pods with an empty allowed_ports which drops all traffic; fix this by
updating the EnsurePortFilter implementation (in nft.go) to short-circuit when
len(Ports)==0: call DeletePortFilter(SvcIP, PodIP) and return its result (or nil
after successful removal) instead of installing an empty rule set; reference the
EnsurePortFilter and DeletePortFilter symbols so you can locate and change the
early-exit behavior.

In `@pkg/proxy/nft.go`:
- Around line 137-149: The comments referencing early_snat are stale after the
chain split—update the comment block around the port_filter chain creation (the
code that sets p.portFilterCh via p.conn.AddChain for the "port_filter"
nftables.Chain) to say that the destination address (daddr) is rewritten by
ingress_dnat at mangle (-150) rather than early_snat, and adjust the priority
note from (-300) to (-150); also do a project-wide sweep to replace other
early_snat mentions (e.g. filtered_pods comment, the drop-rule comment, and the
EnsurePortFilter docstring in pkg/proxy/interface.go) with ingress_dnat and fix
any priority numeric references accordingly.

---

Nitpick comments:
In `@pkg/controllers/services_controller_test.go`:
- Around line 14-53: Add edge-case tests for annotation parsing in
TestHasWholeIPAnnotation and TestWholeIPPassthrough: exercise an empty-string
value (""), an uppercase "FALSE" (e.g., "FALSE"), and a non-"false" arbitrary
literal (e.g., "nope") to lock current behavior of hasWholeIPAnnotation and
wholeIPPassthrough; update the test case tables to include entries asserting
hasWholeIPAnnotation treats "" and "FALSE" and "nope" the same as the existing
logic, and assert wholeIPPassthrough returns the expected booleans for these
values so future changes to the predicate implementations are caught.

In `@pkg/controllers/services_controller.go`:
- Around line 234-247: The call to c.Proxy.EnsureRules(svcIP, podIP) currently
ignores its error; capture its returned error and handle it the same way as the
port-filter calls by checking if err != nil and calling log.Error(err, "failed
to ensure NAT mapping rules", "svcIP", svcIP, "podIP", podIP); apply this same
change for all occurrences of EnsureRules flagged by errcheck (e.g., the other
blocks that mirror this pattern) so error reporting is consistent with
DeletePortFilter/EnsurePortFilter and wholeIPPassthrough logic.

In `@pkg/proxy/interface.go`:
- Around lines 1-9: Fix the Kubernetes API type leak in the proxy abstraction
by removing the corev1.ServicePort usage and introducing a proxy-local PortSpec
type (e.g. `type PortSpec struct { Protocol string; Port uint16 }`) in the same
file. Update the ProxyProcessor interface methods that currently carry service
port info (EnsureRules, DeleteRules, and/or CleanupRules) to accept `[]PortSpec`
(or the minimal collection needed) instead of corev1.ServicePort, remove the
corev1 import, and make the controller/converter layer translate
corev1.ServicePort → PortSpec before calling ProxyProcessor methods.
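
A sketch of the suggested decoupling, using a local servicePort stand-in instead of the real corev1.ServicePort (the field set and the toPortSpecs converter name are illustrative):

```go
package main

import "fmt"

// Proxy-local port representation, as suggested in the review; field
// names here are illustrative.
type PortSpec struct {
	Protocol string
	Port     uint16
}

// Stand-in for corev1.ServicePort with just the fields being translated;
// the real type comes from k8s.io/api/core/v1.
type servicePort struct {
	Protocol string
	Port     int32
}

// toPortSpecs converts at the controller boundary so the proxy package
// never imports Kubernetes API types.
func toPortSpecs(in []servicePort) []PortSpec {
	out := make([]PortSpec, 0, len(in))
	for _, sp := range in {
		if sp.Port < 1 || sp.Port > 65535 {
			continue // invalid port: skip rather than truncate to uint16
		}
		out = append(out, PortSpec{Protocol: sp.Protocol, Port: uint16(sp.Port)})
	}
	return out
}

func main() {
	specs := toPortSpecs([]servicePort{{"TCP", 22}, {"UDP", 53}, {"TCP", 70000}})
	fmt.Println(specs)
}
```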

In `@pkg/proxy/nft.go`:
- Around lines 599-612: The ports loop casts sp.Port (int32) to uint16, which
can silently truncate invalid values. Before calling concatPortKey, add a guard
that verifies sp.Port is within 1..65535; skip the entry and log an informative
message if it is out of range, and only then convert to uint16 for the
concatPortKey and nftables.SetElement construction. The check belongs in the
ports iteration, alongside sp.Port, protoByte, parsedPodIP, concatPortKey, and
nftables.SetElement.
📥 Commits

Reviewing files that changed from the base of the PR and between 5655195 and 460c757.

📒 Files selected for processing (7)
  • README.md
  • pkg/controllers/services_controller.go
  • pkg/controllers/services_controller_test.go
  • pkg/proxy/dummy.go
  • pkg/proxy/interface.go
  • pkg/proxy/nft.go
  • pkg/proxy/nft_test.go

The interface docstring documents EnsurePortFilter with an empty Ports
slice as equivalent to DeletePortFilter (i.e. disable filtering for
this pod). The implementation, however, added the pod to filtered_pods
without populating allowed_ports — which made every ingress packet to
the pod IP match the drop rule, the opposite of "no filter". Make the
implementation match the contract: if Ports is empty, delegate to
DeletePortFilter and return.

Addresses coderabbitai review feedback on PR cozystack#11.

Signed-off-by: mattia-eleuteri <mattia@hidora.io>
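
The contract described above can be sketched with an in-memory model of the filter state (the real implementation programs the nftables filtered_pods and allowed_ports sets; the struct and method shapes here are illustrative):

```go
package main

import "fmt"

// In-memory model of the filter state; the real implementation programs
// nftables sets (filtered_pods / allowed_ports) in pkg/proxy/nft.go.
type filter struct {
	filteredPods map[string]bool
}

func (f *filter) DeletePortFilter(podIP string) {
	delete(f.filteredPods, podIP)
}

// EnsurePortFilter honors the documented contract: an empty ports slice
// means "disable filtering", so delegate to DeletePortFilter instead of
// registering the pod with no allowed ports (a black hole).
func (f *filter) EnsurePortFilter(podIP string, ports []uint16) {
	if len(ports) == 0 {
		f.DeletePortFilter(podIP)
		return
	}
	f.filteredPods[podIP] = true
}

func main() {
	f := &filter{filteredPods: map[string]bool{}}
	f.EnsurePortFilter("10.244.1.5", []uint16{22})
	fmt.Println("filtered:", f.filteredPods["10.244.1.5"])
	f.EnsurePortFilter("10.244.1.5", nil) // empty → same as delete
	fmt.Println("filtered:", f.filteredPods["10.244.1.5"])
}
```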
After the early_snat chain was split into egress_snat (raw, saddr
rewrite) and ingress_dnat (mangle, daddr rewrite), several comments
still attributed the post-DNAT pod IP to early_snat. Rename the
references in comments and the EnsurePortFilter docstring to
ingress_dnat with the correct priority (-150). The egress_snat (saddr,
priority raw) references are left unchanged where accurate.

Addresses coderabbitai review feedback on PR cozystack#11.

Signed-off-by: mattia-eleuteri <mattia@hidora.io>
Member

@kvaps kvaps left a comment

LGTM. Verified the datapath split (egress_snat raw → conntrack → ingress_dnat mangle → port_filter filter), the concat-set construction, and that the empty-Spec.Ports short-circuit lands as DeletePortFilter rather than a black hole. Tests for the predicates and concat key layout look good.

The one gap I flagged in review — ICMP being silently dropped in port-filter mode (breaks ping, PMTU discovery, ICMP unreachable) — is addressed by the stacked follow-up at mattia-eleuteri#1, which I'll merge right after this lands.

@kvaps kvaps merged commit 4a8d111 into cozystack:main Apr 27, 2026
2 checks passed
mattia-eleuteri added a commit to mattia-eleuteri/cozystack that referenced this pull request Apr 28, 2026
- Add `externalAllowICMP` value (default true) propagated as
  `networking.cozystack.io/allowICMP` annotation on the rendered Service
  when `externalMethod: PortList`. The cozy-proxy companion (released as
  part of cozystack/cozy-proxy#11 + cozystack#12) drops ICMP by default in
  port-filter mode, which breaks ping and PMTU discovery; defaulting the
  chart to "true" preserves user expectations while still allowing
  operators to opt out by setting `externalAllowICMP: false`.

- Remove the v1.3.1.md changelog entry. Project convention is to add
  changelogs in a dedicated "docs: add changelog for vX.Y.Z" commit at
  release time, not as part of feature/fix PRs.

Signed-off-by: mattia-eleuteri <mattia@hidora.io>
kvaps added a commit to cozystack/cozystack that referenced this pull request Apr 28, 2026
## What this PR does

The `vm-instance` chart now drives the cozy-proxy `wholeIP` and
`allowICMP` annotations explicitly so that `externalMethod: PortList`
actually filters ingress traffic to declared ports while keeping
ping/PMTU functional.

- Render `networking.cozystack.io/wholeIP: "false"` when
`externalMethod: PortList` (was always `"true"`, which silently disabled
the PortList semantics in cozy-proxy).
- Add `externalAllowICMP` value (default `true`) propagated as
`networking.cozystack.io/allowICMP` when `externalMethod: PortList`.
Without this, cozy-proxy drops ICMP in port-filter mode (ping/PMTU
broken). Operators can set `externalAllowICMP: false` to opt out.
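
The resulting annotation matrix can be sketched in Go. The annotation keys come from this PR; renderAnnotations is a hypothetical stand-in for the chart's Helm templating:

```go
package main

import "fmt"

// Hypothetical stand-in for the vm-instance chart's annotation rendering;
// the real logic is Helm templating, not Go.
func renderAnnotations(externalMethod string, externalAllowICMP bool) map[string]string {
	ann := map[string]string{}
	switch externalMethod {
	case "PortList":
		// PortList opts into cozy-proxy port filtering...
		ann["networking.cozystack.io/wholeIP"] = "false"
		// ...and propagates the ICMP knob (default true keeps ping/PMTU).
		if externalAllowICMP {
			ann["networking.cozystack.io/allowICMP"] = "true"
		} else {
			ann["networking.cozystack.io/allowICMP"] = "false"
		}
	case "WholeIP":
		ann["networking.cozystack.io/wholeIP"] = "true"
	}
	return ann
}

func main() {
	fmt.Println(renderAnnotations("PortList", true))
	fmt.Println(renderAnnotations("PortList", false))
	fmt.Println(renderAnnotations("WholeIP", true))
}
```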

The changelog entry is intentionally **not** part of this PR — it will
be added in a dedicated `docs: add changelog for vX.Y.Z` commit at
release time, per project convention.

## Why

`externalMethod: PortList` is documented as filtering ingress traffic to
declared ports but has been non-functional on Cozystack v1.3.0 —
verified empirically on a 3-node Talos lab. Root cause was twofold:
chart always set `wholeIP: "true"`, and cozy-proxy v0.2.0 had no
port-aware logic. The cozy-proxy side was fixed in
cozystack/cozy-proxy#11 (merged) and cozystack/cozy-proxy#12 (allowICMP
follow-up); this PR completes the user-visible fix on the chart side.

## Companion PRs

- cozystack/cozy-proxy#11 (merged) — per-service ingress port filtering
- cozystack/cozy-proxy#12 (merged) — `allowICMP` annotation for
port-filter mode

## Test plan

- [x] Built cozy-proxy with the companion fix locally, deployed on a
3-node Talos lab (Cozystack v1.3.0)
- [x] `wholeIP: "false"` Service with `spec.ports: [22]`: only port 22
reachable from outside; ports 80/443/8080/9999 filtered
- [x] WholeIP-annotated Service unchanged: all listening ports reachable
- [x] Egress IP preservation works in both modes (TCP curl + UDP DNS)
- [x] `nft list table ip cozy_proxy` confirms expected ruleset
- [x] `helm template` renders the expected annotation matrix: `PortList`
default → `wholeIP=false, allowICMP=true`; `PortList` opt-out →
`allowICMP=false`; `WholeIP` → only `wholeIP=true`
- [x] `make unit-tests` passes locally
- [ ] CI unit tests
- [ ] CI E2E

## Backport

Suggesting `backport-v1.3` once merged.

## Release note

```release-note
[vm-instance] Make `externalMethod: PortList` actually filter ingress traffic to ports listed in `externalPorts`. New `externalAllowICMP` knob (default true) propagates the cozy-proxy `allowICMP` annotation to keep ping/PMTU functional in port-filter mode. Combined with cozy-proxy v0.3.0+, only listed ports plus ICMP are reachable from the VM's LoadBalancer IP.
```


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **New Features**
* Added `externalAllowICMP` configuration option to control ICMP traffic
acceptance for VM external access in PortList mode (enabled by default).

* **Documentation**
* Updated parameter documentation to include the new ICMP traffic
control setting.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->