feat: per-service ingress port filtering #11
Conversation
…face methods

Adds three new methods to the ProxyProcessor interface to support per-service ingress port filtering. The DummyProxyProcessor implementation logs invocations without performing any datapath changes; production logic lands in the NFTProxyProcessor in a follow-up commit.

Signed-off-by: mattia-eleuteri <mattia@hidora.io>
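For orientation, a sketch of the interface surface being added, written against the final shape it reaches later in this PR (the podIP parameter and PortFilterEntry arrive in subsequent commits; the EnsureRules/DeleteRules signatures are assumed for context):

```go
package proxy

import corev1 "k8s.io/api/core/v1"

// PortFilterEntry captures the desired filter state for one service,
// keyed by svcIP in CleanupPortFilters (shape taken from later commits).
type PortFilterEntry struct {
	PodIP string
	Ports []corev1.ServicePort
}

// ProxyProcessor with the three new port-filter methods.
type ProxyProcessor interface {
	EnsureRules(svcIP, podIP string) error
	DeleteRules(svcIP, podIP string) error
	EnsurePortFilter(svcIP, podIP string, ports []corev1.ServicePort) error
	DeletePortFilter(svcIP, podIP string) error
	CleanupPortFilters(keep map[string]PortFilterEntry) error
}
```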
The wholeIP annotation now triggers cozy-proxy management for any value (including "false"). The new wholeIPPassthrough helper interprets the value: "true" or absent (backward-compat) means forward all ports; "false" enables per-service port filtering set up via EnsurePortFilter using Service.spec.ports. cleanupRemovedServices is extended to reconcile port filters at restart so the live datapath state matches the cluster snapshot.

Signed-off-by: mattia-eleuteri <mattia@hidora.io>
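A minimal sketch of the helper's semantics as described here and pinned by the test cases later in the review (the real function lives in pkg/controllers/services_controller.go):

```go
package controllers

import corev1 "k8s.io/api/core/v1"

// wholeIPPassthrough reports whether all ports should be forwarded for svc.
// Only the exact annotation value "false" enables per-service port
// filtering; "true", any other value, or an absent annotation keeps the
// legacy whole-IP passthrough (case-sensitive, per the tests below).
func wholeIPPassthrough(svc *corev1.Service) bool {
	// Reading from a nil Annotations map safely yields "".
	return svc.Annotations["networking.cozystack.io/wholeIP"] != "false"
}
```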
Adds three new objects to the cozy_proxy nftables table:

- filtered_svcs (set of svc IPs subject to port filtering)
- allowed_ports (concat set: ipv4_addr . inet_proto . inet_service)
- port_filter chain at priority -350, runs before early_snat (-300)

A packet whose ip daddr is in filtered_svcs but whose (daddr, l4proto, dport) tuple is not in allowed_ports is dropped before any DNAT rewrite. Supports TCP and UDP. SNAT egress is unchanged.

The 12-byte concat key is laid out across NFT_REG32_00..02 (3 contiguous 4-byte slots): ipv4 daddr, then meta l4proto, then th dport. A single inverted Lookup against allowed_ports drives the drop verdict.

Implements EnsurePortFilter / DeletePortFilter / CleanupPortFilters for NFTProxyProcessor. Adds a unit test for the concat key packing.

Signed-off-by: mattia-eleuteri <mattia@hidora.io>
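A plausible reconstruction of the key packing, assuming nftables' standard 4-byte alignment for concat components (the actual concatPortKey and its unit test live in pkg/proxy/nft.go and nft_test.go):

```go
package proxy

import (
	"encoding/binary"
	"net"
)

// concatPortKey packs (ipv4 daddr, l4proto, dport) into the 12-byte
// allowed_ports key: three contiguous 4-byte register slots
// (NFT_REG32_00..02), mirroring ipv4_addr . inet_proto . inet_service.
func concatPortKey(ip net.IP, proto byte, port uint16) []byte {
	key := make([]byte, 12)
	copy(key[0:4], ip.To4())                    // slot 0: ipv4_addr
	key[4] = proto                              // slot 1: inet_proto, zero-padded to 4 bytes
	binary.BigEndian.PutUint16(key[8:10], port) // slot 2: inet_service in network byte order, padded
	return key
}
```

For example, `concatPortKey(net.ParseIP("10.0.0.5"), 6 /* TCP */, 22)` would yield `0a 00 00 05 | 06 00 00 00 | 00 16 00 00`.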
The port_filter chain previously dropped any packet to a filtered svc IP whose dport was not in allowed_ports. This broke egress from VMs in PortList mode: their return packets arrive with daddr=LB IP and dport=ephemeral source port, which the drop rule would match. Add an idiomatic stateful firewall rule before the drop: any packet that belongs to an established or related connection bypasses the filter. Conntrack state is populated by the nf_conntrack subsystem independently of the chain priority, so this works correctly at priority -350.

Validated on hikube-lab: ingress filter still drops 80/443/8080/9999 on a VM with externalPorts: [22], while curl https://api.ipify.org from the same VM now returns its own LB IP (egress preserved).

Signed-off-by: mattia-eleuteri <mattia@hidora.io>
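For reference, a sketch of how such a bypass rule can be expressed with the google/nftables library; the helper name and the assumption that conn, table, and chain are already created are mine, not code from this commit:

```go
package proxy

import (
	"github.com/google/nftables"
	"github.com/google/nftables/binaryutil"
	"github.com/google/nftables/expr"
)

// addStatefulBypass prepends the equivalent of
// "ct state established,related accept" to the given chain.
func addStatefulBypass(conn *nftables.Conn, table *nftables.Table, chain *nftables.Chain) {
	conn.InsertRule(&nftables.Rule{
		Table: table,
		Chain: chain,
		Exprs: []expr.Any{
			// Load the conntrack state bitmask into register 1.
			&expr.Ct{Register: 1, Key: expr.CtKeySTATE},
			// Keep only the ESTABLISHED and RELATED bits.
			&expr.Bitwise{
				SourceRegister: 1,
				DestRegister:   1,
				Len:            4,
				Mask:           binaryutil.NativeEndian.PutUint32(expr.CtStateBitESTABLISHED | expr.CtStateBitRELATED),
				Xor:            binaryutil.NativeEndian.PutUint32(0),
			},
			// Accept when any of those bits survive the mask.
			&expr.Cmp{Op: expr.CmpOpNeq, Register: 1, Data: []byte{0, 0, 0, 0}},
			&expr.Verdict{Kind: expr.VerdictAccept},
		},
	})
}
```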
…ist mode

The previous design placed the port_filter chain at priority -350 (before nftables raw at -300) and matched on the LB IP via filtered_svcs. This broke egress: at priority -350 the conntrack subsystem (priority -200) has not yet labelled packets, so the "ct state established,related accept" bypass never matched, and reply packets to outgoing flows were dropped. Move port_filter to priority filter (0), which runs after conntrack and after early_snat (-300). At that point daddr has been rewritten to the pod IP, so the filter matches on filtered_pods (pod IPs) instead of filtered_svcs. Ingress filtering semantics are preserved; egress preservation now works because ct state is correctly populated. Update the ProxyProcessor interface to pass podIP alongside svcIP for EnsurePortFilter and DeletePortFilter, and switch CleanupPortFilters to a map[string]PortFilterEntry keyed by svcIP.

Signed-off-by: mattia-eleuteri <mattia@hidora.io>
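A sketch of the chain placement described above, using google/nftables priority constants (the chain type and helper name are assumptions):

```go
package proxy

import "github.com/google/nftables"

// addPortFilterChain creates port_filter at hook prerouting, priority
// filter (0), i.e. after conntrack (-200) and after the DNAT rewrite, so
// the chain matches on pod IPs with ct state already populated.
func addPortFilterChain(conn *nftables.Conn, table *nftables.Table) *nftables.Chain {
	return conn.AddChain(&nftables.Chain{
		Name:     "port_filter",
		Table:    table,
		Type:     nftables.ChainTypeFilter,
		Hooknum:  nftables.ChainHookPrerouting,
		Priority: nftables.ChainPriorityFilter, // 0
	})
}
```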
…mangle)

The previous design did both saddr and daddr rewrite in a single chain at priority raw (-300), before conntrack at -200. This made conntrack unable to match reply packets of egress flows: the daddr rewrite happened before conntrack saw the packet, so the recorded reply tuple did not match what conntrack actually observed for return packets, marking them NEW instead of ESTABLISHED. With port_filter at priority filter, return packets of egress flows from filtered VMs were dropped.

Split the rewrite into two chains:

- egress_snat at raw (-300): saddr rewrite via pod_svc map. Runs before conntrack so the recorded tuple has saddr=LB_IP.
- ingress_dnat at mangle (-150): daddr rewrite via svc_pod map. Runs after conntrack so the conntrack lookup observes the LB_IP destination and matches reply packets correctly. Also runs before port_filter at filter (0), so port_filter sees the post-DNAT pod IP for filtering.

Behavior is unchanged for unfiltered services (no port_filter), and egress IP preservation is preserved in both modes.

Signed-off-by: mattia-eleuteri <mattia@hidora.io>
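The split might look like this with google/nftables; a sketch under the stated priorities (the real chain definitions, including their rewrite rules, live in pkg/proxy/nft.go):

```go
package proxy

import "github.com/google/nftables"

// addNATChains registers the split rewrite chains described above.
// Chain type and exact field values are assumptions.
func addNATChains(conn *nftables.Conn, table *nftables.Table) (egress, ingress *nftables.Chain) {
	// egress_snat: raw prerouting (-300). The saddr rewrite happens before
	// conntrack (-200), so the recorded flow tuple already has saddr=LB_IP.
	egress = conn.AddChain(&nftables.Chain{
		Name:     "egress_snat",
		Table:    table,
		Type:     nftables.ChainTypeFilter,
		Hooknum:  nftables.ChainHookPrerouting,
		Priority: nftables.ChainPriorityRaw, // -300
	})
	// ingress_dnat: mangle prerouting (-150). Conntrack has already seen the
	// packet with daddr=LB_IP, so replies to egress flows match ESTABLISHED.
	ingress = conn.AddChain(&nftables.Chain{
		Name:     "ingress_dnat",
		Table:    table,
		Type:     nftables.ChainTypeFilter,
		Hooknum:  nftables.ChainHookPrerouting,
		Priority: nftables.ChainPriorityMangle, // -150
	})
	return egress, ingress
}
```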
Signed-off-by: mattia-eleuteri <mattia@hidora.io>
📝 Walkthrough

The changes implement port-filtering support for the cozy-proxy datapath.
Sequence Diagram

```mermaid
sequenceDiagram
actor K8s as Kubernetes
participant SC as Service<br/>Controller
participant PP as Proxy<br/>Processor
participant NFT as nftables
K8s->>SC: Service/Endpoint Add or Update
activate SC
SC->>PP: EnsureRules(svcIP, podIP)
activate PP
PP->>NFT: Apply NAT rules (egress_snat, ingress_dnat)
deactivate PP
alt wholeIPPassthrough == true
SC->>PP: DeletePortFilter(svcIP, podIP)
activate PP
PP->>NFT: Remove from filtered_pods, allowed_ports
deactivate PP
else wholeIPPassthrough == false
SC->>PP: EnsurePortFilter(svcIP, podIP, ports)
activate PP
PP->>NFT: Add to filtered_pods, populate allowed_ports
deactivate PP
end
deactivate SC
K8s->>SC: Service Deletion
activate SC
SC->>PP: DeletePortFilter(svcIP, podIP)
activate PP
PP->>NFT: Clean up filter state
deactivate PP
SC->>PP: DeleteRules(svcIP, podIP)
    deactivate SC
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Code Review
This pull request introduces per-port filtering for services managed by cozy-proxy via the networking.cozystack.io/wholeIP annotation. When set to "false", only traffic to ports defined in the service spec is allowed, while other traffic is dropped. The implementation includes new nftables chains and sets, updates to the service controller to manage these filters, and corresponding documentation and tests. Feedback highlights the need to update endpoint handlers to maintain consistency, as well as opportunities to optimize nftables operations by batching set updates and reducing redundant flushes during cleanup.
```diff
 if err == nil && ep != nil && hasValidEndpointIP(ep) && hasValidServiceIP(svc) {
 	se.Endpoint = ep
 	c.Services.Set(svc.Namespace, svc.Name, se)
+	svcIP := svc.Status.LoadBalancer.Ingress[0].IP
+	podIP := ep.Subsets[0].Addresses[0].IP
 	// Ensure NAT mapping rules are set.
-	c.Proxy.EnsureRules(svc.Status.LoadBalancer.Ingress[0].IP, ep.Subsets[0].Addresses[0].IP)
+	c.Proxy.EnsureRules(svcIP, podIP)
+	// Apply (or remove) port filtering depending on annotation value.
+	if wholeIPPassthrough(svc) {
+		if err := c.Proxy.DeletePortFilter(svcIP, podIP); err != nil {
+			log.Error(err, "failed to delete port filter", "svcIP", svcIP, "podIP", podIP)
+		}
+	} else {
+		if err := c.Proxy.EnsurePortFilter(svcIP, podIP, svc.Spec.Ports); err != nil {
+			log.Error(err, "failed to ensure port filter", "svcIP", svcIP, "podIP", podIP)
+		}
+	}
 }
```
The port filtering logic introduced in this PR is currently only applied during Service events. However, changes to Endpoints also affect the NAT and filtering state. The addEndpointFunc, deleteEndpointFunc, and updateEndpointFunc handlers (which are not modified in this diff but are part of the controller) should be updated to manage port filters similarly to how they manage NAT rules. Without these updates, the firewall state may become inconsistent when pods are added or removed until a Service update event is triggered.
Good catch — fixed in 0b1a052. Mirrored the EnsureRules / DeleteRules calls in addEndpointFunc, updateEndpointFunc, and deleteEndpointFunc with EnsurePortFilter / DeletePortFilter so the firewall tracks endpoint changes in real time, not only on Service events.
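A hypothetical helper illustrating the mirrored logic; the function name, logger shape, and import path are assumptions, not code from commit 0b1a052:

```go
package controllers

import (
	corev1 "k8s.io/api/core/v1"

	"github.com/cozystack/cozy-proxy/pkg/proxy" // import path assumed
)

// errLogger is the minimal logging surface used here (logr-style Error).
type errLogger interface {
	Error(err error, msg string, keysAndValues ...interface{})
}

// syncPortFilter shows the logic the endpoint handlers now mirror:
// passthrough services get their filter removed, filtered services get it
// (re)installed from Service.spec.ports.
func syncPortFilter(p proxy.ProxyProcessor, log errLogger, svc *corev1.Service, svcIP, podIP string) {
	if svc.Annotations["networking.cozystack.io/wholeIP"] != "false" { // wholeIPPassthrough
		if err := p.DeletePortFilter(svcIP, podIP); err != nil {
			log.Error(err, "failed to delete port filter", "svcIP", svcIP, "podIP", podIP)
		}
		return
	}
	if err := p.EnsurePortFilter(svcIP, podIP, svc.Spec.Ports); err != nil {
		log.Error(err, "failed to ensure port filter", "svcIP", svcIP, "podIP", podIP)
	}
}
```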
```go
for _, sp := range ports {
	var protoByte byte
	switch sp.Protocol {
	case corev1.ProtocolTCP, "":
		protoByte = unix.IPPROTO_TCP
	case corev1.ProtocolUDP:
		protoByte = unix.IPPROTO_UDP
	default:
		log.Info("Skipping unsupported protocol for port filter",
			"podIP", podIP, "protocol", sp.Protocol)
		continue
	}
	key := concatPortKey(parsedPodIP, protoByte, uint16(sp.Port))
	if err := p.conn.SetAddElements(p.allowedPorts, []nftables.SetElement{{Key: key}}); err != nil {
		return fmt.Errorf("failed to add port tuple to allowed_ports (pod=%s proto=%v port=%d): %v",
			podIP, sp.Protocol, sp.Port, err)
	}
}
```
Calling SetAddElements inside a loop for each port is inefficient as it creates multiple netlink messages. Since SetAddElements accepts a slice of elements, it is better to collect all port keys first and perform a single call to the library.
```go
var elements []nftables.SetElement
for _, sp := range ports {
	var protoByte byte
	switch sp.Protocol {
	case corev1.ProtocolTCP, "":
		protoByte = unix.IPPROTO_TCP
	case corev1.ProtocolUDP:
		protoByte = unix.IPPROTO_UDP
	default:
		log.Info("Skipping unsupported protocol for port filter",
			"podIP", podIP, "protocol", sp.Protocol)
		continue
	}
	elements = append(elements, nftables.SetElement{Key: concatPortKey(parsedPodIP, protoByte, uint16(sp.Port))})
}
if len(elements) > 0 {
	if err := p.conn.SetAddElements(p.allowedPorts, elements); err != nil {
		return fmt.Errorf("failed to add port tuples to allowed_ports for pod %s: %v", podIP, err)
	}
}
```
Fixed in 460c757. Accumulating elements into a slice and issuing a single SetAddElements call.
```go
func (p *NFTProxyProcessor) CleanupPortFilters(keep map[string]PortFilterEntry) error {
	log.Info("Starting CleanupPortFilters", "keepCount", len(keep))

	desired := make(map[string]bool, len(keep))
	for _, entry := range keep {
		desired[entry.PodIP] = true
	}

	current, err := p.conn.GetSetElements(p.filteredPods)
	if err != nil {
		return fmt.Errorf("failed to list filtered_pods: %v", err)
	}
	for _, el := range current {
		ipStr := net.IP(el.Key).String()
		if !desired[ipStr] {
			log.Info("Removing stale filtered_pod", "podIP", ipStr)
			// We don't have the original svcIP in current state; pass empty.
			if err := p.DeletePortFilter("", ipStr); err != nil {
				return fmt.Errorf("cleanup delete pod %s: %v", ipStr, err)
			}
		}
	}
	for svcIP, entry := range keep {
		if err := p.EnsurePortFilter(svcIP, entry.PodIP, entry.Ports); err != nil {
			return fmt.Errorf("cleanup ensure %s: %v", svcIP, err)
		}
	}
	return nil
}
```
The CleanupPortFilters method is inefficient because it performs multiple ruleset flushes and redundant netlink queries. It calls DeletePortFilter and EnsurePortFilter within loops, each of which triggers a Flush(). Additionally, removeAllowedPortsForPod (called by both) retrieves all elements of the allowed_ports set every time it is invoked. For a large number of services, this results in a quadratic number of netlink round-trips and flushes.
Consider optimizing this by:

- Fetching all elements of `filtered_pods` and `allowed_ports` once at the start of the cleanup.
- Calculating the set of elements to be added or removed in memory.
- Batching all `SetAddElements` and `SetDeleteElements` calls.
- Performing a single `Flush()` at the end of the reconciliation process.
Fixed in 460c757. Rewrote CleanupPortFilters as a single-pass diff: fetch each set's elements once, compute additions and removals in memory (using a hex string key for the concat-set tuples), batch SetAddElements / SetDeleteElements per set, and Flush exactly once at the end. The previous loop ran GetSetElements + Flush for every entry which was O(N^2) in services × ports.
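The core of such a single-pass diff can be captured in a small helper like the following sketch (names are illustrative; the actual reconciliation in 460c757 may differ):

```go
package proxy

import (
	"encoding/hex"

	"github.com/google/nftables"
)

// diffSet computes the elements to add and delete so the live nftables set
// converges on the desired keys. Keys are compared via their hex encoding,
// since []byte cannot be used as a Go map key.
func diffSet(current []nftables.SetElement, desired map[string][]byte) (add, del []nftables.SetElement) {
	live := make(map[string]bool, len(current))
	for _, el := range current {
		k := hex.EncodeToString(el.Key)
		live[k] = true
		if _, want := desired[k]; !want {
			del = append(del, el) // present in the kernel, no longer desired
		}
	}
	for k, raw := range desired {
		if !live[k] {
			add = append(add, nftables.SetElement{Key: raw}) // desired, missing
		}
	}
	return add, del
}
```

The caller would run this once per set, apply the results with one `SetAddElements` and one `SetDeleteElements` per set, and finish with a single `Flush()`.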
Port-filter state was previously only updated on Service events. When an Endpoint changed (pod added / removed / IP changed) without a concurrent Service event, the firewall could remain stale until the next Service update. Mirror the EnsureRules / DeleteRules invocations in addEndpointFunc, updateEndpointFunc, and deleteEndpointFunc so that the port-filter set tracks endpoint changes in real time.

Addresses gemini-code-assist review feedback on PR cozystack#11.

Signed-off-by: mattia-eleuteri <mattia@hidora.io>
EnsurePortFilter previously called SetAddElements once per port; accumulate elements and issue a single call. CleanupPortFilters previously delegated to DeletePortFilter / EnsurePortFilter in a loop, each performing GetSetElements and Flush, which was O(N^2) in the number of services × ports. Replace with a single-pass diff: fetch current state once per set, compute additions and removals in memory, batch SetAddElements / SetDeleteElements per set, and Flush exactly once.

Addresses gemini-code-assist review feedback on PR cozystack#11.

Signed-off-by: mattia-eleuteri <mattia@hidora.io>
Actionable comments posted: 2
🧹 Nitpick comments (4)
pkg/controllers/services_controller_test.go (1)
14-53: Consider pinning down edge-case annotation values.

The current implementation treats `""` (empty value), `"FALSE"` (uppercase), and any non-`"false"` literal as passthrough-enabled and managed. None are exercised here. Locking these behaviors in tests prevents accidental regressions if the predicate is ever rewritten (e.g., to a `strings.EqualFold` comparison).

♻️ Proposed additional cases
```diff
 cases := []struct {
 	name   string
 	svc    *v1.Service
 	expect bool
 }{
 	{"true value", svcWith(map[string]string{"networking.cozystack.io/wholeIP": "true"}), true},
 	{"false value", svcWith(map[string]string{"networking.cozystack.io/wholeIP": "false"}), false},
 	{"absent annotation defaults to passthrough", svcWith(map[string]string{}), true},
 	{"nil annotations defaults to passthrough", &v1.Service{}, true},
+	{"empty value defaults to passthrough", svcWith(map[string]string{"networking.cozystack.io/wholeIP": ""}), true},
+	{"uppercase FALSE defaults to passthrough (case-sensitive)", svcWith(map[string]string{"networking.cozystack.io/wholeIP": "FALSE"}), true},
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/controllers/services_controller_test.go` around lines 14-53: add edge-case tests for annotation parsing in TestHasWholeIPAnnotation and TestWholeIPPassthrough. Exercise an empty-string value (""), an uppercase "FALSE", and a non-"false" arbitrary literal (e.g., "nope") to lock current behavior of hasWholeIPAnnotation and wholeIPPassthrough; update the test case tables to include entries asserting the existing logic for these values, so future changes to the predicate implementations are caught.

pkg/proxy/nft.go (1)
599-612: Optional: defend against out-of-range port values.

`sp.Port` is `int32`; the cast to `uint16` truncates silently. Kubernetes API validation rejects ports outside 1–65535 on admission, so this is safe in practice, but a guard avoids subtle silent corruption if the validator is ever bypassed (e.g., when the controller is fed objects from a non-API source in tests).

🛡️ Proposed guard
```diff
 for _, sp := range ports {
+	if sp.Port < 1 || sp.Port > 65535 {
+		log.Info("Skipping out-of-range port for port filter",
+			"podIP", podIP, "port", sp.Port)
+		continue
+	}
 	var protoByte byte
 	switch sp.Protocol {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/proxy/nft.go` around lines 599-612: the loop currently casts sp.Port (int32) to uint16, which can truncate invalid values. Add a guard in the ports loop (before using concatPortKey) that verifies sp.Port is within 1..65535, skip and log an informative message if out of range, and only then convert to uint16 for concatPortKey and nftables.SetElement creation; reference the ports iteration, sp.Port, protoByte, parsedPodIP, concatPortKey, and nftables.SetElement when adding the check.

pkg/controllers/services_controller.go (1)
234-247: Inconsistent error handling between NAT and port-filter calls.

The new port-filter operations log their errors (lines 240–246), but the adjacent `c.Proxy.EnsureRules(svcIP, podIP)` call drops its return value silently. golangci-lint's errcheck flags this on lines 237, 273, 295, 311, 350, 388. While this is a pre-existing pattern, the diff already touches these blocks, so it's a low-cost moment to make the handling uniform: a failed `EnsureRules` followed by a "successful" `EnsurePortFilter` log line is misleading during incident triage.

♻️ Proposed fix (apply at all six errcheck sites)
```diff
-	c.Proxy.EnsureRules(svcIP, podIP)
+	if err := c.Proxy.EnsureRules(svcIP, podIP); err != nil {
+		log.Error(err, "failed to ensure NAT rules", "svcIP", svcIP, "podIP", podIP)
+	}
 	// Apply (or remove) port filtering depending on annotation value.
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/controllers/services_controller.go` around lines 234-247: the call to c.Proxy.EnsureRules(svcIP, podIP) currently ignores its error; capture its returned error and handle it the same way as the port-filter calls by checking if err != nil and calling log.Error(err, "failed to ensure NAT mapping rules", "svcIP", svcIP, "podIP", podIP). Apply this same change for all occurrences of EnsureRules flagged by errcheck (e.g., the other blocks that mirror this pattern) so error reporting is consistent with DeletePortFilter/EnsurePortFilter and wholeIPPassthrough logic.

pkg/proxy/interface.go (1)
1-9: Optional: avoid leaking `corev1.ServicePort` into the proxy abstraction.

The proxy interface previously was IP-only and Kubernetes-agnostic. Importing `corev1` here couples the NAT abstraction to the k8s API just to convey `(protocol, port)` tuples. A small proxy-local type (e.g., `type PortSpec struct { Protocol string; Port uint16 }`) would keep the layering clean and let the controller do the conversion. Defer if intentional.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/proxy/interface.go` around lines 1 - 9, Replace the Kubernetes API type leak in the proxy abstraction by removing the corev1.ServicePort usage and introducing a proxy-local PortSpec type (e.g., type PortSpec struct { Protocol string; Port uint16 }) in the same file; update the ProxyProcessor interface methods that currently carry service port info (EnsureRules, DeleteRules and/or CleanupRules) to accept []PortSpec (or the minimal collection you need) instead of corev1.ServicePort, remove the corev1 import, and make the controller/convertor layer translate corev1.ServicePort -> PortSpec before calling ProxyProcessor methods.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/proxy/interface.go`:
- Around line 11-21: The docstring for EnsurePortFilter claims an empty Ports
disables filtering, but the EnsurePortFilter implementation adds the pod to
filtered_pods with an empty allowed_ports which drops all traffic; fix this by
updating the EnsurePortFilter implementation (in nft.go) to short-circuit when
len(Ports)==0: call DeletePortFilter(SvcIP, PodIP) and return its result (or nil
after successful removal) instead of installing an empty rule set; reference the
EnsurePortFilter and DeletePortFilter symbols so you can locate and change the
early-exit behavior.
In `@pkg/proxy/nft.go`:
- Around line 137-149: The comments referencing early_snat are stale after the
chain split—update the comment block around the port_filter chain creation (the
code that sets p.portFilterCh via p.conn.AddChain for the "port_filter"
nftables.Chain) to say that the destination address (daddr) is rewritten by
ingress_dnat at mangle (-150) rather than early_snat, and adjust the priority
note from (-300) to (-150); also do a project-wide sweep to replace other
early_snat mentions (e.g. filtered_pods comment, the drop-rule comment, and the
EnsurePortFilter docstring in pkg/proxy/interface.go) with ingress_dnat and fix
any priority numeric references accordingly.
📒 Files selected for processing (7)

- README.md
- pkg/controllers/services_controller.go
- pkg/controllers/services_controller_test.go
- pkg/proxy/dummy.go
- pkg/proxy/interface.go
- pkg/proxy/nft.go
- pkg/proxy/nft_test.go
The interface docstring documents EnsurePortFilter with an empty Ports slice as equivalent to DeletePortFilter (i.e. disable filtering for this pod). The implementation, however, added the pod to filtered_pods without populating allowed_ports, which made every ingress packet to the pod IP match the drop rule, the opposite of "no filter". Make the implementation match the contract: if Ports is empty, delegate to DeletePortFilter and return.

Addresses coderabbitai review feedback on PR cozystack#11.

Signed-off-by: mattia-eleuteri <mattia@hidora.io>
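In code, the contract fix amounts to an early return; a sketch, where installPortFilter is a hypothetical stand-in for the existing installation path in nft.go:

```go
package proxy

import corev1 "k8s.io/api/core/v1"

// EnsurePortFilter with the empty-Ports short-circuit from this commit.
func (p *NFTProxyProcessor) EnsurePortFilter(svcIP, podIP string, ports []corev1.ServicePort) error {
	if len(ports) == 0 {
		// Empty Ports means "disable filtering for this pod". Installing an
		// empty allow set would instead drop all ingress, so remove the pod
		// from filtered_pods/allowed_ports entirely.
		return p.DeletePortFilter(svcIP, podIP)
	}
	return p.installPortFilter(svcIP, podIP, ports) // hypothetical helper
}
```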
After the early_snat chain was split into egress_snat (raw, saddr rewrite) and ingress_dnat (mangle, daddr rewrite), several comments still attributed the post-DNAT pod IP to early_snat. Rename the references in comments and the EnsurePortFilter docstring to ingress_dnat with the correct priority (-150). The egress_snat (saddr, priority raw) references are left unchanged where accurate.

Addresses coderabbitai review feedback on PR cozystack#11.

Signed-off-by: mattia-eleuteri <mattia@hidora.io>
kvaps left a comment
LGTM. Verified the datapath split (egress_snat raw → conntrack → ingress_dnat mangle → port_filter filter), the concat-set construction, and that the empty-Spec.Ports short-circuit lands as DeletePortFilter rather than a black hole. Tests for the predicates and concat key layout look good.
The one gap I flagged in review — ICMP being silently dropped in port-filter mode (breaks ping, PMTU discovery, ICMP unreachable) — is addressed by the stacked follow-up at mattia-eleuteri#1, which I'll merge right after this lands.
- Add `externalAllowICMP` value (default true) propagated as `networking.cozystack.io/allowICMP` annotation on the rendered Service when `externalMethod: PortList`. The cozy-proxy companion (released as part of cozystack/cozy-proxy#11 + cozystack#12) drops ICMP by default in port-filter mode, which breaks ping and PMTU discovery; defaulting the chart to "true" preserves user expectations while still allowing operators to opt out by setting `externalAllowICMP: false`.
- Remove the v1.3.1.md changelog entry. Project convention is to add changelogs in a dedicated "docs: add changelog for vX.Y.Z" commit at release time, not as part of feature/fix PRs.

Signed-off-by: mattia-eleuteri <mattia@hidora.io>
## What this PR does

The `vm-instance` chart now drives the cozy-proxy `wholeIP` and `allowICMP` annotations explicitly so that `externalMethod: PortList` actually filters ingress traffic to declared ports while keeping ping/PMTU functional.

- Render `networking.cozystack.io/wholeIP: "false"` when `externalMethod: PortList` (was always `"true"`, which silently disabled the PortList semantics in cozy-proxy).
- Add `externalAllowICMP` value (default `true`) propagated as `networking.cozystack.io/allowICMP` when `externalMethod: PortList`. Without this, cozy-proxy drops ICMP in port-filter mode (ping/PMTU broken). Operators can set `externalAllowICMP: false` to opt out.

The changelog entry is intentionally **not** part of this PR — it will be added in a dedicated `docs: add changelog for vX.Y.Z` commit at release time, per project convention.

## Why

`externalMethod: PortList` is documented as filtering ingress traffic to declared ports but has been non-functional on Cozystack v1.3.0 — verified empirically on a 3-node Talos lab. Root cause was twofold: the chart always set `wholeIP: "true"`, and cozy-proxy v0.2.0 had no port-aware logic. The cozy-proxy side was fixed in cozystack/cozy-proxy#11 (merged) and cozystack/cozy-proxy#12 (allowICMP follow-up); this PR completes the user-visible fix on the chart side.

## Companion PRs

- cozystack/cozy-proxy#11 (merged) — per-service ingress port filtering
- cozystack/cozy-proxy#12 (merged) — `allowICMP` annotation for port-filter mode

## Test plan

- [x] Built cozy-proxy with the companion fix locally, deployed on a 3-node Talos lab (Cozystack v1.3.0)
- [x] `wholeIP: "false"` Service with `spec.ports: [22]`: only port 22 reachable from outside; ports 80/443/8080/9999 filtered
- [x] WholeIP-annotated Service unchanged: all listening ports reachable
- [x] Egress IP preservation works in both modes (TCP curl + UDP DNS)
- [x] `nft list table ip cozy_proxy` confirms expected ruleset
- [x] `helm template` renders the expected annotation matrix: `PortList` default → `wholeIP=false, allowICMP=true`; `PortList` opt-out → `allowICMP=false`; `WholeIP` → only `wholeIP=true`
- [x] `make unit-tests` passes locally
- [ ] CI unit tests
- [ ] CI E2E

## Backport

Suggesting `backport-v1.3` once merged.

## Release note

```release-note
[vm-instance] Make `externalMethod: PortList` actually filter ingress traffic to ports listed in `externalPorts`. New `externalAllowICMP` knob (default true) propagates the cozy-proxy `allowICMP` annotation to keep ping/PMTU functional in port-filter mode. Combined with cozy-proxy v0.3.0+, only listed ports plus ICMP are reachable from the VM's LoadBalancer IP.
```

## Summary by CodeRabbit

- **New Features**
  - Added `externalAllowICMP` configuration option to control ICMP traffic acceptance for VM external access in PortList mode (enabled by default).
- **Documentation**
  - Updated parameter documentation to include the new ICMP traffic control setting.
## What this PR does

Adds an opt-in ingress port filter to cozy-proxy, so that a Service annotated with `networking.cozystack.io/wholeIP: "false"` only accepts TCP/UDP traffic on the ports listed in `Service.spec.ports`. Traffic to any other port destined to that LB IP is dropped before NAT rewrite. Egress IP preservation is preserved in both modes.

The existing wholeIP semantics (annotation `"true"` or absent → passthrough) is unchanged, so existing deployments are not affected.

## Why

The companion chart `vm-instance` in `cozystack/cozystack` ships an `externalMethod: PortList` mode that's documented as filtering ingress to declared ports. That mode is currently non-functional because cozy-proxy installs an IP-only DNAT (no port awareness) regardless of `Service.spec.ports`. Verified empirically on Cozystack v1.3.0 — see https://github.com/cozystack/cozystack/issues/ for the user-facing report.

## How

- `egress_snat` at priority `raw` (-300): saddr rewrite via `pod_svc` map; runs before conntrack so the recorded tuple has saddr=LB_IP.
- `ingress_dnat` at priority `mangle` (-150): daddr rewrite via `svc_pod` map; runs after conntrack so reply packets of egress flows are matched correctly. This was previously combined with egress_snat in a single chain at raw, which broke conntrack matching for return packets.
- `port_filter` at priority `filter` (0): `ct state established,related accept` for stateful pass-through of return packets, then `ip daddr @filtered_pods ip daddr . meta l4proto . th dport != @allowed_ports drop` to enforce the port whitelist for new connections.
- New sets in `cozy_proxy`: `filtered_pods` (set of pod IPs, post-DNAT) opted into filtering; `allowed_ports` (concat set: `ipv4_addr . inet_proto . inet_service`) listing allowed `(podIP, proto, port)` tuples.
- New methods on `proxy.ProxyProcessor`: `EnsurePortFilter(svcIP, podIP, ports)`, `DeletePortFilter(svcIP, podIP)`, `CleanupPortFilters(map[string]PortFilterEntry)`.
- `hasWholeIPAnnotation` is broadened to "any value triggers management"; new `wholeIPPassthrough` returns false only for `"false"`. Backward-compatible with existing deployments.

## Test plan

- Unit tests for `wholeIPPassthrough` and `hasWholeIPAnnotation` (added)
- Built `cozy-proxy:portfilter-dev` (linux/amd64) and deployed on a 3-node Talos lab (Cozystack v1.3.0, Cilium 1.19)
- `wholeIP: "false"` with `spec.ports: [22]`: only port 22 reachable from outside, ports 80/443/8080/9999 filtered (verified with nmap + bash /dev/tcp)
- `nft list table ip cozy_proxy` confirms the expected ruleset (egress_snat raw, ingress_dnat mangle, port_filter filter with stateful accept + per-port drop)

## Release note

## Summary by CodeRabbit

- **New Features**
  - Per-service ingress port filtering driven by the `networking.cozystack.io/wholeIP` annotation modes (`"true"`, `"false"`, or absent)
- **Documentation**