Skip to content

Fix: Move share cleanup to CleanupBeforeVPCDeletion phase to unblock VPC deletion#1272

Merged
zhengxiexie merged 1 commit intovmware-tanzu:mainfrom
zhengxiexie:topic/zhengxie/main/fix_3603104
Oct 29, 2025
Merged

Fix: Move share cleanup to CleanupBeforeVPCDeletion phase to unblock VPC deletion#1272
zhengxiexie merged 1 commit intovmware-tanzu:mainfrom
zhengxiexie:topic/zhengxie/main/fix_3603104

Conversation

@zhengxiexie
Copy link
Copy Markdown
Contributor

@zhengxiexie zhengxiexie commented Oct 24, 2025

Problem

Bug #3603104 - Supervisor disable operations are blocked due to VPC deletion failures. VPCs cannot be deleted because share resources still reference them, causing NSX error 500030.

Root Cause

Share resources are cleaned up in the CleanupInfraResources phase, which runs AFTER VPC deletion attempts. This creates a dependency deadlock where:

  • VPCs can't be deleted because shares reference them
  • Shares aren't deleted until after VPC deletion (which never succeeds)

Solution

Move all share cleanup operations to the CleanupBeforeVPCDeletion phase to ensure shares are deleted before any VPC deletion attempts.

Changes Made

1. SecurityPolicyService (pkg/nsx/services/securitypolicy/cleanup.go)

✅ Added new CleanupBeforeVPCDeletion() method that cleans both project and infra shares
✅ Modified CleanupInfraResources() to remove share cleanup (now only handles groups)

2. LBInfraCleaner (pkg/clean/clean_lb_infra.go)

✅ Added new CleanupBeforeVPCDeletion() method that cleans LB-related shares and shared resources
✅ Modified CleanupInfraResources() to remove share cleanup calls

Testing & Verification

Test Environment Setup

Created SecurityPolicy and NetworkPolicy resources that generate share resources with VPC references:

SecurityPolicy Configuration
apiVersion: crd.nsx.vmware.com/v1alpha1
kind: SecurityPolicy
metadata:
  name: sp-app-access
  namespace: ns-1
spec:
  appliedTo:
  - podSelector:
      matchLabels:
        role: db
  priority: 1
  rules:
  - action: allow
    direction: in
    ports:
    - port: 6001
      protocol: TCP
    - endPort: 7010
      port: 7000
      protocol: TCP
    sources:
    - podSelector:
        matchLabels:
          app: coffee
    - namespaceSelector:
        matchLabels:
          ns-name: ns-2
      podSelector:
        matchLabels:
          app: tea
    - ipBlocks:
      - cidr: 172.26.0.8/29
NetworkPolicy Configuration
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: np-ingress-egress-allow-all-ns
  namespace: ns-1
spec:
  podSelector:
    matchLabels:
      app: tea
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          app: tea
    - namespaceSelector:
        matchLabels:
          app: coffee
    ports:
    - protocol: TCP
  egress:
  - to:
    - podSelector: {}
    ports:
    - protocol: UDP

NSX Resource Analysis

Generated Share Resource (Before Fix)

{
  "sharedWith": ["/orgs/default/projects/project-quality/vpcs/ns-1_nydws"],  // ← VPC reference blocks deletion
  "sharing_strategy": "NONE_DESCENDANTS",
  "resource_type": "Share",
  "id": "project-quality_group_sp-app-access-4a704a42-src_s0rop_share",
  "display_name": "project-quality_group_sp-app-access_4a704a42_src_share",
  "tags": [
    {"scope": "nsx-op/cluster", "tag": "ddc98cb1-5506-47b3-9197-55cded09f5ce"},
    {"scope": "nsx-op/namespace", "tag": "ns-1"},
    {"scope": "nsx-op/security_policy_name", "tag": "sp-app-access"}
  ],
  "path": "/orgs/default/projects/project-quality/infra/shares/project-quality_group_sp-app-access-4a704a42-src_s0rop_share"
}

Validation with NSX Clean Tool

Executed clean operation with PR #1275 modifications:

./clean -cluster="ddc98cb1-5506-47b3-9197-55cded09f5ce" \
  -thumbprint="74:BB:D2:FF:E6:FE:5B:84:5B:74:C9:0F:BD:76:2C:80:7E:12:C5:BE" \
  -log-level=2 \
  -vc-endpoint="lvn-dvm-10-70-184-25.dvm.lvn.broadcom.net" \
  -mgr-ip="10.70.191.102" \
  --target-namespace=ns-1 \
  --target-vpc=ns-1_nydws

Results

Share Cleanup (New Phase - CleanupBeforeVPCDeletion)

2025-10-28 05:06:59.000 INFO cleanup.go:215 Marking share for deletion path=/orgs/default/projects/project-quality/infra/shares/project-quality_group_sp-app-access-4a704a42-src_61k36_share
2025-10-28 05:06:59.000 INFO policy_tree.go:547 Successfully deleted resource resourceID=project-quality_group_sp-app-access-4a704a42-src_61k36_share

VPC Deletion (Now Successful)

2025-10-28 05:37:08.000 INFO clean_service.go:147 Attempting to delete VPC vpcID=ns-1_nydws
2025-10-28 05:37:09.000 INFO vpc.go:244 Successfully deleted NSX VPC VPC=ns-1_nydws
2025-10-28 05:37:09.000 INFO clean_service.go:147 Successfully deleted VPC vpcID=ns-1_nydws

Associated with Precreated VPC in default Project

{
  "id": "pre-created-vpc",
  "display_name": "pre-created-vpc",
  "private_ips": ["10.0.0.0/16"],
  "vpc_service_profile": "/orgs/default/projects/default/vpc-service-profiles/default",
  "load_balancer_vpc_endpoint": {
    "enabled": false
  },
  "ip_address_type": "IPV4",
  "short_id": "OgcKIC0E",
  "disable_route_advertisement": false,
  "_protection": "NOT_PROTECTED"
}
│ apiVersion: crd.nsx.vmware.com/v1alpha1                                                                                                                                                                                                                                                                                     │
│ kind: VPCNetworkConfiguration                                                                                                                                                                                                                                                                                               │
│ metadata:                                                                                                                                                                                                                                                                                                                   │
│   creationTimestamp: "2025-10-28T06:34:22Z"                                                                                                                                                                                                                                                                                 │
│   generation: 1                                                                                                                                                                                                                                                                                                             │
│   name: ns-3-c5c18b5b-d14d-4317-97d8-29e676688fe2                                                                                                                                                                                                                                                                           │
│   resourceVersion: "967626"                                                                                                                                                                                                                                                                                                 │
│   uid: d34da266-9551-4949-a1f8-1f86bbebb5dd                                                                                                                                                                                                                                                                                 │
│ spec:                                                                                                                                                                                                                                                                                                                       │
│   defaultSubnetSize: 64                                                                                                                                                                                                                                                                                                     │
│   nsxProject: /orgs/default/projects/default                                                                                                                                                                                                                                                                                │
│   vpc: /orgs/default/projects/default/vpcs/pre-created-vpc                                                                                                                                                                                                                                                                  │
│ status:                                                                                                                                                                                                                                                                                                                     │
│   vpcs:                                                                                                                                                                                                                                                                                                                     │
│   - name: pre-created-vpc                                                                                                                                                                                                                                                                                                   │
│     vpcPath: /orgs/default/projects/default/vpcs/pre-created-vpc 

Security policy can be deleted successfully.

2025-10-28 06:46:21.000 INFO cleanup.go:247 Marking group for deletion path=/orgs/default/projects/default/vpcs/pre-created-vpc/groups/sp-app-access-scope_77c8e
2025-10-28 06:46:21.000 TRACE utils.go:356 HTTP req body={"children":[{"children":[{"children":[{"children":[{"Group":{"_create_time":1761633920280,"_create_user":"wcp-cluster-user-ddc98cb1-5506-47b3-9197-55cded09f5ce-f60c897a-d0a4-46c0-a2fb-3f474cd9a42c","_last_modified_time":1761633920280,"_last_modified_user":"wcp-cluster-user-ddc98cb1-5506-47b3-9197-55cded09f5ce-f60c897a-d0a4-46c0-a2fb-3f474cd9a42c","_protection":"NOT_PROTECTED","_revision":0,"_system_owned":false,"display_name":"sp-app-access_scope","expression":[{"expressions":[{"id":"fdba7e24-535a-4d8d-b022-d349b441d777","key":"Tag","marked_for_delete":false,"member_type":"VpcSubnetPort","operator":"EQUALS","overridden":false,"parent_path":"/orgs/default/projects/default/vpcs/pre-created-vpc/groups/sp-app-access-scope_77c8e","path":"/orgs/default/projects/default/vpcs/pre-created-vpc/groups/sp-app-access-scope_77c8e/condition-expressions/fdba7e24-535a-4d8d-b022-d349b441d777","relative_path":"fdba7e24-535a-4d8d-b022-d349b441d777","remote_path":"","resource_type":"Condition","scope_operator":"EQUALS","value":"nsx-op/cluster|ddc98cb1-5506-47b3-9197-55cded09f5ce"},{"conjunction_operator":"AND","id":"6981c6d7-f799-4a78-9909-69cc8a97ed68","marked_for_delete":false,"overridden":false,"parent_path":"/orgs/default/projects/default/vpcs/pre-created-vpc/groups/sp-app-access-scope_77c8e","path":"/orgs/default/projects/default/vpcs/pre-created-vpc/groups/sp-app-access-scope_77c8e/conjunction-expressions/6981c6d7-f799-4a78-9909-69cc8a97ed68","relative_path":"6981c6d7-f799-4a78-9909-69cc8a97ed68","remote_path":"","resource_type":"ConjunctionOperator"},{"id":"59351d86-cd00-4488-a4b0-fc5e953ccf09","key":"Tag","marked_for_delete":false,"member_type":"VpcSubnetPort","operator":"EQUALS","overridden":false,"parent_path":"/orgs/default/projects/default/vpcs/pre-created-vpc/groups/sp-app-access-scope_77c8e","path":"/orgs/default/projects/default/vpcs/pre-created-vpc/groups/sp-app-access-scope_77c8e/condition-expressions/59351d86-cd00-4488-a4b0-fc5e953ccf09","relative_path":"59351d86-cd00-4488-a4b0-fc5e953ccf09","remote_path":"","resource_type":"Condition","scope_operator":"EQUALS","value":"nsx-op/namespace_uid|d1bbe8a1-e000-44bd-aaa4-09a5dd048aab"},{"conjunction_operator":"AND","id":"10fa0cdd-2d60-4816-b965-1a8b89c427f6","marked_for_delete":false,"overridden":false,"parent_path":"/orgs/default/projects/default/vpcs/pre-created-vpc/groups/sp-app-access-scope_77c8e","path":"/orgs/default/projects/default/vpcs/pre-created-vpc/groups/sp-app-access-scope_77c8e/conjunction-expressions/10fa0cdd-2d60-4816-b965-1a8b89c427f6","relative_path":"10fa0cdd-2d60-4816-b965-1a8b89c427f6","remote_path":"","resource_type":"ConjunctionOperator"},{"id":"a9aeb839-b8df-4934-b47e-fbdebceabb83","key":"Tag","marked_for_delete":false,"member_type":"VpcSubnetPort","operator":"EQUALS","overridden":false,"parent_path":"/orgs/default/projects/default/vpcs/pre-created-vpc/groups/sp-app-access-scope_77c8e","path":"/orgs/default/projects/default/vpcs/pre-created-vpc/groups/sp-app-access-scope_77c8e/condition-expressions/a9aeb839-b8df-4934-b47e-fbdebceabb83","relative_path":"a9aeb839-b8df-4934-b47e-fbdebceabb83","remote_path":"","resource_type":"Condition","scope_operator":"EQUALS","value":"role|db"}],"id":"0292b3b5-1990-402e-817a-9c9cb1201cbe","marked_for_delete":false,"overridden":false,"parent_path":"/orgs/default/projects/default/vpcs/pre-created-vpc/groups/sp-app-access-scope_77c8e","path":"/orgs/default/projects/default/vpcs/pre-created-vpc/groups/sp-app-access-scope_77c8e/nested-expressions/0292b3b5-1990-402e-817a-9c9cb1201cbe","relative_path":"0292b3b5-1990-402e-817a-9c9cb1201cbe","remote_path":"","resource_type":"NestedExpression"}],"extended_expression":[],"id":"sp-app-access-scope_77c8e","marked_for_delete":true,"overridden":false,"owner_id":"6b307e52-2ee2-4c05-bc37-fc1efeb238e1","parent_path":"/orgs/default/projects/default/vpcs/pre-created-vpc","path":"/orgs/default/projects/default/vpcs/pre-created-vpc/groups/sp-app-access-scope_77c8e","realization_id":"ad417cab-09a4-470b-aa64-f26bff5c7079","reference":false,"relative_path":"sp-app-access-scope_77c8e","remote_path":"","resource_type":"Group","tags":[{"scope":"nsx-op/group_type","tag":"scope"},{"scope":"nsx-op/selector_hash","tag":"ef76b0a86472d85ee27c5f4c4fb101632358e0dd"},{"scope":"nsx-op/cluster","tag":"ddc98cb1-5506-47b3-9197-55cded09f5ce"},{"scope":"nsx-op/version","tag":"1.0.0"},{"scope":"nsx-op/namespace","tag":"ns-3"},{"scope":"nsx-op/namespace_uid","tag":"d1bbe8a1-e000-44bd-aaa4-09a5dd048aab"},{"scope":"nsx-op/security_policy_name","tag":"sp-app-access"},{"scope":"nsx-op/security_policy_uid","tag":"ce0b60ee-fe14-4141-8926-073aebe127e8"},{"scope":"nsx-op/nsx_share_created_for","tag":"notShared"}],"unique_id":"ad417cab-09a4-470b-aa64-f26bff5c7079"},"id":"sp-app-access-scope_77c8e","marked_for_delete":true,"resource_type":"ChildGroup"}],"id":"pre-created-vpc","resource_type":"ChildResourceReference","target_type":"Vpc"}],"id":"default","resource_type":"ChildResourceReference","target_type":"Project"}],"id":"default","resource_type":"ChildResourceReference","target_type":"Org"}],"resource_type":"OrgRoot"} head={"Accept":["application/json"],"Authorization":["--- CENSORED ---"],"Content-Type":["application/json"],"User-Agent":["vAPI/0.7.0 Go/go1.25.3 (linux; amd64)"],"Vapi-Ctx-Opid":["88253921-d33e-4207-b8af-e4ede304200d"]} method=PATCH url=https://10.70.191.102/policy/api/v1/org-root?enforce_revision_check=false

Test Coverage

SecurityPolicy shares: Successfully cleaned before VPC deletion
NetworkPolicy shares: Verified same cleanup pattern works
Multiple shares per VPC: All shares removed in batch operation
VPC deletion: No longer blocked by share references

Impact Analysis

  • Backwards Compatibility: ✅ Maintains all existing cleanup functionality
  • Performance: ✅ No additional API calls - just reordered existing operations
  • Error Handling: ✅ Preserves existing error propagation and logging
  • Idempotency: ✅ Operations remain idempotent as before

Additional Notes

The fix has been validated against both SecurityPolicy and NetworkPolicy resources that create cross-VPC share references. The cleanup phase reordering ensures proper dependency resolution without introducing any breaking changes to existing cleanup logic.

@zhengxiexie zhengxiexie force-pushed the topic/zhengxie/main/fix_3603104 branch from cf8c1b9 to 480bcba Compare October 24, 2025 09:48
@zhengxiexie zhengxiexie force-pushed the topic/zhengxie/main/fix_3603104 branch 4 times, most recently from 23e4497 to 3886c9f Compare October 27, 2025 07:35
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Oct 27, 2025

Codecov Report

❌ Patch coverage is 2.85714% with 34 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.66%. Comparing base (372acca) to head (466e94c).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
pkg/nsx/services/securitypolicy/cleanup.go 3.57% 27 Missing ⚠️
pkg/clean/clean_lb_infra.go 0.00% 7 Missing ⚠️

❌ Your patch status has failed because the patch coverage (2.85%) is below the target coverage (70.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1272      +/-   ##
==========================================
- Coverage   76.00%   75.66%   -0.35%     
==========================================
  Files         147      147              
  Lines       19858    19872      +14     
==========================================
- Hits        15094    15037      -57     
- Misses       3662     3740      +78     
+ Partials     1102     1095       -7     
Flag Coverage Δ
unit-tests 75.66% <2.85%> (-0.35%) ⬇️
Files with missing lines Coverage Δ
pkg/clean/clean_lb_infra.go 78.85% <0.00%> (-1.08%) ⬇️
pkg/nsx/services/securitypolicy/cleanup.go 28.31% <3.57%> (-49.36%) ⬇️

... and 1 file with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@zhengxiexie
Copy link
Copy Markdown
Contributor Author

zhengxiexie commented Oct 28, 2025

/e2e

…VPC deletion

- Add CleanupBeforeVPCDeletion to SecurityPolicyService and LBInfraCleaner
- Remove share cleanup from CleanupInfraResources phase

Change-Id: Ib6814933c24252eba19483ec3d2abd27fbafa7de
@zhengxiexie zhengxiexie force-pushed the topic/zhengxie/main/fix_3603104 branch from 3886c9f to 466e94c Compare October 29, 2025 03:56
Copy link
Copy Markdown
Contributor

@timdengyun timdengyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zhengxiexie
Copy link
Copy Markdown
Contributor Author

/e2e

1 similar comment
@zhengxiexie
Copy link
Copy Markdown
Contributor Author

/e2e

@zhengxiexie zhengxiexie merged commit a8e3d9e into vmware-tanzu:main Oct 29, 2025
2 checks passed
zhengxiexie added a commit to zhengxiexie/nsx-operator-1 that referenced this pull request Oct 31, 2025
…re deletion errors

This PR complements PR vmware-tanzu#1272 by applying the same cleanup sequence fix to LoadBalancer infrastructure.

Problem:
- NSX error 524238 occurs when attempting to delete LB infra shares
- Shares are still being referenced by VPC LB Virtual Servers
- Error: 'The object path=[/infra/shares/.../LBFastUdpProfile] cannot be deleted
  since its members are being referenced by [.../vpc-lb-virtual-servers/kube-dns-lb_53_UDP_5zahh]'

Root Cause:
- LB Virtual Servers were cleaned up in CleanupInfraResources phase (too late)
- Share cleanup attempted before Virtual Servers were deleted
- This caused reference dependency errors from NSX

Solution:
- Move DLB Virtual Server cleanup to CleanupBeforeVPCDeletion phase
- Move LB infra Share cleanup to CleanupBeforeVPCDeletion phase
- Ensures proper deletion order: Virtual Servers → Shares → VPC

Changes:
- Updated CleanupBeforeVPCDeletion() to first delete DLB virtual servers, then shares
- Removed these cleanups from CleanupInfraResources() parallel execution
- Updated test cases to reflect new cleanup sequence

This ensures shares can be successfully deleted without NSX reference errors.

Related: vmware-tanzu#1272
Change-Id: I55fd9aceb9eee3364ab3f051f4fbc07e34d23cd0
zhengxiexie added a commit to zhengxiexie/nsx-operator-1 that referenced this pull request Oct 31, 2025
…re deletion errors

This PR complements PR vmware-tanzu#1272 by applying the same cleanup sequence fix to LoadBalancer infrastructure.

Problem:
- NSX error 524238 occurs when attempting to delete LB infra shares
- Shares are still being referenced by VPC LB Virtual Servers
- Error: 'The object path=[/infra/shares/.../LBFastUdpProfile] cannot be deleted
  since its members are being referenced by [.../vpc-lb-virtual-servers/kube-dns-lb_53_UDP_5zahh]'

Root Cause:
- LB Virtual Servers were cleaned up in CleanupInfraResources phase (too late)
- Share cleanup attempted before Virtual Servers were deleted
- This caused reference dependency errors from NSX

Solution:
- Move DLB Virtual Server cleanup to CleanupBeforeVPCDeletion phase
- Move LB infra Share cleanup to CleanupBeforeVPCDeletion phase
- Ensures proper deletion order: Virtual Servers → Shares → VPC

Changes:
- Updated CleanupBeforeVPCDeletion() to first delete DLB virtual servers, then shares
- Removed these cleanups from CleanupInfraResources() parallel execution
- Updated test cases to reflect new cleanup sequence

This ensures shares can be successfully deleted without NSX reference errors.

Related: vmware-tanzu#1272
Change-Id: I55fd9aceb9eee3364ab3f051f4fbc07e34d23cd0
zhengxiexie added a commit that referenced this pull request Nov 3, 2025
…re deletion errors (#1278)

This PR complements PR #1272 by applying the same cleanup sequence fix to LoadBalancer infrastructure.

Problem:
- NSX error 524238 occurs when attempting to delete LB infra shares
- Shares are still being referenced by VPC LB Virtual Servers
- Error: 'The object path=[/infra/shares/.../LBFastUdpProfile] cannot be deleted
  since its members are being referenced by [.../vpc-lb-virtual-servers/kube-dns-lb_53_UDP_5zahh]'

Root Cause:
- LB Virtual Servers were cleaned up in CleanupInfraResources phase (too late)
- Share cleanup attempted before Virtual Servers were deleted
- This caused reference dependency errors from NSX

Solution:
- Move DLB Virtual Server cleanup to CleanupBeforeVPCDeletion phase
- Move LB infra Share cleanup to CleanupBeforeVPCDeletion phase
- Ensures proper deletion order: Virtual Servers → Shares → VPC

Changes:
- Updated CleanupBeforeVPCDeletion() to first delete DLB virtual servers, then shares
- Removed these cleanups from CleanupInfraResources() parallel execution
- Updated test cases to reflect new cleanup sequence

This ensures shares can be successfully deleted without NSX reference errors.

Related: #1272
Change-Id: I55fd9aceb9eee3364ab3f051f4fbc07e34d23cd0
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants