🐛 Gracefully handle not-found when resolving owner references #13510

Open
damdo wants to merge 1 commit into kubernetes-sigs:main from damdo:fix-gracefully-deal-with-nonresolvable-ownerrefs

@damdo
Member

@damdo damdo commented Mar 26, 2026

What this PR does / why we need it:

When a Machine or MachineSet has an owner reference pointing to a MachineSet or MachineDeployment that no longer exists (e.g. because it was forcefully removed), the controller returns a hard error that blocks further reconciliation, including deletion.

Handle this by treating a NotFound error the same as having no owner (standalone): return nil instead of an error.
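
For illustration only, the behavior described above can be sketched with the standard library. This is not the code in this PR: `errNotFound`, `MachineSet`, `getter`, `fakeGetter` and `resolveOwner` are hypothetical stand-ins. In the real controller the lookup goes through `r.Client.Get` and the check is `apierrors.IsNotFound(err)`.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical stand-in for a Kubernetes NotFound API error.
var errNotFound = errors.New("not found")

// MachineSet is a simplified stand-in for clusterv1.MachineSet.
type MachineSet struct {
	Namespace string
	Name      string
}

// getter is a hypothetical lookup function standing in for r.Client.Get.
type getter func(namespace, name string) (*MachineSet, error)

// fakeGetter returns a getter backed by an in-memory map keyed by "namespace/name".
func fakeGetter(store map[string]*MachineSet) getter {
	return func(namespace, name string) (*MachineSet, error) {
		if ms, ok := store[namespace+"/"+name]; ok {
			return ms, nil
		}
		return nil, errNotFound
	}
}

// resolveOwner returns the owning MachineSet. If the owner no longer exists,
// it returns (nil, nil), so the caller treats the object as standalone.
// Any other error is still returned to the caller.
func resolveOwner(get getter, namespace, ownerName string) (*MachineSet, error) {
	ms, err := get(namespace, ownerName)
	if err != nil {
		if errors.Is(err, errNotFound) {
			return nil, nil
		}
		return nil, fmt.Errorf("failed to get owner %s/%s: %w", namespace, ownerName, err)
	}
	return ms, nil
}

func main() {
	get := fakeGetter(map[string]*MachineSet{
		"default/ms-a": {Namespace: "default", Name: "ms-a"},
	})

	// The owner reference points at a MachineSet that has been removed.
	owner, err := resolveOwner(get, "default", "ms-gone")
	fmt.Println(owner == nil, err == nil) // prints "true true"
}
```

With this shape, callers have to be prepared for a nil owner together with a nil error, which is the assumption the audit mentioned below is meant to verify.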

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #13405

/area machine
/area machineset

NOTE: As decided during KubeCon, this PR stays in draft until we audit the code and confirm this doesn't break any existing assumptions.

cc. @fabriziopandini @sbueringer

When a Machine or MachineSet has an owner reference pointing to a
MachineSet or MachineDeployment that no longer exists (e.g. during
deletion or garbage collection), the controller returns a hard error
that blocks further reconciliation, including deletion.

Handle this by treating a NotFound error the same as having no owner
(standalone): return nil instead of an error.
@k8s-ci-robot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress, area/machine, and area/machineset labels Mar 26, 2026
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 26, 2026
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign fabriziopandini for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Mar 26, 2026
@damdo
Member Author

damdo commented Mar 26, 2026

/assign @fabriziopandini @sbueringer

@sbueringer sbueringer marked this pull request as ready for review March 31, 2026 12:09
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 31, 2026
@sbueringer
Member

> NOTE: As decided during KubeCon, this PR is draft until we audit and confirm this doesn't break any existing assumption in code.

Let's move the PR out of draft so we can run CI, and put it on hold instead.

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 31, 2026
@sbueringer
Member

/test pull-cluster-api-e2e-main-gke


md := &clusterv1.MachineDeployment{}
if err := r.Client.Get(ctx, client.ObjectKey{Namespace: machineSet.Namespace, Name: mdName}, md); err != nil {
	if apierrors.IsNotFound(err) {
Member

I'm pretty sure this can lead to panics in reconcileUnhealthyMachines
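
To make the concern concrete, here is a minimal sketch of the general hazard, not the actual code of reconcileUnhealthyMachines (which is not shown in this thread). Once a lookup can return a nil owner with a nil error, any caller that dereferences the owner without a nil check panics. `MachineDeployment`, `unguardedName` and `guardedName` are hypothetical names used only for this example.

```go
package main

import "fmt"

// MachineDeployment is a simplified stand-in for clusterv1.MachineDeployment.
type MachineDeployment struct {
	Name string
}

// unguardedName dereferences md without a nil check. The deferred recover
// turns the nil pointer dereference into a flag so the example keeps running.
func unguardedName(md *MachineDeployment) (name string, panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	return md.Name, false
}

// guardedName treats a nil owner as "standalone" instead of dereferencing it.
func guardedName(md *MachineDeployment) string {
	if md == nil {
		return "<standalone>"
	}
	return md.Name
}

func main() {
	// A nil owner with no error is what a caller would see after a
	// NotFound is swallowed by the lookup.
	var owner *MachineDeployment

	_, panicked := unguardedName(owner)
	fmt.Println(panicked)           // prints "true"
	fmt.Println(guardedName(owner)) // prints "<standalone>"
}
```

Whether any real call site falls into the unguarded category is exactly what the audit would need to establish.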

Labels

area/machine Issues or PRs related to machine lifecycle management
area/machineset Issues or PRs related to machinesets
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA.
do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command.
size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

Machine never completes deletion when parent MachineSet no longer exists (e.g. force-deleted)

4 participants