@weyfonk weyfonk commented Jan 9, 2026

The agent management controller no longer crashes with a nil pointer dereference when creating a cluster import token fails.

Refers to #4491.
Backport of #4494 to v0.14.

Additional Information

Checklist

- [ ] I have updated the documentation via a pull request in the fleet-docs repository.

@weyfonk weyfonk requested a review from a team as a code owner January 9, 2026 08:49
@weyfonk weyfonk added this to the v2.13.2 milestone Jan 9, 2026
@weyfonk weyfonk added this to Fleet Jan 9, 2026
@weyfonk weyfonk moved this to 👀 In review in Fleet Jan 9, 2026
@weyfonk weyfonk force-pushed the 0.14-import-token-fix-nil-dereference branch from b09efcc to dc70e8f Compare January 9, 2026 09:08
Comment on lines +378 to +391
	if err != nil {
		if apierrors.IsAlreadyExists(err) {
			token, err = i.tokens.Get(cluster.Namespace, tokenName)
			if err != nil {
				logrus.Debugf("Failed to get existing ClusterRegistrationToken for cluster %s/%s: %v (requeuing)", cluster.Namespace, cluster.Name, err)
				i.clusters.EnqueueAfter(cluster.Namespace, cluster.Name, durations.TokenClusterEnqueueDelay)
				return status, err
			}
		} else {
			logrus.Debugf("Failed to create ClusterRegistrationToken for cluster %s/%s: %v (requeuing)", cluster.Namespace, cluster.Name, err)
			i.clusters.EnqueueAfter(cluster.Namespace, cluster.Name, durations.TokenClusterEnqueueDelay)
			return status, err
		}
	}

nitpick about style (I don't like too many nested if-else):

Suggested change
-	if err != nil {
-		if apierrors.IsAlreadyExists(err) {
-			token, err = i.tokens.Get(cluster.Namespace, tokenName)
-			if err != nil {
-				logrus.Debugf("Failed to get existing ClusterRegistrationToken for cluster %s/%s: %v (requeuing)", cluster.Namespace, cluster.Name, err)
-				i.clusters.EnqueueAfter(cluster.Namespace, cluster.Name, durations.TokenClusterEnqueueDelay)
-				return status, err
-			}
-		} else {
-			logrus.Debugf("Failed to create ClusterRegistrationToken for cluster %s/%s: %v (requeuing)", cluster.Namespace, cluster.Name, err)
-			i.clusters.EnqueueAfter(cluster.Namespace, cluster.Name, durations.TokenClusterEnqueueDelay)
-			return status, err
-		}
-	}
+	if apierrors.IsAlreadyExists(err) {
+		token, err = i.tokens.Get(cluster.Namespace, tokenName)
+		if err != nil {
+			logrus.Debugf("Failed to get existing ClusterRegistrationToken for cluster %s/%s: %v (requeuing)", cluster.Namespace, cluster.Name, err)
+			i.clusters.EnqueueAfter(cluster.Namespace, cluster.Name, durations.TokenClusterEnqueueDelay)
+			return status, err
+		}
+	} else if err != nil {
+		logrus.Debugf("Failed to create ClusterRegistrationToken for cluster %s/%s: %v (requeuing)", cluster.Namespace, cluster.Name, err)
+		i.clusters.EnqueueAfter(cluster.Namespace, cluster.Name, durations.TokenClusterEnqueueDelay)
+		return status, err
+	}

I missed that this is a backport and not the original PR, never mind then 😅

Yes, I agree that the style is a bit less than ideal, but this is indeed a backport :)

Copilot AI left a comment

Pull request overview

This pull request fixes a critical nil pointer dereference bug in the cluster import token creation logic. When token creation failed, the previous code discarded the result and continued execution with a nil token variable, causing crashes when the token was later accessed.

Key Changes:

  • Properly capture the created token object instead of discarding it
  • Add comprehensive error handling for token creation failures, including race condition handling for AlreadyExists errors
  • Return errors appropriately instead of silently ignoring them and continuing with nil token

@thardeck thardeck merged commit afb2958 into rancher:release/v0.14 Jan 9, 2026
28 checks passed
@github-project-automation github-project-automation bot moved this from 👀 In review to ✅ Done in Fleet Jan 9, 2026