Skip to content

Fix the batched describe insance method for instance not found case#1260

Merged
k8s-ci-robot merged 1 commit into
kubernetes:masterfrom
kmala:fix
Sep 24, 2025
Merged

Fix the batched describe insance method for instance not found case#1260
k8s-ci-robot merged 1 commit into
kubernetes:masterfrom
kmala:fix

Conversation

@kmala
Copy link
Copy Markdown
Member

@kmala kmala commented Sep 23, 2025

What type of PR is this?

/kind bug

What this PR does / why we need it:

Fix the describe instance batcher when the instance is not found or the batch call return error. If the instance is not found we shouldn't be returning an array of empty object but should be returning a nil. This is causing CCM to panic when the nodes deletion in apiserver is blocked because of finalizers but the instance in the aws is deleted. This bug is introduced in the #1152

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:


@k8s-ci-robot k8s-ci-robot added do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels Sep 23, 2025
@k8s-ci-robot k8s-ci-robot requested a review from dims September 23, 2025 06:14
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Sep 23, 2025
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

This issue is currently awaiting triage.

If cloud-provider-aws contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Sep 23, 2025
@kmala
Copy link
Copy Markdown
Member Author

kmala commented Sep 23, 2025

/retest

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

@kmala: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-cloud-provider-aws-e2e-kubetest2 f915f24 link false /test pull-cloud-provider-aws-e2e-kubetest2
pull-cloud-provider-aws-e2e-kubetest2-quick f915f24 link false /test pull-cloud-provider-aws-e2e-kubetest2-quick

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@shiv-amz
Copy link
Copy Markdown
Contributor

/lgtm

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

@shiv-amz: changing LGTM is restricted to collaborators

Details

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

results := make([]batcher.Result[ec2types.Instance], len(inputs))
firstInput := inputs[0]

firstInput := *inputs[0]
Copy link
Copy Markdown
Contributor

@sarthakkothari sarthakkothari Sep 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did we translate this from a pointer? can potentially cause another panic?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we don't translate we are overriding it in the below for loop as firstInput would just be a pointer to the first item item in slice. inputs should never have nil items for the panic to happen and there should always be atleast 1 item in the inputs.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we still have a len check on inputs before this line to catch the case where inputs is empty? Unless we have a hard guarantee somewhere that inputs is not empty?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, we are guaranteed to not have empty list as this gets called only when we make a describe instance call and there it verifies the instance count.

return nil, fmt.Errorf("expected to receive a single instance only, found %d", len(input.InstanceIds))
}
result := b.batcher.Add(ctx, input)
if result.Output == nil {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is if result.Err != nil a better check here? i wonder whether we're actually getting an error e.g. 404 not found wen describing a non-existent instance, it's trickier when e.g. we batch two describe instance IDs where one exists and the other is non-existent, are we getting an error in this case?

@yue9944882
Copy link
Copy Markdown
Member

I think it's more complex than a nil check. Bundling the describe calls via introducing a batcher did lower down the API counts but it also adds some complexity regarding error handling. i just did a local experiment:

  1. Calling describe API with two valid instance ID -> returns an array of two elements
  2. Calling describe API with 1 valid and 1 non-existent instance ID -> returns 404 error

@sarthakkothari @shiv-amz can you confirm the batcher behavior and add related test cases to that?

@yue9944882
Copy link
Copy Markdown
Member

/lgtm

Discussed with @kmala offline, merge this in as this PR will fix the gap for error handling.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 23, 2025
@yue9944882
Copy link
Copy Markdown
Member

/approve

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: yue9944882

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 24, 2025
@kmala
Copy link
Copy Markdown
Member Author

kmala commented Sep 24, 2025

/release-note-none

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Sep 24, 2025
@k8s-ci-robot k8s-ci-robot merged commit 36c0ac9 into kubernetes:master Sep 24, 2025
11 of 13 checks passed
@kmala kmala deleted the fix branch September 24, 2025 01:28
k8s-ci-robot added a commit that referenced this pull request Sep 24, 2025
…stream-release-1.34

Automated cherry pick of #1260: Fix the batched describe insance method for instance not
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants