cloudbuild: bump timeout to 3600s #1419

Merged
k8s-ci-robot merged 1 commit into kubernetes:master from Ganiredi:cloudbuild-timeout-bump
May 5, 2026
Conversation

@Ganiredi
Contributor

@Ganiredi Ganiredi commented May 5, 2026

What type of PR is this?

/kind bug

What this PR does / why we need it:

The cloud-provider-aws-push-images postsubmit has been running up against the 1200s (20 min) Cloud Build timeout. After #1399 fixed the image pin issue, step 1 (multi-arch buildx) now succeeds but consumes 16-19 of the 20 available minutes, leaving step 2 (cloudbuild-artifacts) with essentially no budget. Step 2 is killed partway through `hack/install-gsutil.sh`, so no ecr-credential-provider binaries land in gs://k8s-staging-provider-aws/releases/.

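Observed failures after #1399 landed:

  • release-1.36: https://prow.k8s.io/view/gs/kubernetes-ci-logs/logs/cloud-provider-aws-push-images/2051718180123971584 (step 1 ok, step 2 TIMEOUT installing gsutil)
  • master: https://prow.k8s.io/view/gs/kubernetes-ci-logs/logs/cloud-provider-aws-push-images/2051514822092132352 (step 1 buildkit session deadline during registry push)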

Cross-check: the last successful binary upload to the staging releases bucket was v1.35.1-5-g7dac1f6 on 2026-03-27, so step 2 has effectively been timing out for over a month, independent of the image pin issue. Staging image pushes continued through 2026-04-08 before the image GC killed step 1 too.

Bumping the overall build timeout to 3600s (60 min) gives:

  • Headroom for GCB pool variability
  • Reliable multi-arch buildx pushes on step 1
  • Enough budget for the existing gsutil install + upload on step 2
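
As a rough illustration of the budget arithmetic above, here is a minimal sketch. It assumes step 2 only starts once step 1 has finished and that step 1 keeps taking the observed 16-19 minutes. The helper name `step2_budget_seconds` is hypothetical and not part of this repository.

```python
# Hypothetical helper (not from the repository): time left for step 2
# once step 1 has consumed its share of the overall build timeout.
def step2_budget_seconds(build_timeout_s: int, step1_minutes: int) -> int:
    """Return the seconds remaining for step 2, floored at zero."""
    return max(build_timeout_s - step1_minutes * 60, 0)


OLD_TIMEOUT_S = 1200  # previous overall Cloud Build timeout (20 min)
NEW_TIMEOUT_S = 3600  # overall timeout after this PR (60 min)

# 16 and 19 minutes are the bounds of the step 1 duration quoted above.
for step1_minutes in (16, 19):
    old = step2_budget_seconds(OLD_TIMEOUT_S, step1_minutes)
    new = step2_budget_seconds(NEW_TIMEOUT_S, step1_minutes)
    print(f"step 1 = {step1_minutes} min: step 2 budget {old}s -> {new}s")
```

Under these assumptions step 2 previously had between 1 and 4 minutes, and with the 3600s ceiling it would have roughly 41 to 44 minutes.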

Which issue(s) this PR fixes:

N/A — depends on #1399 which already landed.

Special notes for your reviewer:

No behavior change for successful builds; only raises the ceiling for slow ones. Will need cherry-picks to release-1.36 (and older release branches) once this lands so that v1.36.0 / v1.35.x / etc. postsubmits can also succeed.

Does this PR introduce a user-facing change?:

NONE

The cloud-provider-aws-push-images postsubmit has been running up
against the 1200s (20 min) Cloud Build timeout:

- The new gcb-docker-gcloud image (pinned in kubernetes#1399) is larger and
  pushes/pulls take slightly longer, and GCB's shared pool has been
  slower overall, so step 1 (multi-arch buildx) now routinely uses
  16-19 of the 20 available minutes.
- Step 2 (cloudbuild-artifacts) then has almost no budget left and
  is killed mid-`hack/install-gsutil.sh`, which means the
  ecr-credential-provider binaries never reach
  gs://k8s-staging-provider-aws/releases/.

Observed failures after kubernetes#1399 landed:
- release-1.36:
  https://prow.k8s.io/view/gs/kubernetes-ci-logs/logs/cloud-provider-aws-push-images/2051718180123971584
  (step 1 ok, step 2 TIMEOUT installing gsutil)
- master:
  https://prow.k8s.io/view/gs/kubernetes-ci-logs/logs/cloud-provider-aws-push-images/2051514822092132352
  (step 1 buildkit session deadline during registry push)

Cross-check: the last successful binary upload to the staging
releases bucket was v1.35.1-5-g7dac1f6 on 2026-03-27 — step 2 has
effectively been timing out for over a month, independent of the
image pin issue.

Bumping the overall build timeout to 3600s (60 min) gives headroom
for GCB variability, keeps step 1 multi-arch pushes reliable, and
leaves enough budget for the existing install-gsutil + upload flow.
A follow-up PR can skip the SDK reinstall by switching the step 2
base image.

Signed-off-by: Ganesh Putta <ganiredi@amazon.com>
@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 5, 2026
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If cloud-provider-aws contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot requested review from cheftako and elmiko May 5, 2026 18:31
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label May 5, 2026
@k8s-ci-robot
Contributor

Hi @Ganiredi. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Tip

We noticed you've done this a few times! Consider joining the org to skip this step and gain /lgtm and other bot rights. We recommend asking approvers on your previous PRs to sponsor you.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels May 5, 2026
@kmala
Member

kmala commented May 5, 2026

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 5, 2026
@kmala
Member

kmala commented May 5, 2026

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 5, 2026
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kmala

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 5, 2026
@k8s-ci-robot k8s-ci-robot merged commit 9514aec into kubernetes:master May 5, 2026
11 checks passed
k8s-ci-robot added a commit that referenced this pull request May 6, 2026
…-upstream-release-1.36

Automated cherry pick of #1419: cloudbuild: bump timeout to 3600s
This was referenced May 6, 2026
k8s-ci-robot added a commit that referenced this pull request May 7, 2026
…1419-#1421-upstream-release-1.30

Automated cherry pick of #1399: cloudbuild: bump gcb-docker-gcloud to v20260205-38cfa9523f
#1419: cloudbuild: bump timeout to 3600s
#1421: cloudbuild: upgrade to E2_HIGHCPU_32 to fix session timeout
k8s-ci-robot added a commit that referenced this pull request May 7, 2026
…1419-#1421-upstream-release-1.34

Automated cherry pick of #1399: cloudbuild: bump gcb-docker-gcloud to v20260205-38cfa9523f
#1419: cloudbuild: bump timeout to 3600s
#1421: cloudbuild: upgrade to E2_HIGHCPU_32 to fix session timeout
k8s-ci-robot added a commit that referenced this pull request May 7, 2026
…1419-#1421-upstream-release-1.33

Automated cherry pick of #1399: cloudbuild: bump gcb-docker-gcloud to v20260205-38cfa9523f
#1419: cloudbuild: bump timeout to 3600s
#1421: cloudbuild: upgrade to E2_HIGHCPU_32 to fix session timeout
k8s-ci-robot added a commit that referenced this pull request May 7, 2026
…1419-#1421-upstream-release-1.35

Automated cherry pick of #1399: cloudbuild: bump gcb-docker-gcloud to v20260205-38cfa9523f
#1419: cloudbuild: bump timeout to 3600s
#1421: cloudbuild: upgrade to E2_HIGHCPU_32 to fix session timeout
k8s-ci-robot added a commit that referenced this pull request May 7, 2026
…1419-#1421-upstream-release-1.29

Automated cherry pick of #1399: cloudbuild: bump gcb-docker-gcloud to v20260205-38cfa9523f
#1419: cloudbuild: bump timeout to 3600s
#1421: cloudbuild: upgrade to E2_HIGHCPU_32 to fix session timeout
k8s-ci-robot added a commit that referenced this pull request May 7, 2026
…1419-#1421-upstream-release-1.31

Automated cherry pick of #1399: cloudbuild: bump gcb-docker-gcloud to v20260205-38cfa9523f
#1419: cloudbuild: bump timeout to 3600s
#1421: cloudbuild: upgrade to E2_HIGHCPU_32 to fix session timeout
k8s-ci-robot added a commit that referenced this pull request May 7, 2026
…1419-#1421-upstream-release-1.32

Automated cherry pick of #1399: cloudbuild: bump gcb-docker-gcloud to v20260205-38cfa9523f
#1419: cloudbuild: bump timeout to 3600s
#1421: cloudbuild: upgrade to E2_HIGHCPU_32 to fix session timeout