Automated cherry pick of #1399: cloudbuild: bump gcb-docker-gcloud to v20260205-38cfa9523f #1419: cloudbuild: bump timeout to 3600s #1421: cloudbuild: upgrade to E2_HIGHCPU_32 to fix session timeout #1429
The cloud-provider-aws-push-images postsubmit has been failing across all branches (master and release-*) since early April 2026 because cloudbuild.yaml pins gcr.io/k8s-staging-test-infra/gcb-docker-gcloud:v20221214-1b4dd4d69a, and that tag has been garbage-collected out of the staging registry. Cloud Build retries the pull 10 times and fails with 'manifest unknown: Failed to fetch "v20221214-1b4dd4d69a"'.

Example failed run (v1.36.0 tag): https://prow.k8s.io/view/gs/kubernetes-ci-logs/logs/cloud-provider-aws-push-images/2051449456091467776

Bumping to v20260205-38cfa9523f (digest sha256:ff388e0dc16351e96f8464e2e185b74a7578a5ccb7a112cf3393468e59e6e2d2), currently the newest tag in gcr.io/k8s-staging-test-infra/gcb-docker-gcloud and aliased to 'latest'. This image still provides /buildx-entrypoint used by the build step.

Signed-off-by: Ganesh Putta <ganiredi@amazon.com>
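The two tags named above both look like `v<YYYYMMDD>-<short sha>`, so the age of a pin can be read off the tag itself. The following is a minimal Python sketch under that assumption; the tag pattern is inferred from these two tags only, and the helper name `tag_date` is hypothetical, not part of any tooling in this repository.

```python
import re
from datetime import date

# Assumed tag shape: v<YYYYMMDD>-<short commit sha>, inferred from the
# two tags quoted in the commit message above.
TAG_RE = re.compile(r"^v(\d{4})(\d{2})(\d{2})-([0-9a-f]+)$")


def tag_date(tag: str) -> date:
    """Return the date encoded in a gcb-docker-gcloud style tag."""
    m = TAG_RE.match(tag)
    if m is None:
        raise ValueError(f"unrecognized tag format: {tag!r}")
    year, month, day = (int(g) for g in m.groups()[:3])
    return date(year, month, day)


old_pin = tag_date("v20221214-1b4dd4d69a")
new_pin = tag_date("v20260205-38cfa9523f")

# Gap between the old and new pins, in days.
print((new_pin - old_pin).days)  # prints 1149
```

A gap of more than three years between pins is consistent with the old tag having aged out of a staging registry.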
The cloud-provider-aws-push-images postsubmit has been running up against the 1200s (20 min) Cloud Build timeout:

- The new gcb-docker-gcloud image (pinned in kubernetes#1399) is larger and pushes/pulls take slightly longer, and GCB's shared pool has been slower overall, so step 1 (multi-arch buildx) routinely now uses 16-19 of the 20 available minutes.
- Step 2 (cloudbuild-artifacts) then has almost no budget left and is killed mid-`hack/install-gsutil.sh`, which means the ecr-credential-provider binaries never reach gs://k8s-staging-provider-aws/releases/.

Observed failures after kubernetes#1399 landed:

- release-1.36: https://prow.k8s.io/view/gs/kubernetes-ci-logs/logs/cloud-provider-aws-push-images/2051718180123971584 (step 1 ok, step 2 TIMEOUT installing gsutil)
- master: https://prow.k8s.io/view/gs/kubernetes-ci-logs/logs/cloud-provider-aws-push-images/2051514822092132352 (step 1 buildkit session deadline during registry push)

Cross-check: the last successful binary upload to the staging releases bucket was v1.35.1-5-g7dac1f6 on 2026-03-27. Step 2 has effectively been timing out for over a month, independent of the image pin issue.

Bumping the overall build timeout to 3600s (60 min) gives headroom for GCB variability, keeps step 1 multi-arch pushes reliable, and leaves enough budget for the existing install-gsutil + upload flow. A follow-up PR can skip the SDK reinstall by switching the step 2 base image.

Signed-off-by: Ganesh Putta <ganiredi@amazon.com>
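The budget arithmetic behind the timeout bump can be sketched in a few lines of Python. This assumes the build timeout is a single budget shared by both steps, and that step 1 keeps taking the 16-19 minutes quoted above; the function name is hypothetical.

```python
OLD_TIMEOUT_S = 1200  # previous overall Cloud Build timeout (20 min)
NEW_TIMEOUT_S = 3600  # overall timeout after this change (60 min)
STEP1_MINUTES = (16, 19)  # observed step 1 range from the commit message


def remaining_for_step2(timeout_s: int, step1_min: int) -> int:
    """Seconds left for step 2 after step 1 has used step1_min minutes."""
    return timeout_s - step1_min * 60


for label, timeout_s in (("old", OLD_TIMEOUT_S), ("new", NEW_TIMEOUT_S)):
    budgets = [remaining_for_step2(timeout_s, m) for m in STEP1_MINUTES]
    print(label, budgets)
# prints:
# old [240, 60]
# new [2640, 2460]
```

Under the old limit step 2 gets between one and four minutes, which matches it being killed during the gsutil install; under the new limit it gets at least 41 minutes.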
Multi-arch builds (linux/amd64 + linux/arm64) take ~12 min on N1_HIGHCPU_8, causing the BuildKit session to expire before the push phase completes:

    error: no active session: context deadline exceeded

Other K8s repos with large multi-arch binaries (aws-ebs-csi-driver, cloud-provider-gcp) use E2_HIGHCPU_32 which completes builds in ~3-5 min, well within the session window.
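Taken together, the three commits touch three settings in cloudbuild.yaml: the builder image tag, the overall `timeout`, and `options.machineType`. The Python sketch below checks a config text for all three. The embedded YAML fragment is hypothetical and heavily trimmed (the real file has more steps and arguments, and the `entrypoint` line is an assumption about how /buildx-entrypoint is wired in); the check uses plain regular expressions rather than a YAML parser to stay within the standard library.

```python
import re

# Hypothetical, trimmed-down fragment for illustration only.
CLOUDBUILD_YAML = """\
timeout: 3600s
options:
  machineType: E2_HIGHCPU_32
steps:
  - name: gcr.io/k8s-staging-test-infra/gcb-docker-gcloud:v20260205-38cfa9523f
    entrypoint: /buildx-entrypoint
"""

# One pattern per setting changed by the three cherry-picked commits.
EXPECTED = {
    "image tag": r"gcb-docker-gcloud:v20260205-38cfa9523f\b",
    "timeout": r"^timeout:\s*3600s\s*$",
    "machine type": r"^\s*machineType:\s*E2_HIGHCPU_32\s*$",
}


def missing_settings(text: str) -> list[str]:
    """Return the names of expected settings not found in the config text."""
    return [
        name
        for name, pattern in EXPECTED.items()
        if not re.search(pattern, text, flags=re.MULTILINE)
    ]


print(missing_settings(CLOUDBUILD_YAML))  # prints []
```

A check like this could be run against each release branch to confirm that a cherry pick carried all three changes, though a real check would be better served by parsing the YAML.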
/ok-to-test
Cherry pick of #1399 #1419 #1421 on release-1.28.
#1399: cloudbuild: bump gcb-docker-gcloud to v20260205-38cfa9523f
#1419: cloudbuild: bump timeout to 3600s
#1421: cloudbuild: upgrade to E2_HIGHCPU_32 to fix session timeout
For details on the cherry pick process, see the cherry pick requests page.