workflows: chunk external-download matrix across 4 parallel invocations #282

igorpecovnik merged 4 commits into main
The `download` job in infrastructure-download-external.yml was hitting GitHub Actions' 256-entry `strategy.matrix` cap (259 configurations observed). Rather than trimming the matrix (which would drop legitimate arch × release combos users need) or duplicating a 500-line job block, lean on the fact that infrastructure-download-external.yml is *already* a reusable workflow: call it N times from the parent (infrastructure-repository-update.yml) with a `CHUNK_INDEX` / `CHUNK_COUNT` pair, and have the child filter its own matrix to its assigned slice.

Child (infrastructure-download-external.yml):
- Add CHUNK_INDEX (0..CHUNK_COUNT-1) and CHUNK_COUNT (default 1) inputs.
- In the `start` job, after building MATRIX_JSON, slice the include[] list so each invocation keeps only entries where `index % CHUNK_COUNT == CHUNK_INDEX`. Modular slicing (rather than contiguous ranges) avoids clustering slow package types into one chunk.
- Suffix the `assets-for-download` artifact name with CHUNK_INDEX so parallel uploads don't race against each other.

Parent (infrastructure-repository-update.yml):
- Turn the single `external:` reusable-workflow call into a `strategy.matrix` over `chunk_index: [0, 1, 2, 3]`, passing CHUNK_COUNT: 4 to the child.

Effect: 4 parallel invocations, each with its own 256-entry matrix cap and its own `max-parallel=180`. That gives headroom for 1024 total matrix entries (4 × 256) and an effective concurrency of 720 (4 × 180). Scale past that by bumping the `chunk_index` list and CHUNK_COUNT in lockstep: no block duplication, no file extraction, no matrix trimming. Legacy/un-chunked callers that omit the chunk inputs get CHUNK_COUNT=1 and receive the entire matrix as before.
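The modular slicing rule can be sketched in a few lines of bash. This is a minimal sketch, assuming the matrix entries are modeled as a plain bash array of names rather than the real `MATRIX_JSON` `include[]` list; `slice_chunk` is a hypothetical helper, not code from the PR.

```bash
#!/usr/bin/env bash
# Sketch: round-robin (modular) slicing of matrix entries into chunks.
# Entries are modeled as a bash array of names; the real workflow slices
# MATRIX_JSON's include[] list instead. slice_chunk is hypothetical.

# slice_chunk CHUNK_INDEX CHUNK_COUNT ENTRY...
# Prints, one per line, each entry whose zero-based position i satisfies
# i % CHUNK_COUNT == CHUNK_INDEX.
slice_chunk() {
  local chunk_index=$1 chunk_count=$2
  shift 2
  local entries=("$@")
  local i
  for (( i = 0; i < ${#entries[@]}; i++ )); do
    if (( i % chunk_count == chunk_index )); then
      printf '%s\n' "${entries[i]}"
    fi
  done
}

# Demo: ten placeholder entries split across 4 chunks.
entries=(e0 e1 e2 e3 e4 e5 e6 e7 e8 e9)
for c in 0 1 2 3; do
  echo "chunk $c: $(slice_chunk "$c" 4 "${entries[@]}" | tr '\n' ' ')"
done
```

With ten entries and `CHUNK_COUNT=4`, chunk 0 receives e0, e4, e8 and chunk 3 receives e3, e7: slice sizes differ by at most one and every entry lands in exactly one chunk. Applied to the 259 observed configurations, the four slices hold 65, 65, 65 and 64 entries, well under the 256 cap. With `CHUNK_COUNT=1` and index 0, the whole list comes back unchanged, which is the legacy behavior.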
Walkthrough: The called workflow (.github/workflows/infrastructure-download-external.yml) now accepts … Estimated code review effort: 3 (Moderate), ~25 minutes. Pre-merge checks: 3 passed.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/infrastructure-download-external.yml:
- Around line 398-404: The placeholder entry with name="none" causes the
workflow to unconditionally source os/external/${{ matrix.name }}.conf later and
fail; change the job or the step that sources that file to skip when matrix.name
== "none" (e.g., add a job-level or step-level guard using the matrix value so
the job/step only runs if matrix.name != "none"), and keep the
MATRIX_JSON_COMPACTED placeholder logic intact so empty slices still produce a
no-op matrix entry.
- Around line 379-385: Validate that the inputs CHUNK_INDEX and CHUNK_COUNT are
non-negative integers before any numeric comparisons: check both against a regex
like ^[0-9]+$ and if either fails, emit an error (referencing
CHUNK_INDEX/CHUNK_COUNT) and exit 1; after that, enforce CHUNK_COUNT>=1
(hard-fail if not) and then verify CHUNK_INDEX < CHUNK_COUNT as currently done.
Use explicit error messages for invalid format vs out-of-range to aid debugging.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 2c033d5f-34dd-4ac4-b5e1-c3c81ec60daa
📒 Files selected for processing (2)
.github/workflows/infrastructure-download-external.yml
.github/workflows/infrastructure-repository-update.yml
workflow_call's `type: number` is enforced at the YAML boundary only; direct API or templated callers can send non-numeric strings that end up in the env. Bash's arithmetic context silently evaluates a non-numeric word as 0 (the word is read as a variable name, and an unset variable evaluates to 0), so "abc" passed `-lt 1`, hit the silent reset to 1, and the chunk slice ran with a quietly wrong CHUNK_COUNT. Add explicit guards before any numeric comparison:

1. Regex `^[0-9]+$` on both CHUNK_INDEX and CHUNK_COUNT; fail with "is not a non-negative integer", naming the bad field.
2. Hard-fail CHUNK_COUNT < 1 (was: silent reset to 1). Masking caller bugs is worse than failing loudly.
3. Existing CHUNK_INDEX >= CHUNK_COUNT range check unchanged.

Each failure mode emits a distinct error message so a misconfigured caller can tell a format error from a range error at a glance.
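The three guards can be sketched as a single bash function. This assumes the inputs arrive as plain strings, as they would from the env; `validate_chunk_inputs` is a hypothetical name, and it returns 1 instead of calling `exit 1` so the sketch can be exercised outside a workflow step. The `10#` prefix is an addition not mentioned in the PR: it forces base 10, so a zero-padded value such as `08` that passes the regex is not then rejected by bash as an invalid octal number.

```bash
#!/usr/bin/env bash
# Sketch of the CHUNK_INDEX / CHUNK_COUNT guards; validate_chunk_inputs is a
# hypothetical helper. Returns 0 when the pair is usable, 1 otherwise.
validate_chunk_inputs() {
  local idx=$1 count=$2

  # 1. Format check first, before any arithmetic can coerce "abc" to 0.
  if [[ ! $idx =~ ^[0-9]+$ ]]; then
    echo "error: CHUNK_INDEX='$idx' is not a non-negative integer" >&2
    return 1
  fi
  if [[ ! $count =~ ^[0-9]+$ ]]; then
    echo "error: CHUNK_COUNT='$count' is not a non-negative integer" >&2
    return 1
  fi

  # Force base 10 so zero-padded values like "08" are not parsed as octal.
  idx=$((10#$idx))
  count=$((10#$count))

  # 2. Hard-fail instead of silently resetting CHUNK_COUNT to 1.
  if (( count < 1 )); then
    echo "error: CHUNK_COUNT=$count is out of range (must be >= 1)" >&2
    return 1
  fi

  # 3. Range check: CHUNK_INDEX must address one of the CHUNK_COUNT slices.
  if (( idx >= count )); then
    echo "error: CHUNK_INDEX=$idx is out of range (must be < CHUNK_COUNT=$count)" >&2
    return 1
  fi
  return 0
}
```

Calling `validate_chunk_inputs abc 4` reports the format error for CHUNK_INDEX, while `validate_chunk_inputs 4 4` reports the range error, so the two failure classes stay distinguishable.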
The {name:none,...} placeholder existed to keep strategy.matrix
non-empty when there's no work, with a comment promising the
downstream job would 'skip entries with name=none'. The skip was
never implemented — every step would try to source
os/external/${{ matrix.name }}.conf, which fails on
os/external/none.conf because no such file exists.
Job-level `if: matrix.name != 'none'` doesn't work either:
the matrix context isn't available when a job-level `if:` is
evaluated (the matrix is only expanded afterwards).
Fix: thread a `has_work` boolean output from the start job and gate
the download job on it via a job-level `if:` (needs.* outputs ARE
available there). Set has_work='false' in both placeholder paths
(no matrix entries at all, OR this chunk's slice happens to be
empty). The placeholder still ships to keep strategy.matrix valid,
but the job is skipped before the matrix expands so no runner is
allocated and no source attempt is made.
Summary
`infrastructure-download-external.yml`'s `download` job hit GitHub Actions' 256-entry `strategy.matrix` cap (259 observed). Rather than trimming legitimate arch × release combos or duplicating a 500-line job block, lean on the fact that the workflow is already a reusable workflow: call it N times from the parent with a `CHUNK_INDEX` / `CHUNK_COUNT` pair, and have the child filter its matrix to its slice.

Child (`infrastructure-download-external.yml`)

- `CHUNK_INDEX` (default 0) and `CHUNK_COUNT` (default 1) inputs.
- `start` job slices `include[]` using a modular index (entries where `index % CHUNK_COUNT == CHUNK_INDEX`) so slow packages don't cluster into one chunk.
- `assets-for-download` artifact name gains a `-${CHUNK_INDEX}` suffix so parallel uploads don't race.

Parent (`infrastructure-repository-update.yml`)

- The `external:` call becomes a `strategy.matrix` over `chunk_index: [0, 1, 2, 3]`, passing `CHUNK_COUNT: 4`.

Effect

The `strategy.matrix` cap is no longer hit: each of the 4 invocations gets its own 256-entry cap (1024 entries total) and its own `max-parallel` of 180 (720 effective concurrency).

Scale past 1024 by bumping `chunk_index: [0..N-1]` and `CHUNK_COUNT: N` in lockstep. No block duplication, no file extraction, no matrix trimming.

Legacy / un-chunked callers that omit the new inputs get `CHUNK_COUNT=1` and receive the entire matrix as before.

Test plan

- Run `infrastructure-repository-update.yml` manually; confirm 4 `external` child runs appear, each with a subset of the matrix.
- Each chunk's `clean` step tears down its own `assets-for-download-<N>` artifact without interfering with its siblings.
- The `Copying:` job (which `needs: external`) still runs once, after all 4 chunks complete; GitHub Actions waits for all matrix legs of the reusable-workflow call by default.
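The lockstep scaling rule above (bump `chunk_index: [0..N-1]` and `CHUNK_COUNT: N` together) can be made mechanical by deriving both values from one number. This is a minimal sketch, not code from the PR: `chunks_needed` and `chunk_index_list` are hypothetical helpers, and the 256 cap and 180 `max-parallel` figures are taken from the description above. Because modular slices differ in size by at most one, the largest slice is ceil(total / N), so the smallest safe N is ceil(total / 256).

```bash
#!/usr/bin/env bash
# Hypothetical helpers for sizing and rendering the parent's chunk matrix so
# chunk_index and CHUNK_COUNT are derived from one number; not code from the PR.
MATRIX_CAP=256      # per-invocation strategy.matrix limit cited in the PR
MAX_PARALLEL=180    # per-invocation max-parallel cited in the PR

# chunks_needed TOTAL -> smallest N with ceil(TOTAL / N) <= MATRIX_CAP,
# which equals ceil(TOTAL / MATRIX_CAP).
chunks_needed() {
  local total=$1
  echo $(( (total + MATRIX_CAP - 1) / MATRIX_CAP ))
}

# chunk_index_list N -> YAML flow sequence "[0, 1, ..., N-1]".
chunk_index_list() {
  local n=$1 i out="["
  for (( i = 0; i < n; i++ )); do
    if (( i > 0 )); then
      out+=", "
    fi
    out+="$i"
  done
  echo "${out}]"
}

n=4
echo "minimum chunks for 259 entries: $(chunks_needed 259)"
echo "chunk_index: $(chunk_index_list "$n")"
echo "CHUNK_COUNT: $n"
echo "headroom: $(( n * MATRIX_CAP )) entries, concurrency: $(( n * MAX_PARALLEL ))"
```

For the 259 observed entries this reports a minimum of 2 chunks, so the chosen N=4 (`chunk_index: [0, 1, 2, 3]`, `CHUNK_COUNT: 4`) leaves 1024 entries of headroom and a concurrency of 720. Generating both fields from the same `n` is one way to keep them from drifting apart.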