
[26.0] Batch-prefetch HDCA job_state_summary in history contents listing#22256

Merged
mvdbeek merged 1 commit intogalaxyproject:release_26.0from
mvdbeek:fix-n-plus-one-history-contents
Mar 27, 2026

Conversation

@mvdbeek
Member

@mvdbeek mvdbeek commented Mar 25, 2026

When listing history contents, each HDCA triggered an individual SQL query to aggregate job states during serialization. This adds a batch query that prefetches all job state summaries in a single query before serialization begins, injecting results into the HDCA instances' cache.

Before:
  HDCA object loading:   1 query   (batch IN)
  Job state summaries:   N queries (one per HDCA)
  Total for HDCAs:       1 + N queries

After:
  HDCA object loading:   1 query   (batch IN)
  Job state summaries:   1 query   (batch IN + GROUP BY)
  Total for HDCAs:       2 queries
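The batch IN + GROUP BY query can be sketched roughly as follows. This is an illustrative toy using sqlite3, not Galaxy's actual model layer: the table and column names are assumptions, and only the function name `batch_fetch_job_state_summaries` comes from the PR itself.

```python
# Illustrative sketch of the "batch IN + GROUP BY" prefetch, using a toy
# sqlite schema (table/column names are assumptions, not Galaxy's model).
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job (id INTEGER PRIMARY KEY, hdca_id INTEGER, state TEXT)")
conn.executemany(
    "INSERT INTO job (hdca_id, state) VALUES (?, ?)",
    [(1, "ok"), (1, "ok"), (1, "error"), (2, "running"), (3, "ok")],
)

def batch_fetch_job_state_summaries(conn, hdca_ids):
    """Aggregate job states for every HDCA id in a single query."""
    placeholders = ",".join("?" * len(hdca_ids))
    rows = conn.execute(
        f"SELECT hdca_id, state, COUNT(*) FROM job "
        f"WHERE hdca_id IN ({placeholders}) GROUP BY hdca_id, state",
        hdca_ids,
    )
    summaries = defaultdict(dict)
    for hdca_id, state, count in rows:
        summaries[hdca_id][state] = count
    return dict(summaries)
```

Per the description, the results of such a query would then be written into each HDCA instance's `_job_state_summary` cache before serialization, so the serializer never issues its own per-HDCA query.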

Benchmark test results:

  • Old per-HDCA path emits at least N queries for N HDCAs
  • batch_fetch_job_state_summaries emits exactly 1 query regardless of HDCA count
  • Batch results are identical to per-HDCA results
  • Injecting batch results into _job_state_summary cache prevents any further queries
  • Wall-clock benchmark across 1/10/50/100 HDCAs confirms batch path uses fewer queries
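The query-count claims in the list above can be verified with a statement-level trace. A minimal sketch of that idea, using sqlite3's trace callback on a toy schema (the real test in the gist presumably hooks SQLAlchemy's event system instead):

```python
# Sketch of counting emitted SQL statements to contrast the per-HDCA (N+1)
# path with the single batch statement. Toy schema; not Galaxy's model.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job (id INTEGER PRIMARY KEY, hdca_id INTEGER, state TEXT)")
conn.executemany(
    "INSERT INTO job (hdca_id, state) VALUES (?, ?)",
    [(i, "ok") for i in range(1, 101)],
)

statements = []
conn.set_trace_callback(statements.append)  # invoked once per executed statement

# Old path: one aggregate query per HDCA -> N statements for N HDCAs.
for hdca_id in range(1, 101):
    conn.execute(
        "SELECT state, COUNT(*) FROM job WHERE hdca_id = ? GROUP BY state",
        (hdca_id,),
    ).fetchall()
old_count = len(statements)

# Batch path: a single IN + GROUP BY statement regardless of HDCA count.
statements.clear()
ids = list(range(1, 101))
conn.execute(
    "SELECT hdca_id, state, COUNT(*) FROM job WHERE hdca_id IN (%s) "
    "GROUP BY hdca_id, state" % ",".join("?" * len(ids)),
    ids,
).fetchall()
batch_count = len(statements)

print(old_count, batch_count)  # 100 1
```

Counting statements rather than timing makes the assertion deterministic, which is why the bullet points above are phrased in terms of query counts.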

Gist: https://gist.github.com/mvdbeek/6806938201eef20470285bc18267747d

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:
    1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

@mvdbeek mvdbeek force-pushed the fix-n-plus-one-history-contents branch from 204e7c4 to 929deb9 Compare March 25, 2026 11:51
@github-actions github-actions Bot added this to the 26.1 milestone Mar 25, 2026
@github-actions github-actions Bot changed the title Batch-prefetch HDCA job_state_summary in history contents listing [26.0] Batch-prefetch HDCA job_state_summary in history contents listing Mar 25, 2026
@jmchilton
Member

Wall-clock benchmark across 1/10/50/100 HDCAs confirms batch path uses fewer queries

You're replacing code I understand with code I don't - which I think is normal for performance tweaks, and given that I'm old and dense - but there is no evidence of a problem or that this fixes it. That sentence doesn't say whether it reduces contention or increases it, or whether it actually runs faster - it just verifies fewer queries. I suspect it might improve things - but I assume there are situations where locking over a longer database operation would actually make things worse - though I don't know if such locking even occurs in this query.

Maybe this is just a sales-pitch issue... this feels like a stab in the dark, and I would take a Marius "I think this might make a problem less bad - I want to deploy to production and test things" over a Claude "Here I did wall-clock testing and found... suspiciously not a wall-clock result but a query-counting result".

I trust you implicitly to monitor the situation and fix it if things go wrong - so I'm going to approve either way, obviously. I don't get my hands dirty with deployment stuff, so I'm eternally grateful that you do, and that you do such an amazing job at it.

@mvdbeek
Member Author

mvdbeek commented Mar 26, 2026

but there is no evidence of a problem

Fair - I didn't make this explicit in the PR description.

[Screenshot: Sentry route metrics, 2026-03-26 09:35]

It is among the most heavily used routes and has the worst p95 response time among them, so that is a data-driven target to optimize. You can see the overhead and the N+1 pattern quite clearly in https://sentry.galaxyproject.org/organizations/galaxy/insights/backend/summary/trace/d3d0b63c60fe4e068d1a3c1376aa6af6/?node=txn-1d57c06219c549ac8aa8aa9b98c1bb25&project=3&query=http.method%3AGET%20transaction.op%3Ahttp.server&referrer=performance-transaction-summary&source=performance_transaction_summary&statsPeriod=12h&timestamp=1774474456&transaction=%2Fapi%2Fhistories%2F%7Bhistory_id%7D%2Fcontents&unselectedSeries=p100%28%29&unselectedSeries=avg%28%29:
[Screenshot: Sentry trace showing the N+1 query pattern, 2026-03-26 09:38]

or that this fixes it

The gist has a unit test that prints some data points that I think I can believe (but production is of course always a different story, I concede that):

test/unit/data/model/test_batch_job_state_summary.py::test_timing_comparison[1]
  HDCAs=  1  |  Old:   1 queries 0.0011s  |  Batch:   1 queries 0.0009s  |  Speedup: 1.2x
PASSED
test/unit/data/model/test_batch_job_state_summary.py::test_timing_comparison[10]
  HDCAs= 10  |  Old:  10 queries 0.0076s  |  Batch:   1 queries 0.0013s  |  Speedup: 5.8x
PASSED
test/unit/data/model/test_batch_job_state_summary.py::test_timing_comparison[50]
  HDCAs= 50  |  Old:  50 queries 0.0369s  |  Batch:   1 queries 0.0022s  |  Speedup: 16.7x
PASSED
test/unit/data/model/test_batch_job_state_summary.py::test_timing_comparison[100]
  HDCAs=100  |  Old: 100 queries 0.0754s  |  Batch:   1 queries 0.0038s  |  Speedup: 19.7x
PASSED

@mvdbeek
Member Author

mvdbeek commented Mar 26, 2026

I'll deploy and report back

@mvdbeek
Member Author

mvdbeek commented Mar 27, 2026

It isn't a silver bullet, since there are remaining N+1 queries in this endpoint, but it seems to have dropped the p95 response time from 1.2 seconds to 950 ms in comparable 12-hour windows (my morning, no load peaks). The generated SQL is fairly readable imo (compared to the caching queries, for instance).

@mvdbeek mvdbeek merged commit 0cc0b06 into galaxyproject:release_26.0 Mar 27, 2026
56 of 59 checks passed
@github-project-automation github-project-automation Bot moved this from Needs Review to Done in Galaxy Dev - weeklies Mar 27, 2026
@ahmedhamidawan ahmedhamidawan modified the milestones: 26.1, 26.0 Mar 31, 2026
