[26.0] Batch-prefetch HDCA job_state_summary in history contents listing #22256
Conversation
When listing history contents, each HDCA triggered an individual SQL query to aggregate job states during serialization. This adds a batch query that prefetches all job state summaries in a single query before serialization begins, injecting the results into the HDCA instances' cache.

Before:
HDCA object loading: 1 query (batch IN)
Job state summaries: N queries (one per HDCA)
Total for HDCAs: 1 + N queries

After:
HDCA object loading: 1 query (batch IN)
Job state summaries: 1 query (batch IN + GROUP BY)
Total for HDCAs: 2 queries

Benchmark test results:
- Old per-HDCA path emits at least N queries for N HDCAs
- batch_fetch_job_state_summaries emits exactly 1 query regardless of HDCA count
- Batch results are identical to per-HDCA results
- Injecting batch results into the _job_state_summary cache prevents any further queries
- Wall-clock benchmark across 1/10/50/100 HDCAs confirms the batch path uses fewer queries

Gist: https://gist.github.com/mvdbeek/6806938201eef20470285bc18267747d
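For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of the before/after query shapes described above. It uses sqlite3 and a hypothetical simplified `job` table rather than Galaxy's actual schema, and the `batch_fetch_job_state_summaries` function below only borrows the name from the benchmark notes; it is an illustration, not the PR's implementation.

```python
# Sketch of the N+1 fix: one GROUP BY query over all HDCA ids instead of
# one aggregate query per HDCA. Table/column names are hypothetical.
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job (id INTEGER PRIMARY KEY, hdca_id INTEGER, state TEXT)")
conn.executemany(
    "INSERT INTO job (hdca_id, state) VALUES (?, ?)",
    [(1, "ok"), (1, "ok"), (1, "error"), (2, "running"), (3, "ok")],
)

def per_hdca_summary(hdca_id):
    # Old path: one aggregate query per HDCA -> N queries for N HDCAs.
    rows = conn.execute(
        "SELECT state, COUNT(*) FROM job WHERE hdca_id = ? GROUP BY state",
        (hdca_id,),
    )
    return dict(rows)

def batch_fetch_job_state_summaries(hdca_ids):
    # New path: a single query (batch IN + GROUP BY), keyed by HDCA id.
    placeholders = ",".join("?" * len(hdca_ids))
    rows = conn.execute(
        f"SELECT hdca_id, state, COUNT(*) FROM job "
        f"WHERE hdca_id IN ({placeholders}) GROUP BY hdca_id, state",
        hdca_ids,
    )
    summaries = defaultdict(dict)
    for hdca_id, state, count in rows:
        summaries[hdca_id][state] = count
    return summaries

batch = batch_fetch_job_state_summaries([1, 2, 3])
# Batch results match the per-HDCA path exactly.
assert all(batch[i] == per_hdca_summary(i) for i in [1, 2, 3])
```

In the PR itself the batched results are then injected into each HDCA's serialization cache, so serialization never needs to touch the database again for job states.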
You're replacing code I understand with code I don't - which I think is normal for performance tweaks and given that I'm old and dense - but there is no evidence of a problem or that this fixes it. That sentence doesn't answer whether it reduces contention or increases it, or whether it runs faster or not; it just verifies fewer queries. I suspect it might improve things - but there are situations where locking over a longer database operation would actually make things worse, I assume - I don't know if such locking even occurs in this query though. Maybe this is just a sales-pitch issue... this feels like a stab in the dark, and I would take a Marius "I think this might make a problem less bad - I want to deploy to production and test things" over a Claude "Here I did wall-clock testing and found... suspiciously not a wall-clock result but a query-counting result". I trust you implicitly to monitor the situation and fix it if things go wrong - so I'm going to approve either way obviously. I don't get my hands dirty with deployment stuff so I'm internally grateful that you do and that you do such an amazing job at it.
fair, I didn't make this explicit in the PR description
it is among the most heavily used routes, and has the worst p95 response time among those, so that is something data-driven to optimize. You can see the overhead and N+1 pattern quite nicely in https://sentry.galaxyproject.org/organizations/galaxy/insights/backend/summary/trace/d3d0b63c60fe4e068d1a3c1376aa6af6/?node=txn-1d57c06219c549ac8aa8aa9b98c1bb25&project=3&query=http.method%3AGET%20transaction.op%3Ahttp.server&referrer=performance-transaction-summary&source=performance_transaction_summary&statsPeriod=12h&timestamp=1774474456&transaction=%2Fapi%2Fhistories%2F%7Bhistory_id%7D%2Fcontents&unselectedSeries=p100%28%29&unselectedSeries=avg%28%29
The gist has a unit test that prints some data points that I think I can believe (but production is of course always a different story, I concede that):

I'll deploy and report back

It isn't the silver bullet, since there are remaining N+1 queries in this endpoint, but it seems to have dropped the P95 response time from 1.2 seconds to 950 ms in comparable 12-hour windows (my morning, no load peaks). The generated SQL is fairly readable imo (compared to the caching queries for instance)