perf(api): speed up finding-groups /resources endpoint #10816
Merged
AdriiiPRodri merged 2 commits intoApr 21, 2026
Two query-plan fixes in the FindingGroup viewset's /resources action:

1. Drop the has_mappings.exists() pre-check in _paginated_resource_response. That probe ran on every non-filtered request and re-did the same semi-join the paginator already runs; use the paginator's total count to detect orphan-only (IaC) groups instead.
2. Materialize finding_ids into a Python list before filtering ResourceFindingMapping. finding_id__in=Subquery(findings_qs) made the planner pick a Merge Semi Join that scanned hundreds of thousands of RFM index entries to post-filter tenant_id. An inlined UUID list flips the plan to Nested Loop + Index Scan on finding_id (sub-millisecond in real DB time).

A request-scoped cache in _resolve_finding_ids deduplicates the materialization across helpers that build different RFM querysets from the same filtered findings set, so the findings round-trip happens once per request.

On the mapping path this cuts ~1-2 s off the request in the measured dev environment while keeping behavior identical for IaC/orphan groups.
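The request-scoped cache mentioned above can be illustrated with a small framework-free sketch. It assumes the cache stores a (queryset, list) pair on the instance and compares the queryset by identity; FakeQuerySet and RequestScopedResolver are hypothetical stand-ins for a Django queryset and the per-request viewset instance, not the code from this PR.

```python
class FakeQuerySet:
    """Hypothetical stand-in for a filtered findings queryset."""

    def __init__(self, ids):
        self._ids = list(ids)
        self.evaluations = 0  # counts simulated database round-trips

    def __iter__(self):
        self.evaluations += 1
        return iter(self._ids)


class RequestScopedResolver:
    """Hypothetical stand-in for a view instance created once per request."""

    def __init__(self):
        # Assumed layout: None, or a (queryset object, materialized list) pair.
        self._finding_ids_cache = None

    def _resolve_finding_ids(self, filtered_queryset):
        cached = self._finding_ids_cache
        # Identity check: only the exact same queryset object is a cache hit.
        if cached is not None and cached[0] is filtered_queryset:
            return cached[1]
        finding_ids = list(filtered_queryset)  # evaluates the queryset once
        self._finding_ids_cache = (filtered_queryset, finding_ids)
        return finding_ids


view = RequestScopedResolver()
findings = FakeQuerySet(["id-1", "id-2"])
ids_for_mapping = view._resolve_finding_ids(findings)
ids_for_ordering = view._resolve_finding_ids(findings)  # served from the cache
```

Because a new instance starts with an empty cache, nothing here survives from one simulated request to the next; a different queryset object always triggers a fresh materialization.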
Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage diff (master vs. #10816):

            master    #10816    +/-
Coverage    93.47%    93.47%
Files       228       228
Lines       32027     32064     +37
Hits        29936     29973     +37
Misses      2091      2091
Collaborator
💚 All backports created successfully
Context
Users reported that the finding groups page felt extremely slow, especially when drilling into the /finding-groups/{check_id}/resources and /finding-groups/latest/{check_id}/resources endpoints. Profiling against a realistic dev dataset showed the request spending ~3.5 s in SQL for a response that ultimately returns ~10 rows.

Description
Profiling with django-silk + EXPLAIN ANALYZE traced the cost to two independent problems, both inside FindingGroupViewSet on the /resources action:

1. Redundant existence probe on every request. _paginated_resource_response ran has_mappings = self._build_resource_mapping_queryset(...).exists() before handing control to _mapping_paginated_response, which then re-ran essentially the same semi-join as part of its paginator COUNT. For the 99% of requests that hit groups with resources (non-IaC), that extra probe was pure waste. The exists() over partitioned ResourceFindingMapping (RFM) cost ~586 ms of real DB time per request on the test dataset.
2. Planner picked a Merge Semi Join we didn't want. _build_resource_mapping_queryset filtered RFM with finding_id__in=Subquery(findings_qs). PostgreSQL kept choosing a Merge Semi Join: it read the finding_id index of each RFM partition ordered by finding_id and filtered tenant_id as a post-step. On the monthly partition currently being appended to, that meant reading ~470k index entries to keep ~38k matching the tenant, only to surface the first row of the semi-join result. Classic case of the planner treating the post-filter as free when it wasn't.

This PR avoids both problems without schema changes:

- Drop the has_mappings.exists() pre-check. Instead, _paginated_resource_response now calls _mapping_paginated_response straight away and uses the paginator's total count to decide when to fall back to _combined_paginated_response for pure-orphan/IaC groups. One fewer query per request on the common path; behavior for IaC groups is unchanged.
- Materialize the finding IDs into a Python list (finding_id__in=[uuid, uuid, …]) rather than Subquery(...). With an inlined value list the planner switches to Nested Loop + Index Scan on the RFM finding_id index, keyed directly by the constant list. RFM aggregation queries now execute in under 1 ms of DB time.
- Add a request-scoped cache (_resolve_finding_ids) so the three helpers that build different RFM querysets from the same filtered findings set (_build_resource_mapping_queryset, _build_resource_ordering_queryset, _build_resource_aggregation) reuse the same materialized list instead of re-running the findings query 3–4× per request.

EXPLAIN ANALYZE (real captures)
Before: has_mappings.exists() pre-check (~586 ms real DB time)
Reading 470k index entries just to post-filter tenant_id and keep 38k, only to return a single row via LIMIT 1.

Before: main RFM aggregation (Merge Semi Join, ~21 ms real DB time but unstable under load)

After: same aggregation, Nested Loop + Index Scan driven by the materialized list (< 1 ms real DB time)
The semi-join is gone; the RFM access is a targeted Index Cond: finding_id = ANY (…) driven straight by the inlined UUID list, which is exactly what we want on partitioned RFM.

Notes on the per-request cache
The cache in _resolve_finding_ids lives on self (the FindingGroupViewSet instance). DRF creates a fresh ViewSet instance for every request (as_view() calls cls(**initkwargs) per dispatch), so the cache is effectively request-scoped. Reviewers have asked a few times whether this could cause the endpoint to serve stale data; the short answer is no:

- Each request starts with self._finding_ids_cache = None and re-materializes the finding IDs against the current DB state. There is no cross-request caching.
- Within a request, the cache makes the three helpers (_build_resource_mapping_queryset, _build_resource_ordering_queryset, _build_resource_aggregation) operate on the same snapshot of finding IDs. This is actually more consistent than the previous behavior, where each helper re-ran the findings subquery and could observe slightly different state under PostgreSQL's default read-committed isolation.
- New ResourceFindingMapping rows for already-cached finding IDs are still visible, because each RFM query runs live; only the IN (...) list is frozen.
- The cache is keyed by identity (is) on the filtered_queryset object, not by equality. If a caller ever passed a different queryset instance (a clone), the cache would miss and we would re-materialize, so there is no risk of serving data from a different filter set.

Comparison
Measured against the same tenant in the dev environment. Numbers are silk-reported totals (AWS RDS in eu-west-1 accessed from a local Docker container, so each round-trip inflates query timings; DB time itself is sub-millisecond after the fix).

/finding-groups/latest/s3_bucket_public_access/resources (group with mappings, common path)

/finding-groups/latest/aws-autoscaling-enable-at-rest-encryption/resources (IaC-like group with no mappings, orphan fallback path)

The orphan path is essentially on par with the baseline; the mapping path (which is the common case) sees the SQL plan collapse from 586 ms per RFM touch to sub-millisecond, and shaves ~1.3–2 s off the request against the remote RDS setup used for benchmarking. In production, where the API is co-located with the database, the remaining silk-reported latency (which is dominated by network round-trip time) will collapse as well.
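The two paths compared above (mapping path and orphan fallback) are chosen from the paginator's total count instead of a separate exists() probe. Below is a minimal, framework-free sketch of that control flow. It assumes the fallback triggers when the mapping paginator reports zero rows; Page, paginate and paginated_resource_response are hypothetical stand-ins, not the viewset's real methods.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Page:
    total: int        # total row count the paginator computes anyway
    items: List[str]  # rows for the requested page


def paginate(rows: Sequence[str], page_number: int, page_size: int) -> Page:
    """Toy paginator returning one page plus the total count."""
    start = (page_number - 1) * page_size
    return Page(total=len(rows), items=list(rows[start:start + page_size]))


def paginated_resource_response(
    mapped_rows: Sequence[str],
    orphan_rows: Sequence[str],
    page_number: int = 1,
    page_size: int = 10,
) -> Tuple[str, Page]:
    # Go straight to the mapping path; no separate existence probe first.
    page = paginate(mapped_rows, page_number, page_size)
    if page.total == 0:
        # Assumed condition: no mapped resources means an orphan-only group,
        # so fall back to the combined path.
        return "combined", paginate(orphan_rows, page_number, page_size)
    return "mapping", page


common_path = paginated_resource_response(["r1", "r2", "r3"], [], page_size=2)
orphan_path = paginated_resource_response([], ["finding-1"], page_size=2)
```

In this sketch the common path costs one paginator call, and only orphan-only groups pay for a second one, which mirrors the trade-off described in the comparison.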
Steps to review

1. Review FindingGroupViewSet in api/src/backend/api/v1/views.py:
   - The _resolve_finding_ids helper and its request-scoped cache.
   - _build_resource_mapping_queryset now uses the Python list instead of Subquery.
   - _paginated_resource_response drops the has_mappings.exists() branch and piggybacks on the paginator count to detect orphan-only groups.
2. Check that pagination (page[number], page[size]) and sort (sort=status,severity,delta,...) still work.
3. Check that filters (filter[resource_uid], filter[resource_type], etc.) keep returning the same resources.
4. Run EXPLAIN ANALYZE on ResourceFindingMapping queries to confirm the planner now uses Index Scan using …_finding_id_idx with Index Cond: (finding_id = ANY (...)) instead of the previous Merge Semi Join.

Checklist
Community Checklist
API
License
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.