Open
Labels
:Analytics/Aggregations, :Search/Search, >bug, Team:Analytics, Team:Search
Description
Elasticsearch Version
9.1.0
Installed Plugins
No response
Java Version
bundled
OS Version
any
Problem Description
When a query is combined with a filters aggregation, we may use a competitive iterator. With the Lucene 10.2 upgrade, this is now broken and produces unpredictable counts. The exact cause is still unknown.
Steps to Reproduce
This has been replicated in a unit test, which will be committed shortly.
Generally, the idea is that you have terms you are filtering on, and you run a filters agg while also running a term query.
Logs (if relevant)
No response
Activity
elasticsearchmachine commented on Apr 16, 2025
Pinging @elastic/es-analytical-engine (Team:Analytics)
elasticsearchmachine commented on Apr 16, 2025
Pinging @elastic/es-search (Team:Search)
benwtrent commented on Apr 16, 2025
I THINK it has to do with the `intoBitSet` stuff. If it does regular iteration, everything works well. Once it does `intoBitSet`, the numbers get all "weird" and the iterators skip huge sections of validly matching docs.
benwtrent commented on Apr 16, 2025
OK, I think I see the issue, but I am not sure how to fix it.
`DenseConjunctionBulkScorer#scoreWindowUsingBitSet` utilizes the competitive iterator we return. It drains the iterator into the bit set, doing an `and`, and then calls `collector.collect(new BitSetDocIdStream(this.windowMatches, windowBase));`
So, the docId stream we then get in the leaf collector is the set of docIds where the iterators overlap. However, we need to know which doc actually matches which doc (e.g. topList). So, I don't know how to handle that.
It seems we need to handle this bitset iteration correctly (or bypass it somehow).
Class in question: https://github.com/apache/lucene/blob/branch_10_2/lucene/core/src/java/org/apache/lucene/search/DenseConjunctionBulkScorer.java
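The effect described above can be reproduced in a toy model. The sketch below does not use Lucene at all: `ArrayIterator`, `collectPerDoc`, `collectWindowed`, and `drain` are hypothetical names made up for illustration, and the window handling is a deliberate simplification, not a copy of what `DenseConjunctionBulkScorer` does. It only shows why a collector that peeks at an iterator's position sees something different once the iterators are drained into a bit set before collection.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Toy model only: none of these types are Lucene classes. */
class DrainDemo {

    /** Hypothetical stand-in for a doc ID iterator over a sorted array. */
    static final class ArrayIterator {
        static final int NO_MORE_DOCS = Integer.MAX_VALUE;
        private final int[] docs;
        private int idx = -1;

        ArrayIterator(int... docs) { this.docs = docs; }

        int docID() {
            if (idx < 0) return -1;
            return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
        }

        int nextDoc() { idx++; return docID(); }

        /** Moves to the first doc >= target (no-op if already there). */
        int advance(int target) {
            int doc = docID();
            while (doc < target) doc = nextDoc();
            return doc;
        }
    }

    /** Doc-at-a-time: the collector runs while the filter sits on the collected doc. */
    static List<String> collectPerDoc(ArrayIterator query, ArrayIterator filter) {
        List<String> seen = new ArrayList<>();
        for (int doc = query.nextDoc(); doc != ArrayIterator.NO_MORE_DOCS; doc = query.nextDoc()) {
            if (filter.advance(doc) == doc) {
                // Records "collected doc @ filter position at collect time".
                seen.add(doc + "@" + filter.docID());
            }
        }
        return seen;
    }

    /** Window-at-a-time: both iterators are drained first, then the AND is collected. */
    static List<String> collectWindowed(ArrayIterator query, ArrayIterator filter,
                                        int windowBase, int windowSize) {
        int windowMax = windowBase + windowSize;
        BitSet matches = drain(query, windowBase, windowMax);
        matches.and(drain(filter, windowBase, windowMax));
        List<String> seen = new ArrayList<>();
        for (int bit = matches.nextSetBit(0); bit >= 0; bit = matches.nextSetBit(bit + 1)) {
            // The filter has already moved past the window by the time we collect.
            seen.add((windowBase + bit) + "@" + filter.docID());
        }
        return seen;
    }

    /** Copies every doc in [windowBase, windowMax) into a window-relative bit set. */
    static BitSet drain(ArrayIterator it, int windowBase, int windowMax) {
        BitSet bits = new BitSet(windowMax - windowBase);
        for (int doc = it.advance(windowBase); doc < windowMax; doc = it.nextDoc()) {
            bits.set(doc - windowBase);
        }
        return bits;
    }

    public static void main(String[] args) {
        System.out.println(collectPerDoc(
            new ArrayIterator(1, 3, 5, 7), new ArrayIterator(3, 4, 7, 9)));      // [3@3, 7@7]
        System.out.println(collectWindowed(
            new ArrayIterator(1, 3, 5, 7), new ArrayIterator(3, 4, 7, 9), 0, 8)); // [3@9, 7@9]
    }
}
```

In both modes the same docs (3 and 7) are collected, but in the windowed mode the filter iterator already sits on doc 9 for every collected doc, so any logic that infers a match from the iterator's current position no longer works.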
ChrisHegarty commented on Apr 17, 2025
With the way that `DenseConjunctionBulkScorer` now eagerly drains chunks of iterators into a bit set, we can no longer observe the advancement of the iterators as we collect. The observable side effect of the iterators' advancement was never really guaranteed, but it clearly worked prior to Lucene 10.2. I don't see any obvious way to restructure things that would allow us to use a competitive iterator here.
benwtrent commented on Apr 17, 2025
related: apache/lucene#14517
jdcryans commented on Apr 17, 2025
FYSA we also have #126939 tracking the stack trace that you shared in that lucene issue, @benwtrent