Using competitive iterators in Filters agg is broken #126955

Description

@benwtrent
Member

Elasticsearch Version

9.1.0

Installed Plugins

No response

Java Version

bundled

OS Version

any

Problem Description

When a query is combined with a filters aggregation, we may use a competitive iterator. Since the Lucene 10.2 upgrade, this is broken: the aggregation returns unpredictable doc counts. The exact cause is still unknown.

Steps to Reproduce

This has been reproduced in a unit test, which will be committed shortly.

Generally, the setup is a filters aggregation over a set of terms, run together with a term query.

Logs (if relevant)

No response

Activity

elasticsearchmachine commented on Apr 16, 2025

@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

elasticsearchmachine commented on Apr 16, 2025

@elasticsearchmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

benwtrent commented on Apr 16, 2025

@benwtrent
Member, Author

I THINK it has to do with the "intoBitSet" path. With regular iteration, everything works well. Once it goes through intoBitSet, the numbers get "weird" and the iterators skip huge sections of validly matching docs.
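
To make the difference between the two paths concrete, here is a minimal sketch. `ToyIterator` and `DrainDemo` are hypothetical stand-ins written for this illustration, not Lucene classes; the `intoBitSet` method only imitates the general contract of loading every doc below `upTo` into a bit set. Under those assumptions, it shows that a single bulk call moves the iterator past the whole window, whereas doc-by-doc iteration moves it one match at a time.

```java
import java.util.BitSet;

/** Toy stand-in for a doc ID iterator (hypothetical, NOT Lucene's DocIdSetIterator). */
final class ToyIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    private final int[] docs; // sorted, distinct doc IDs
    private int idx = -1;     // -1 means "not positioned yet"

    ToyIterator(int... docs) { this.docs = docs; }

    int docID() {
        if (idx < 0) return -1;
        return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
    }

    int nextDoc() {
        idx++;
        return docID();
    }

    int advance(int target) {
        do { idx++; } while (idx < docs.length && docs[idx] < target);
        return docID();
    }

    /** Bulk path: record every doc in [docID(), upTo) and stop on the first doc >= upTo. */
    void intoBitSet(int upTo, BitSet bits, int offset) {
        for (int doc = docID(); doc < upTo; doc = nextDoc()) {
            bits.set(doc - offset);
        }
    }
}

final class DrainDemo {
    /** Position of the iterator after draining the window [0, 100) in one call. */
    static int positionAfterDrain() {
        ToyIterator it = new ToyIterator(3, 40, 77, 150);
        it.advance(0); // position on the first doc of the window
        BitSet window = new BitSet(100);
        it.intoBitSet(100, window, 0);
        return it.docID();
    }

    /** Position of the iterator after a single regular nextDoc() step. */
    static int positionAfterOneStep() {
        ToyIterator it = new ToyIterator(3, 40, 77, 150);
        return it.nextDoc();
    }
}
```

After the bulk call the toy iterator already sits on doc 150, before any of docs 3, 40, or 77 has been handed to a collector. Code that reads the iterator's position while collecting would see that as the iterator having jumped over matching docs.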

benwtrent commented on Apr 16, 2025

@benwtrent
Member, Author

OK, I think I see the issue, but I am not sure how to fix it.

DenseConjunctionBulkScorer#scoreWindowUsingBitSet uses the competitive iterator we return. It drains the iterator into the bit set:

    for(upTo = 1; upTo < iterators.size() && this.windowMatches.cardinality() >= threshold; ++upTo) {
      DocIdSetIterator other = (DocIdSetIterator)iterators.get(upTo);
      if (other.docID() < windowBase) {
        other.advance(windowBase);
      }

      other.intoBitSet(windowMax, this.clauseWindowMatches, windowBase);
      this.windowMatches.and(this.clauseWindowMatches);
      this.clauseWindowMatches.clear();
    }

This ANDs each clause's matches into windowMatches; the scorer then calls:

    collector.collect(new BitSetDocIdStream(this.windowMatches, windowBase));

So, the docId stream we then get in the leaf collector is the set of docIds where the iterators overlap. However, we need to know which filter each doc actually matches (e.g. via topList), and I don't know how to handle that.
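
The loss of information can be sketched with plain `java.util.BitSet`. Everything here is hypothetical illustration: `IntersectionDemo` and its methods are not Elasticsearch or Lucene code, and the sketch assumes the competitive iterator behaves like the union of the filter iterators. It shows that the intersected stream carries only doc IDs, so per-filter counts have to be recovered some other way, for example by re-testing each collected doc against every filter. That is one conceivable workaround, not a fix proposed in this thread.

```java
import java.util.BitSet;

final class IntersectionDemo {
    /** Helper: build a bit set from doc IDs. */
    static BitSet bits(int... docs) {
        BitSet b = new BitSet();
        for (int d : docs) b.set(d);
        return b;
    }

    /** What the collector is handed: query AND (filterA OR filterB). */
    static BitSet collectedDocs(BitSet query, BitSet filterA, BitSet filterB) {
        BitSet competitive = (BitSet) filterA.clone();
        competitive.or(filterB);          // assumed: union of the filters
        BitSet window = (BitSet) query.clone();
        window.and(competitive);          // conjunction, as in the loop above
        return window;
    }

    /** Recover per-filter counts by re-testing each collected doc against each filter. */
    static int[] countsByRechecking(BitSet collected, BitSet... filters) {
        int[] counts = new int[filters.length];
        for (int doc = collected.nextSetBit(0); doc >= 0; doc = collected.nextSetBit(doc + 1)) {
            for (int f = 0; f < filters.length; f++) {
                if (filters[f].get(doc)) counts[f]++;
            }
        }
        return counts;
    }
}
```

With a query matching docs {1, 2, 3, 5, 8}, filter A matching {1, 2, 9}, and filter B matching {2, 5, 7}, the collector receives {1, 2, 5}. Nothing in that stream says that doc 1 belongs to A only, doc 5 to B only, and doc 2 to both.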

It seems we need to handle this bitset iteration correctly (or bypass it somehow).

Class in question: https://github.com/apache/lucene/blob/branch_10_2/lucene/core/src/java/org/apache/lucene/search/DenseConjunctionBulkScorer.java

ChrisHegarty commented on Apr 17, 2025

@ChrisHegarty
Contributor

With the way that DenseConjunctionBulkScorer now eagerly drains chunks of iterators into a bitSet, we can no longer observe the advancement of the iterators as we collect. The observable side-effect of the advancement of the iterators is not really something that was ever guaranteed, but clearly worked prior to Lucene 10.2. I don't see any obvious or clear way to restructure things that will allow us to use a competitive iterator here.
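
The dependency on that side effect can be illustrated with a small sketch. `SideEffectDemo`, `Cursor`, and the counting methods are hypothetical names invented for this example; they only model a collector that reads a filter iterator's position at collect time, which is an assumption about the general shape of the problem rather than the actual aggregator code.

```java
import java.util.BitSet;

final class SideEffectDemo {
    /** Minimal cursor over sorted doc IDs (toy stand-in for an iterator). */
    static final class Cursor {
        private final int[] docs;
        private int idx = 0;
        Cursor(int... docs) { this.docs = docs; }
        int docID() { return idx < docs.length ? docs[idx] : Integer.MAX_VALUE; }
        void next() { idx++; }
        void advance(int target) { while (docID() < target) idx++; }
    }

    /** A collector step that trusts the filter cursor's current position. */
    static int countIfPositioned(Cursor filter, int doc) {
        return filter.docID() == doc ? 1 : 0;
    }

    /** Doc-by-doc scoring: the filter cursor is advanced in step with collection. */
    static int lockstepCount(int[] queryDocs, int[] filterDocs) {
        Cursor filter = new Cursor(filterDocs);
        int count = 0;
        for (int doc : queryDocs) {
            filter.advance(doc);
            count += countIfPositioned(filter, doc);
        }
        return count;
    }

    /** Windowed scoring: the filter cursor is drained first, then docs are collected. */
    static int eagerCount(int[] queryDocs, int[] filterDocs, int windowMax) {
        Cursor filter = new Cursor(filterDocs);
        BitSet matches = new BitSet(windowMax);
        for (int d : queryDocs) {
            if (d < windowMax) matches.set(d);
        }
        BitSet filterBits = new BitSet(windowMax);
        while (filter.docID() < windowMax) { // eager drain of the whole window
            filterBits.set(filter.docID());
            filter.next();
        }
        matches.and(filterBits);
        int count = 0;
        for (int doc = matches.nextSetBit(0); doc >= 0; doc = matches.nextSetBit(doc + 1)) {
            count += countIfPositioned(filter, doc); // cursor is already past the window
        }
        return count;
    }
}
```

For query docs {1, 2, 5, 8} and filter docs {2, 5, 9}, the lockstep variant counts 2 matches, while the eager variant counts 0 in a window of 16 docs, because the cursor has moved past every collected doc by the time the collector looks at it.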

benwtrent commented on Apr 17, 2025

@benwtrent
Member, Author

jdcryans commented on Apr 17, 2025

@jdcryans

FYSA we also have #126939 tracking the stack trace that you shared in that Lucene issue, @benwtrent

7 remaining items


Metadata

Assignees

No one assigned

    Labels

    :Analytics/Aggregations - Aggregations
    :Search/Search - Search-related issues that do not fall into other categories
    >bug
    Team:Analytics - Meta label for analytical engine team (ESQL/Aggs/Geo)
    Team:Search - Meta label for search team


        Participants

        @jdcryans, @jpountz, @benwtrent, @elasticsearchmachine, @ChrisHegarty

          Using competitive iterators in Filters agg is broken · Issue #126955 · elastic/elasticsearch