
[BUG] Sometimes aggregations are empty with terminate_after #13288

Closed
@Rmaan

Description


Describe the bug

We found unexpected behavior in our search faceting after moving from Elasticsearch to OpenSearch 2.11. When terminate_after is passed, the returned buckets are sometimes completely empty (even though every processed document should fall into a bucket), and sometimes their total document count is far lower than terminate_after * primary_shard_count, even though the search did terminate early and every document has a value for the aggregated field.

We could not reproduce this issue with OpenSearch 2.9, but 2.10 was affected.

Related component

Search:Aggregations

To Reproduce

An exact reproduction is hard. It seems the index needs to have a few segments for the problem to appear, and all reported cases involve aggregating on a keyword field while filtering on an integer field. When we aggregate on that same integer field instead, the issue does not occur.
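A minimal sketch of a possible reproduction setup, assuming the field layout implied by the sample request below (a keyword field materials.facet.en.lvl0 to aggregate on and an integer field materials.ids to filter on) and 3 primary shards, matching _shards.total in the sample response. The index name terminate_after_repro, the batch sizes, and the helper name bulk_batches are hypothetical. The idea is to send each batch as a separate POST /_bulk?refresh=true request so that each batch is likely to land in its own segment (background merging can still combine them). The sketch only builds the request bodies; sending them is left to whichever client you use.

```python
import json

# Hypothetical index name used only for this reproduction sketch.
INDEX = "terminate_after_repro"

# Mapping assumed from the field names in the sample request:
# a keyword field to aggregate on and an integer field to filter on.
MAPPING = {
    "settings": {"number_of_shards": 3},
    "mappings": {
        "properties": {
            "materials": {
                "properties": {
                    "ids": {"type": "integer"},
                    "facet": {
                        "properties": {
                            "en": {
                                "properties": {
                                    "lvl0": {"type": "keyword"}
                                }
                            }
                        }
                    },
                }
            }
        }
    },
}


def bulk_batches(num_batches: int, docs_per_batch: int) -> list[str]:
    """Build one NDJSON _bulk body per batch.

    Sending each body as a separate POST /_bulk?refresh=true request makes it
    likely that every batch ends up in its own segment.
    """
    batches = []
    doc_id = 0
    for _ in range(num_batches):
        lines = []
        for _ in range(docs_per_batch):
            doc_id += 1
            # Action line followed by the document source line.
            lines.append(json.dumps({"index": {"_index": INDEX, "_id": str(doc_id)}}))
            lines.append(json.dumps({
                # Every document matches the term filter and has a facet value.
                "materials": {"ids": 1, "facet": {"en": {"lvl0": "Cashmere#1"}}}
            }))
        # The _bulk API expects the body to end with a newline.
        batches.append("\n".join(lines) + "\n")
    return batches
```

With several batches indexed this way, running the sample request against the index and checking whether buckets is populated is one way to try to trigger the behavior described here; it is not guaranteed to reproduce it.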

Sample request:

{
  "track_total_hits": true,
  "_source": ["materials.facet.en.lvl0"],
  "aggregations": {
    "materials_facet": {
      "terms": {
        "field": "materials.facet.en.lvl0"
      }
    }
  },
  "query": {
    "term": {
      "materials.ids": 1
    }
  },
  "size": 100,
  "terminate_after": 1
}
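To narrow the problem down, it can help to send the same request with and without terminate_after and compare the aggregation buckets with the hits. A minimal sketch of a hypothetical helper (build_request is not part of any client library) that produces both bodies from the parameters of the sample request above:

```python
import copy
from typing import Optional

# The sample request from this issue, without terminate_after.
BASE_REQUEST = {
    "track_total_hits": True,
    "_source": ["materials.facet.en.lvl0"],
    "aggregations": {
        "materials_facet": {"terms": {"field": "materials.facet.en.lvl0"}}
    },
    "query": {"term": {"materials.ids": 1}},
    "size": 100,
}


def build_request(terminate_after: Optional[int]) -> dict:
    """Return a deep copy of the sample request, optionally with terminate_after set."""
    body = copy.deepcopy(BASE_REQUEST)
    if terminate_after is not None:
        body["terminate_after"] = terminate_after
    return body


# Same body as the sample request above.
affected = build_request(1)
# Control request; per this report the buckets are populated without terminate_after.
control = build_request(None)
```

Note that the control request scans every matching document, which is exactly the cost terminate_after is meant to avoid on large indices.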

Sample response:

{
  "took": 11,
  "timed_out": false,
  "terminated_early": true,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "products_01",
        "_id": "63033976",
        "_score": 1,
        "_source": {
          "materials": {
            "facet": {
              "en": {
                "lvl0": "Cashmere#1"
              }
            }
          }
        }
      },
      {
        "_index": "products_01",
        "_id": "31224269",
        "_score": 1,
        "_source": {
          "materials": {
            "facet": {
              "en": {
                "lvl0": "Cashmere#1"
              }
            }
          }
        }
      },
      {
        "_index": "products_01",
        "_id": "63080864",
        "_score": 1,
        "_source": {
          "materials": {
            "facet": {
              "en": {
                "lvl0": "Cashmere#1"
              }
            }
          }
        }
      }
    ]
  },
  "aggregations": {
    "materials_facet": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": []
    }
  }
}

Expected behavior

We should see a bucket in aggregations.materials_facet.buckets.

With terminate_after set to 1, each shard stops after collecting one document; with 3 shards, 3 documents are collected in total. This is confirmed by hits.total.value and by the hits array. However, the aggregation does not match the documents in hits: all three have materials.facet.en.lvl0 set to "Cashmere#1", yet buckets is empty.
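The mismatch can be checked mechanically. A minimal sketch, assuming (as with this request) that size is large enough for every collected document to appear in the hits array, that each document has exactly one value for the aggregated field, and that there are few enough distinct values for the terms aggregation to return all of them (it returns 10 buckets by default). The function name facet_mismatch is hypothetical. It counts the facet values in the returned hits and compares them with the terms buckets:

```python
from collections import Counter


def facet_mismatch(response: dict, agg_name: str, source_path: list[str]) -> dict:
    """Return {value: (count_in_hits, count_in_buckets)} for every value whose counts differ.

    Assumes all collected documents are present in hits.hits and each one has
    exactly one value at source_path in its _source.
    """
    # Count facet values as seen in the returned documents.
    from_hits = Counter()
    for hit in response["hits"]["hits"]:
        value = hit["_source"]
        for key in source_path:
            value = value[key]
        from_hits[value] += 1

    # Count facet values as reported by the terms aggregation.
    from_buckets = {
        b["key"]: b["doc_count"]
        for b in response["aggregations"][agg_name]["buckets"]
    }

    keys = set(from_hits) | set(from_buckets)
    return {
        k: (from_hits.get(k, 0), from_buckets.get(k, 0))
        for k in keys
        if from_hits.get(k, 0) != from_buckets.get(k, 0)
    }


# The sample response from this issue, trimmed to the fields the check reads.
sample = {
    "hits": {"hits": [
        {"_source": {"materials": {"facet": {"en": {"lvl0": "Cashmere#1"}}}}}
    ] * 3},
    "aggregations": {"materials_facet": {"buckets": []}},
}
print(facet_mismatch(sample, "materials_facet", ["materials", "facet", "en", "lvl0"]))
# {'Cashmere#1': (3, 0)}
```

An empty result means the buckets agree with the hits; for the sample response, three documents carry "Cashmere#1" but no bucket reports them.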

The issue goes away if we remove terminate_after, but that hurts performance because we have a large number of documents. Terminating after 100K documents is enough for us.

Additional Details

Host/Environment:

  • Version 2.11
  • AWS managed OpenSearch

Metadata


Type

No type

Projects

Status

✅ Done

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
