Description
Describe the bug
After moving from Elasticsearch to OpenSearch 2.11, we found bugs in our search faceting. When terminate_after is passed, the returned buckets are sometimes completely empty (even though every processed document should fall into a bucket), and sometimes the aggregated document count is far less than terminate_after * primary_shard_count, even though the search is terminated early and every item has a value for the aggregated field.
We could not reproduce this issue with OpenSearch 2.9, but 2.10 is affected.
Related component
Search:Aggregations
To Reproduce
An exact reproduction is hard. It seems the index needs to have a few segments for the problem to appear, and all reported cases involve aggregating on a keyword field while filtering on an integer field. When we aggregate on the same integer field that we filter on, the issue does not happen.
Sample request:
{
  "track_total_hits": true,
  "_source": ["materials.facet.en.lvl0"],
  "aggregations": {
    "materials_facet": {
      "terms": {
        "field": "materials.facet.en.lvl0"
      }
    }
  },
  "query": {
    "term": {
      "materials.ids": 1
    }
  },
  "size": 100,
  "terminate_after": 1
}
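To compare behavior with and without terminate_after, the request above can be generated from one helper so that the two bodies differ only in that parameter. This is a minimal sketch: the field names come from the sample request, while the function name `build_search_body` is hypothetical and no client call is shown.

```python
import json
from typing import Any, Dict, Optional


def build_search_body(terminate_after: Optional[int] = None) -> Dict[str, Any]:
    """Build the sample search body; omit terminate_after when it is None."""
    body: Dict[str, Any] = {
        "track_total_hits": True,
        "_source": ["materials.facet.en.lvl0"],
        "aggregations": {
            "materials_facet": {
                "terms": {"field": "materials.facet.en.lvl0"}
            }
        },
        "query": {"term": {"materials.ids": 1}},
        "size": 100,
    }
    if terminate_after is not None:
        # Per-shard limit on collected documents, as in the failing request.
        body["terminate_after"] = terminate_after
    return body


if __name__ == "__main__":
    # Print both variants so they can be sent to the same index and compared.
    print(json.dumps(build_search_body(1), indent=2))
    print(json.dumps(build_search_body(), indent=2))
```

Sending both bodies to the same index and comparing the aggregations section should show whether the empty buckets appear only when terminate_after is set.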
Sample response:
{
  "took": 11,
  "timed_out": false,
  "terminated_early": true,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "products_01",
        "_id": "63033976",
        "_score": 1,
        "_source": {
          "materials": {
            "facet": {
              "en": {
                "lvl0": "Cashmere#1"
              }
            }
          }
        }
      },
      {
        "_index": "products_01",
        "_id": "31224269",
        "_score": 1,
        "_source": {
          "materials": {
            "facet": {
              "en": {
                "lvl0": "Cashmere#1"
              }
            }
          }
        }
      },
      {
        "_index": "products_01",
        "_id": "63080864",
        "_score": 1,
        "_source": {
          "materials": {
            "facet": {
              "en": {
                "lvl0": "Cashmere#1"
              }
            }
          }
        }
      }
    ]
  },
  "aggregations": {
    "materials_facet": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": []
    }
  }
}
Expected behavior
We should see a bucket in aggregations.materials_facet.buckets.
The request sets terminate_after=1, which means each shard collects up to 1 document before terminating early. We have 3 shards, so 3 documents should be processed in total. This can be verified in hits.total.value and in the hits array. However, the aggregations do not match the documents returned in hits.
The issue goes away if we remove terminate_after, but that hurts performance because we have a large number of documents. Terminating after 100K items is enough for us.
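The mismatch described above can be checked mechanically on a response. The following is a minimal sketch, not part of OpenSearch: the helper names `field_values` and `missing_bucket_keys` are hypothetical, and the check is only meaningful when the number of distinct values in the hits does not exceed the terms aggregation size (10 by default), since otherwise some buckets are legitimately cut off.

```python
from typing import Any, Dict, List, Set


def field_values(source: Dict[str, Any], dotted_path: str) -> List[Any]:
    """Follow a dotted path such as 'materials.facet.en.lvl0' into a _source dict."""
    node: Any = source
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return []
        node = node[key]
    # A field may hold a single value or a list of values.
    return node if isinstance(node, list) else [node]


def missing_bucket_keys(
    response: Dict[str, Any], agg_name: str, dotted_path: str
) -> Set[Any]:
    """Return field values seen in the hits that have no terms bucket."""
    seen: Set[Any] = set()
    for hit in response["hits"]["hits"]:
        seen.update(field_values(hit.get("_source", {}), dotted_path))
    bucket_keys = {b["key"] for b in response["aggregations"][agg_name]["buckets"]}
    return seen - bucket_keys


if __name__ == "__main__":
    # Trimmed version of the sample response from this report.
    sample = {
        "hits": {"hits": [
            {"_source": {"materials": {"facet": {"en": {"lvl0": "Cashmere#1"}}}}}
        ]},
        "aggregations": {"materials_facet": {"buckets": []}},
    }
    print(missing_bucket_keys(sample, "materials_facet", "materials.facet.en.lvl0"))
```

A non-empty result means a document was returned in hits but was not counted in any bucket, which is the behavior reported here.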
Additional Details
Host/Environment:
- Version 2.11
- AWS managed OpenSearch