
Fix addEmptyBuckets from creating too many buckets when given big extended bounds #17718



Open
wants to merge 9 commits into
base: main
from


harshavamsi
Contributor

@harshavamsi harshavamsi commented Mar 27, 2025

Description

Most of the description is in #17702. This PR adds checks that run before empty buckets are created.

Before creating empty buckets, we estimate how many would be created and account for them up front: the estimated memory is added to the CircuitBreaker, which may trip, and the bucket count is charged against the bucket limit, which may raise a max_buckets_exception.
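A minimal sketch of the estimate-then-check idea, assuming simplified stand-ins: EmptyBucketGuard is a hypothetical class (not an OpenSearch type) that plays the roles of the CircuitBreaker and the bucket limit. The 50-bytes-per-bucket figure is the estimate used in this PR's diff; the limits in main are illustrative. The point is that both checks run before any bucket is allocated.

```java
// Hypothetical sketch: EmptyBucketGuard stands in for the real CircuitBreaker and bucket limit.
// It is not an OpenSearch class; names and limits are illustrative only.
final class EmptyBucketGuard {
    // Same per-bucket memory estimate the PR's diff charges for each empty bucket.
    static final long BYTES_PER_EMPTY_BUCKET = 50L;

    private final long breakerLimitBytes;
    private final long maxBuckets;
    private long usedBytes;
    private long bucketCount;

    EmptyBucketGuard(long breakerLimitBytes, long maxBuckets) {
        this.breakerLimitBytes = breakerLimitBytes;
        this.maxBuckets = maxBuckets;
    }

    /** Charges an estimated number of empty buckets up front; throws before anything is allocated. */
    void reserveEmptyBuckets(long estimatedEmptyBuckets) {
        // 1) Memory check, analogous to breaker.addEstimateBytesAndMaybeBreak(...)
        long bytes = Math.multiplyExact(BYTES_PER_EMPTY_BUCKET, estimatedEmptyBuckets);
        if (bytes > breakerLimitBytes - usedBytes) {
            throw new IllegalStateException("circuit breaker tripped by empty histogram buckets: " + bytes + " bytes");
        }
        usedBytes += bytes;
        // 2) Bucket-count check, analogous to reduceContext.consumeBucketsAndMaybeBreak(...)
        bucketCount += estimatedEmptyBuckets;
        if (bucketCount > maxBuckets) {
            throw new IllegalStateException("too many buckets: " + bucketCount + " > " + maxBuckets);
        }
    }

    long usedBytes() { return usedBytes; }

    long bucketCount() { return bucketCount; }

    public static void main(String[] args) {
        EmptyBucketGuard guard = new EmptyBucketGuard(1L << 30, 65_535);
        guard.reserveEmptyBuckets(100); // small gap between keys: accepted
        try {
            guard.reserveEmptyBuckets(1_000_000_000L); // huge gap: rejected before any allocation
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing on the estimate is what turns an out-of-memory crash into an ordinary request error.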

Related Issues

Resolves #17702

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Contributor

❌ Gradle check result for 60c7b21: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@harshavamsi harshavamsi changed the title "Initial commit to address reduce empty buckets bug" to "Fix addEmptyBuckets from creating too many buckets when given big extended bounds" Mar 27, 2025
@github-actions github-actions bot added the bug Something isn't working label Mar 27, 2025
@harshavamsi harshavamsi marked this pull request as ready for review March 27, 2025 23:06
@bowenlan-amzn bowenlan-amzn self-assigned this May 7, 2025
@bowenlan-amzn bowenlan-amzn moved this from In Progress to In-Review in Performance Roadmap May 7, 2025
@bowenlan-amzn bowenlan-amzn moved this from In-Review to In Progress in Performance Roadmap May 7, 2025
@harshavamsi harshavamsi closed this May 7, 2025
@github-project-automation github-project-automation bot moved this from In Progress to Done in Performance Roadmap May 7, 2025
@harshavamsi harshavamsi reopened this May 7, 2025
@github-project-automation github-project-automation bot moved this from Done to In Progress in Performance Roadmap May 7, 2025
Comment on lines 413 to 425
int preEmptyBucketCount = list.size();
// we use counts here only to add those values to the CircuitBreaker, list's count has already been added in #reduce, so we only
// need to add emptyBucketCount
int emptyBucketCount = getTotalBucketCount() - list.size();
if (emptyBucketCount > 0) {
    CircuitBreaker breaker = reduceContext.getBreaker();
    if (breaker != null) {
        breaker.addEstimateBytesAndMaybeBreak(50L * emptyBucketCount, "empty histogram buckets");
    }
    preEmptyBucketCount += emptyBucketCount;
    reduceContext.consumeBucketsAndMaybeBreak(emptyBucketCount);
}
Member


Suggested change
- int preEmptyBucketCount = list.size();
- // we use counts here only to add those values to the CircuitBreaker, list's count has already been added in #reduce, so we only
- // need to add emptyBucketCount
- int emptyBucketCount = getTotalBucketCount() - list.size();
- if (emptyBucketCount > 0) {
-     CircuitBreaker breaker = reduceContext.getBreaker();
-     if (breaker != null) {
-         breaker.addEstimateBytesAndMaybeBreak(50L * emptyBucketCount, "empty histogram buckets");
-     }
-     preEmptyBucketCount += emptyBucketCount;
-     reduceContext.consumeBucketsAndMaybeBreak(emptyBucketCount);
- }
+ final int originalSize = list.size();
+ final int estimateEmptyBucketCount = estimateTotalBucketCount() - originalSize;
+ assert estimateEmptyBucketCount >= 0;
+ CircuitBreaker breaker = reduceContext.getBreaker();
+ if (breaker != null) {
+     // 50 bytes memory usage for each empty bucket
+     breaker.addEstimateBytesAndMaybeBreak(50L * estimateEmptyBucketCount, "empty histogram buckets");
+ }
+ reduceContext.consumeBucketsAndMaybeBreak(estimateEmptyBucketCount);

I think if emptyBucketCount < 0, estimateTotalBucketCount is wrong, so this should be an assertion rather than a silent skip.

Member


The breaker.addEstimateBytesAndMaybeBreak call can be folded into reduceContext, so you don't need the null check here.
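A minimal sketch of what folding the breaker into the context could look like, assuming a simplified design: SketchReduceContext and consumeEmptyBucketsAndMaybeBreak are hypothetical names, not the real ReduceContext API, and the breaker is modeled as a nullable LongConsumer that receives byte estimates.

```java
import java.util.function.LongConsumer;

// Hypothetical sketch of the reviewer's suggestion: the context owns the (possibly absent) breaker,
// so addEmptyBuckets makes a single call with no null check. Not the real ReduceContext API.
final class SketchReduceContext {
    private final LongConsumer breaker; // null when no breaker is configured
    private final long maxBuckets;
    private long buckets;

    SketchReduceContext(LongConsumer breaker, long maxBuckets) {
        this.breaker = breaker;
        this.maxBuckets = maxBuckets;
    }

    /** Charges estimated bytes (only if a breaker exists) and the bucket count in one place. */
    void consumeEmptyBucketsAndMaybeBreak(int count, long bytesPerBucket) {
        if (breaker != null) {
            // A real breaker would throw here when the estimate exceeds its limit.
            breaker.accept(bytesPerBucket * count);
        }
        buckets += count;
        if (buckets > maxBuckets) {
            throw new IllegalStateException("too many buckets: " + buckets);
        }
    }

    long buckets() { return buckets; }
}
```

With this shape, the call site in addEmptyBuckets would shrink to one line such as `reduceContext.consumeEmptyBucketsAndMaybeBreak(estimateEmptyBucketCount, 50L);`, and the null handling lives in exactly one place.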

Comment on lines 469 to 470
int postEmptyBucketCount = list.size() - preEmptyBucketCount;
reduceContext.consumeBucketsAndMaybeBreak(postEmptyBucketCount);
Member


Suggested change
- int postEmptyBucketCount = list.size() - preEmptyBucketCount;
- reduceContext.consumeBucketsAndMaybeBreak(postEmptyBucketCount);
+ int postAddEmptyBucketCount = list.size() - estimateEmptyBucketCount - originalSize;
+ reduceContext.consumeBucketsAndMaybeBreak(postAddEmptyBucketCount);

Comment on lines +392 to +404
int i = 0;
double key = min;
while (key < max && i++ < 10) {
    bucketCount++;
    key = nextKey(key);
}

if (bucketCount < 10) {
    return bucketCount;
}
Member


What is the goal of this?

@@ -382,7 +383,46 @@ private double round(double key) {
return Math.floor((key - emptyBucketInfo.offset) / emptyBucketInfo.interval) * emptyBucketInfo.interval + emptyBucketInfo.offset;
}

private void addEmptyBuckets(List<Bucket> list, ReduceContext reduceContext) {
private int getTotalBucketCount() {
Member


Suggested change
- private int getTotalBucketCount() {
+ private int estimateTotalBucketCount() {
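The thread shows only the capped loop from estimateTotalBucketCount, not its full body. For a fixed-interval histogram, one possible constant-time estimate reuses the round(double key) formula from the diff hunk: snap both bounds to their bucket starts and divide the distance by the interval. This is a hypothetical sketch, not the PR's implementation; min, max, interval and offset are assumed inputs, interval is assumed positive, and calendar intervals such as months (which vary in length) would still need a walk like the nextKey loop.

```java
// Hypothetical O(1) estimate for a fixed-interval histogram; not the PR's estimateTotalBucketCount.
final class FixedIntervalEstimate {
    /** Same rounding as round(double key) in the diff: snaps a key down to the start of its bucket. */
    static double round(double key, double interval, double offset) {
        return Math.floor((key - offset) / interval) * interval + offset;
    }

    /**
     * Number of buckets whose keys span [min, max] inclusive. Saturates at Integer.MAX_VALUE
     * (also for NaN or infinite spans) so a huge range trips the limits instead of overflowing.
     */
    static int estimateTotalBucketCount(double min, double max, double interval, double offset) {
        if (max < min) {
            return 0;
        }
        double span = (round(max, interval, offset) - round(min, interval, offset)) / interval;
        if (!(span < Integer.MAX_VALUE)) {
            return Integer.MAX_VALUE;
        }
        long count = Math.round(span) + 1; // span is a whole number up to floating-point error
        return (int) Math.min(count, Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        System.out.println(estimateTotalBucketCount(1, 1_000_000_000, 1, 0));
        System.out.println(estimateTotalBucketCount(0, 9.5, 2, 0));
        System.out.println(estimateTotalBucketCount(0, 1e300, 1, 0));
    }
}
```

Saturating rather than wrapping matters here: an int overflow could turn a huge range into a small or negative count and bypass both checks.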

Contributor

github-actions bot commented May 7, 2025

❌ Gradle check result for ed6093c: FAILURE


@bowenlan-amzn
Member

bowenlan-amzn commented May 7, 2025

@harshavamsi This is a simple case that still causes java.lang.OutOfMemoryError: Java heap space:

curl -X PUT "http://localhost:9200/test-index" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "value": { "type": "double" },
      "timestamp": { "type": "date" }
    }
  }
}'
curl -X POST "http://localhost:9200/test-index/_doc/1" -H 'Content-Type: application/json' -d'
{
  "value": 1,
  "timestamp": "2000-01-01T00:00:00Z"
}'
curl -X POST "http://localhost:9200/test-index/_doc/2" -H 'Content-Type: application/json' -d'
{
  "value": 1000000000,
  "timestamp": "2025-01-01T00:00:00Z"
}'

curl -X POST "http://localhost:9200/test-index/_search" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "value_hist": {
      "histogram": {
        "field": "value",
        "interval": 1
      }
    }
  }
}'

curl -X POST "http://localhost:9200/test-index/_search" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "timestamp_hist": {
      "date_histogram": {
        "field": "timestamp",
        "calendar_interval": "1m"
      }
    }
  }
}'
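A back-of-the-envelope count of the buckets these two queries imply, assuming UTC, one bucket per integer key for the histogram, one bucket per minute for the date_histogram, and both end buckets included. ReproBucketMath is a hypothetical helper; the 50-byte figure is the per-bucket estimate from the diff, not a measured size.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical arithmetic for the reproduction above (UTC, both end buckets inclusive).
final class ReproBucketMath {
    static final long BYTES_PER_EMPTY_BUCKET = 50L; // per-bucket estimate used in the diff

    /** histogram on "value" with interval 1: one bucket per integer key from 1 to 1_000_000_000. */
    static long valueHistogramBuckets() {
        return 1_000_000_000L - 1L + 1L;
    }

    /** date_histogram on "timestamp" with calendar_interval 1m: one bucket per minute. */
    static long minuteBuckets() {
        Instant first = Instant.parse("2000-01-01T00:00:00Z");
        Instant last = Instant.parse("2025-01-01T00:00:00Z");
        return ChronoUnit.MINUTES.between(first, last) + 1;
    }

    public static void main(String[] args) {
        System.out.println(valueHistogramBuckets());                          // 1000000000
        System.out.println(minuteBuckets());                                  // 13150081
        System.out.println(valueHistogramBuckets() * BYTES_PER_EMPTY_BUCKET); // 50000000000
    }
}
```

Only two documents exist, so nearly all of these buckets would be empty ones created by addEmptyBuckets. Even at the diff's 50-byte estimate, the value histogram alone comes to about 5×10^10 bytes.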

@harshavamsi
Contributor Author


Thanks for bringing this up, I will include this in the fix!

Signed-off-by: Harsha Vamsi Kalluri <[email protected]>
Contributor

❌ Gradle check result for 67e4508: FAILURE


@github-project-automation github-project-automation bot moved this from In Progress to Done in Performance Roadmap May 28, 2025
@harshavamsi harshavamsi reopened this May 28, 2025
@github-project-automation github-project-automation bot moved this from Done to In Progress in Performance Roadmap May 28, 2025
Contributor

❌ Gradle check result for 67e4508: FAILURE


@opensearch-trigger-bot
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot opensearch-trigger-bot bot added the stalled Issues that have stalled label Jun 29, 2025
Labels
backport 1.3 (Backport to 1.3 branch), backport 2.x (Backport to 2.x branch), backport 3.0, bug (Something isn't working), stalled (Issues that have stalled), v3.1.0
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

[BUG] Histogram aggregations can produce billions of empty buckets consuming lots of memory causing OOM issues
5 participants