Star Tree Search changes related to new Aggregations supported #9163
Changes from 18 commits
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -37,7 +37,8 @@ PUT logs | |
"settings": { | ||
"index.number_of_shards": 1, | ||
"index.number_of_replicas": 0, | ||
"index.composite_index": true | ||
"index.composite_index": true, | ||
"index.append_only.enabled": true | ||
}, | ||
"mappings": { | ||
"composite": { | ||
|
@@ -54,6 +55,16 @@ PUT logs | |
}, | ||
{ | ||
"name": "port" | ||
}, | ||
{ | ||
"name": "method" | ||
}, | ||
{ | ||
"name": "@timestamp", | ||
"calendar_intervals": [ | ||
"month", | ||
"day" | ||
] | ||
} | ||
], | ||
"metrics": [ | ||
|
@@ -80,6 +91,10 @@ PUT logs | |
} | ||
}, | ||
"properties": { | ||
"@timestamp": { | ||
"format": "strict_date_optional_time||epoch_second", | ||
"type": "date" | ||
}, | ||
"status": { | ||
"type": "integer" | ||
}, | ||
|
@@ -89,6 +104,9 @@ PUT logs | |
"request_size": { | ||
"type": "integer" | ||
}, | ||
"method" : { | ||
"type": "keyword" | ||
}, | ||
"latency": { | ||
"type": "scaled_float", | ||
"scaling_factor": 10 | ||
|
@@ -118,17 +136,33 @@ When using the `ordered_dimesions` parameter, follow these best practices: | |
|
||
- The order of dimensions matters. You can define the dimensions ordered from the highest cardinality to the lowest cardinality for efficient storage and query pruning. | ||
- Avoid using high-cardinality fields as dimensions. High-cardinality fields adversely affect storage space, indexing throughput, and query performance. | ||
- Currently, fields supported by the `ordered_dimensions` parameter are all [numeric field types]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/numeric/), with the exception of `unsigned_long`. For more information, see [GitHub issue #15231](https://github.com/opensearch-project/OpenSearch/issues/15231). | ||
- Support for other field types, such as `keyword` and `ip`, will be added in future versions. For more information, see [GitHub issue #16232](https://github.com/opensearch-project/OpenSearch/issues/16232). | ||
- A minimum of `2` and a maximum of `10` dimensions are supported per star-tree index. | ||
|
||
The `ordered_dimensions` parameter supports the following field types: | ||
|
||
- All numeric field types excluding `unsigned_long` and `scaled_float`. | ||
- `keyword` | ||
- `object` | ||
- `date`, which can use up to three of the following calendar intervals: | ||
|
||
- `year` (of era) | ||
Review comment: What do we mean by "of blank"?
Review comment: The fields names are what defined inside " |
||
- `quarter` (of year) | ||
- `month` (of year) | ||
- `week` (of week based year) | ||
|
||
- `day` (of month) | ||
- `hour` (of day) | ||
- `half-hour` (of day) | ||
- `quarter-hour` (of day) | ||
- `minute` (of hour) | ||
- `second` (of minute) | ||
|
||
Support for other field types, such as `ip`, will be added in future versions. For more information, see [GitHub issue #13875](https://github.com/opensearch-project/OpenSearch/issues/13875). | ||
|
||
The `ordered_dimensions` parameter supports the following property. | ||
|
||
| Parameter | Required/Optional | Description | | ||
| :--- | :--- | :--- | | ||
| `name` | Required | The name of the field. The field name should be present in the `properties` section as part of the index `mapping`. Ensure that the `doc_values` setting is `enabled` for any associated fields. | | ||
|
||
|
||
### Metrics | ||
|
||
Configure any metric fields on which you need to perform aggregations. `Metrics` are required as part of a star-tree index configuration. | ||
|
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -26,7 +26,7 @@ A star-tree index can be used to perform faster aggregations. Consider the follo | |||||
|
||||||
Star-tree indexes have the following limitations: | ||||||
|
||||||
- A star-tree index should only be enabled on indexes whose data is not updated or deleted because updates and deletions are not accounted for in a star-tree index. | ||||||
- A star-tree index should only be enabled on indexes whose data is not updated or deleted because updates and deletions are not accounted for in a star-tree index. To enforce this policy and use star-tree indexes, set the `index.append_only.enabled` setting to `true`. | ||||||
|
||||||
- A star-tree index can be used for aggregation queries only if the queried fields are a subset of the star-tree's dimensions and the aggregated fields are a subset of the star-tree's metrics. | ||||||
- After a star-tree index is enabled, it cannot be disabled. In order to disable a star-tree index, the data in the index must be reindexed without the star-tree mapping. Furthermore, changing a star-tree configuration will also require a reindex operation. | ||||||
- [Multi-values/array values]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/index/#arrays) are not supported. | ||||||
|
@@ -68,6 +68,7 @@ To use a star-tree index, modify the following settings: | |||||
- Set the feature flag `opensearch.experimental.feature.composite_index.star_tree.enabled` to `true`. For more information about enabling and disabling feature flags, see [Enabling experimental features]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/experimental/). | ||||||
- Set the `indices.composite_index.star_tree.enabled` setting to `true`. For instructions on how to configure OpenSearch, see [Configuring settings]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/index/#static-settings). | ||||||
- Set the `index.composite_index` index setting to `true` during index creation. | ||||||
- Set the `index.append_only.enabled` index setting to `true` during index creation. | ||||||
- Ensure that the `doc_values` parameter is enabled for the `dimensions` and `metrics` fields used in your star-tree mapping. | ||||||
|
||||||
|
||||||
|
@@ -81,7 +82,8 @@ PUT logs | |||||
"settings": { | ||||||
"index.number_of_shards": 1, | ||||||
"index.number_of_replicas": 0, | ||||||
"index.composite_index": true | ||||||
"index.composite_index": true, | ||||||
"index.append_only.enabled": true | ||||||
}, | ||||||
"mappings": { | ||||||
"composite": { | ||||||
|
@@ -94,6 +96,9 @@ PUT logs | |||||
}, | ||||||
{ | ||||||
"name": "port" | ||||||
}, | ||||||
{ | ||||||
"name": "method" | ||||||
} | ||||||
], | ||||||
"metrics": [ | ||||||
|
@@ -123,6 +128,9 @@ PUT logs | |||||
"size": { | ||||||
"type": "integer" | ||||||
}, | ||||||
"method" : { | ||||||
"type": "keyword" | ||||||
}, | ||||||
"latency": { | ||||||
"type": "scaled_float", | ||||||
"scaling_factor": 10 | ||||||
|
@@ -140,14 +148,20 @@ Star-tree indexes can be used to optimize queries and aggregations. | |||||
|
||||||
### Supported queries | ||||||
|
||||||
The following queries are supported as of OpenSearch 2.18: | ||||||
The following queries are supported as of OpenSearch 2.19: | ||||||
|
||||||
|
||||||
- [Term query]({{site.url}}{{site.baseurl}}/query-dsl/term/term/) | ||||||
- [Terms query]({{site.url}}{{site.baseurl}}/query-dsl/term/terms/) | ||||||
- [Match all docs query]({{site.url}}{{site.baseurl}}/query-dsl/match-all/) | ||||||
- [Range query]({{site.url}}{{site.baseurl}}/query-dsl/term/range/) | ||||||
|
||||||
|
||||||
To use a query with a star-tree index, the query's fields must be present in the `ordered_dimensions` section of the star-tree configuration. Queries must also be paired with a supported aggregation. | ||||||
To use a query with a star-tree index, the query's fields must be present in the `ordered_dimensions` section of the star-tree configuration. Queries must also be paired with a supported aggregation. Queries without aggregations cannot be used with a star-tree index. Queries on `date` fields are not currently supported. Support will be added in later releases. | ||||||
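
As a rough illustration of pairing a supported query with a supported aggregation, the following sketch runs a range query on the `status` dimension together with a `sum` aggregation on `size`. It assumes the `logs` example mapping, in which `status` is an ordered dimension and `size` is a metric that includes the `sum` stat. The aggregation name `sum_size` and the range of 500 to 599 are illustrative choices, not part of the original example:

```json
POST /logs/_search
{
  "size": 0,
  "query": {
    "range": {
      "status": {
        "gte": 500,
        "lt": 600
      }
    }
  },
  "aggs": {
    "sum_size": {
      "sum": {
        "field": "size"
      }
    }
  }
}
```

Whether the star-tree index actually serves this request depends on your OpenSearch version and on the star-tree configuration of the index.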
|
||||||
|
||||||
### Supported aggregations | ||||||
|
||||||
|
||||||
The following aggregations are supported by star-tree indexes. | ||||||
|
||||||
#### Metric aggregations | ||||||
|
||||||
The following metric aggregations are supported as of OpenSearch 2.18: | ||||||
- [Sum]({{site.url}}{{site.baseurl}}/aggregations/metric/sum/) | ||||||
|
@@ -156,12 +170,12 @@ The following metric aggregations are supported as of OpenSearch 2.18: | |||||
- [Value count]({{site.url}}{{site.baseurl}}/aggregations/metric/value-count/) | ||||||
- [Average]({{site.url}}{{site.baseurl}}/aggregations/metric/average/) | ||||||
|
||||||
To use aggregations: | ||||||
To use searchable aggregations with a star-tree index, remember the following prerequisites: | ||||||
|
||||||
|
||||||
- The fields must be present in the `metrics` section of the star-tree configuration. | ||||||
- The metric aggregation type must be part of the `stats` parameter. | ||||||
|
||||||
### Aggregation example | ||||||
##### Example | ||||||
|
||||||
|
||||||
The following example gets the sum of all the values in the `size` field for all error logs with `status=500`, using the [example mapping](#example-mapping): | ||||||
|
||||||
|
@@ -185,6 +199,52 @@ POST /logs/_search | |||||
|
||||||
Using a star-tree index, the result will be retrieved from a single aggregated document as it traverses the `status=500` node, as opposed to scanning through all of the matching documents. This results in lower query latency. | ||||||
|
||||||
### Date histogram with metric aggregations | ||||||
|
||||||
|
||||||
You can use [date histograms]({{site.url}}{{site.baseurl}}/aggregations/bucket/date-histogram/) on calendar intervals with metric sub-aggregations. | ||||||
|
||||||
To use date histogram aggregations and make them searchable in the star-tree index, ensure that the following conditions are met: | ||||||
|
||||||
|
||||||
|
||||||
- The calendar intervals in the star-tree mapping configuration must include either the request's calendar interval or an interval of lower (finer) granularity than the requested one. For example, if an aggregation uses the `month` interval, the star-tree search can still use a lower-granularity interval such as `day`. | ||||||
|
||||||
- A metric sub-aggregation must be part of the aggregation request. | ||||||
|
||||||
#### Example | ||||||
|
||||||
The following example gets the sum of all the values in the `size` field, aggregated for each calendar month, for all logs with `method:get`: | ||||||
|
||||||
```json | ||||||
POST /logs/_search | ||||||
{ | ||||||
  "query": { | ||||||
    "term": { | ||||||
      "method": "get" | ||||||
    } | ||||||
  }, | ||||||
  "size": 0, | ||||||
  "aggs": { | ||||||
    "by_month": { | ||||||
      "date_histogram": { | ||||||
        "field": "@timestamp", | ||||||
        "calendar_interval": "month" | ||||||
      }, | ||||||
      "aggs": { | ||||||
        "sum_size": { | ||||||
          "sum": { | ||||||
            "field": "size" | ||||||
          } | ||||||
        } | ||||||
      } | ||||||
    } | ||||||
  } | ||||||
} | ||||||
``` | ||||||
Review comment (on the `query` clause): Is this a valid query? @sandeshkr419 just double checking. I don't see term/terms etc.
Review comment: @bharath-techie: I updated the query to add terms while keeping the method.
Review comment: This is still not a valid query, we don't support range on timestamp. Let's reword this @Naarcha-AWS
Review comment: @bharath-techie: Can you suggest a valid query?
Review comment: This is a valid query which we can use @Naarcha-AWS
Review comment (on the `term` clause): Can we use range query or keyword term query since term is already used in the above example? Maybe keyword term query will be a good example.
Review comment: @bharath-techie: I'll update the example to use a range query.
|
||||||
|
||||||
|
||||||
|
||||||
## Using queries without a star-tree index | ||||||
|
||||||
Set the `indices.composite_index.star_tree.enabled` setting to `false` to run queries without using a star-tree index. |
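
A minimal sketch of one way to apply this, assuming the setting can be updated dynamically through the cluster settings API in your OpenSearch version (if it cannot, set it in `opensearch.yml` and restart the node instead):

```json
PUT /_cluster/settings
{
  "persistent": {
    "indices.composite_index.star_tree.enabled": false
  }
}
```

Using `persistent` rather than `transient` is an illustrative choice so that the value survives a full cluster restart.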