[Query Insights] Capture query-level resource usage metrics #12399

### Is your feature request related to a problem? Please describe

The resource tracking framework (https://github.com/opensearch-project/OpenSearch/issues/1179) tracks task-level resource usage, such as CPU and memory utilization. However, there is currently no way to derive query-level resource usage from it. We need a solution for this gap, since query-level resource usage would be one of the most important metrics for query insights (https://github.com/opensearch-project/OpenSearch/issues/11429) features such as top N queries (https://github.com/opensearch-project/OpenSearch/issues/11186) and cost estimation (https://github.com/opensearch-project/OpenSearch/issues/12390).

### Describe the solution you'd like

The main challenge here is propagating the task-level resource usage information to the coordinator node so that it can calculate query-level resource usage. The most straightforward solution is to piggyback the resource usage data on the `SearchPhaseResult` node response and use `SearchRequestOperationsListener::onPhaseEnd` to extract this information from the phase results and forward it to the query insights framework. However, this approach is limited because the resource usage data obtained this way may not be entirely accurate. The reason is explained below.
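The coordinator-side aggregation implied by the piggyback approach could look roughly like the sketch below. It only illustrates summing per-shard usage into one query-level figure. `ShardTaskUsage` and `QueryUsageAggregator` are hypothetical names made up for this example, not OpenSearch classes, and the sketch does not show how the values would actually be carried in `SearchPhaseResult` or read in `onPhaseEnd`.

```java
import java.util.List;

/** Hypothetical per-shard usage payload (an assumption, not an OpenSearch class). */
final class ShardTaskUsage {
    final long cpuNanos;
    final long memoryBytes;

    ShardTaskUsage(long cpuNanos, long memoryBytes) {
        this.cpuNanos = cpuNanos;
        this.memoryBytes = memoryBytes;
    }
}

/** Hypothetical helper that folds per-shard usage into one query-level total. */
final class QueryUsageAggregator {
    private QueryUsageAggregator() {}

    static ShardTaskUsage sum(List<ShardTaskUsage> perShard) {
        long cpu = 0L;
        long memory = 0L;
        for (ShardTaskUsage usage : perShard) {
            cpu += usage.cpuNanos;
            memory += usage.memoryBytes;
        }
        return new ShardTaskUsage(cpu, memory);
    }
}
```

In this sketch the coordinator would call `QueryUsageAggregator.sum(...)` once per phase with whatever usage values arrived alongside the shard responses.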

Here's the workflow of a search request with resource tracking: the coordinator node sends requests to data nodes, and the data nodes create tasks to search the shards. On a data node:
1. The data node creates tasks and starts request tracking using the resource tracking framework;
2. The data node sends back the `SearchPhaseResult` to the coordinator node;
3. The data node stops the task and stops request tracking, and records the final resource utilization in the resource tracking framework.

If we want to piggyback the resource utilization data in `SearchPhaseResult`, we must retrieve this data before the task is considered "finished." Through some experiments and analysis, I found that reading the resource utilization data before the second step yields CPU and memory utilization up to ~10% lower than the final actual usage. If the results are not accurate, the data would be of no use except for roughly analyzing the overall usage trend and comparing the "relative" resource usage of two queries. We won't be able to use this data to make reliable query cost estimations.
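The under-count described above can be illustrated with a small simulation of the three steps. This is a sketch under stated assumptions: `SimulatedTaskTracker` and `PiggybackDemo` are hypothetical, the tracker is a plain counter rather than the real resource tracking framework, and the work amounts passed in are illustrative inputs, not measurements.

```java
/** Hypothetical stand-in for task-level tracking; a plain counter, not the real framework. */
final class SimulatedTaskTracker {
    private long cpuNanos;
    private boolean finished;

    /** Charges work to the task while it is still running. */
    void recordWork(long nanos) {
        if (finished) {
            throw new IllegalStateException("task already finished");
        }
        cpuNanos += nanos;
    }

    /** Reads the usage accumulated so far without finishing the task. */
    long snapshot() {
        return cpuNanos;
    }

    /** Finishes the task and returns the final usage. */
    long stop() {
        finished = true;
        return cpuNanos;
    }
}

final class PiggybackDemo {
    private PiggybackDemo() {}

    /** Returns {usage piggybacked on the response, final recorded usage}. */
    static long[] run(long workBeforeSnapshot, long workAfterSnapshot) {
        SimulatedTaskTracker tracker = new SimulatedTaskTracker(); // step 1: start tracking
        tracker.recordWork(workBeforeSnapshot);
        long piggybacked = tracker.snapshot();                     // read before step 2
        tracker.recordWork(workAfterSnapshot);                     // work charged after the read
        long finalUsage = tracker.stop();                          // step 3: final usage recorded
        return new long[] {piggybacked, finalUsage};
    }
}
```

Any work charged to the task after the snapshot is missing from the piggybacked value, so it can only be less than or equal to the final usage.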

### Related component

Search:Query Insights

### Describe alternatives you've considered

Another approach is to implement an asynchronous post-processor as part of the query insights data consumption pipeline. This post-processor would periodically gather data from data nodes and correlate it with queries to calculate the final resource usage accurately. While this method ensures the most accurate resource usage data, it comes with the overhead of introducing a periodic job running in the background to collect and share the data between nodes. We need to consider the trade-offs when deciding on the best approach for capturing query-level resource usage.
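A minimal sketch of the correlation step in such a post-processor is shown below. It assumes that each finished task record carries an identifier linking it to its query; that identifier, along with `FinishedTaskRecord` and `UsagePostProcessor`, is hypothetical and not an existing OpenSearch API. How batches would be collected from data nodes and on what schedule is left out.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical record of a finished task's final usage, tagged with its query. */
final class FinishedTaskRecord {
    final String queryId; // assumed correlation key
    final long cpuNanos;
    final long memoryBytes;

    FinishedTaskRecord(String queryId, long cpuNanos, long memoryBytes) {
        this.queryId = queryId;
        this.cpuNanos = cpuNanos;
        this.memoryBytes = memoryBytes;
    }
}

/** Hypothetical post-processor that accumulates final task usage per query. */
final class UsagePostProcessor {
    // queryId -> {total CPU nanos, total memory bytes}
    private final Map<String, long[]> totals = new HashMap<>();

    /** Called once per periodic run with the records gathered from data nodes. */
    void ingest(List<FinishedTaskRecord> batch) {
        for (FinishedTaskRecord record : batch) {
            long[] total = totals.computeIfAbsent(record.queryId, k -> new long[2]);
            total[0] += record.cpuNanos;
            total[1] += record.memoryBytes;
        }
    }

    long cpuNanos(String queryId) {
        long[] total = totals.get(queryId);
        return total == null ? 0L : total[0];
    }

    long memoryBytes(String queryId) {
        long[] total = totals.get(queryId);
        return total == null ? 0L : total[1];
    }
}
```

Because the records are read only after tasks finish, the totals reflect final usage, at the cost of the background collection described above.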

### Additional context

- Query Insights Meta issue: https://github.com/opensearch-project/OpenSearch/issues/11522
- Top N Queries RFC: https://github.com/opensearch-project/OpenSearch/issues/11186
