[RFC] Query insights framework 

**Is your feature request related to a problem? Please describe.**
OpenSearch stands as a versatile, scalable, open-source solution designed for diverse data exploration needs, ranging from interactive log analytics to real-time application monitoring. Despite its capabilities, OpenSearch users and administrators often encounter challenges in ensuring optimal search performance due to limited expertise or OpenSearch's current constraints in providing comprehensive data points on query executions. Common questions include:

* Identifying the top queries within a specific timeframe (“What are the top queries in the last hour?”).
* Associating queries with users to find the users with the highest search query volumes (“How do I associate queries with users?”).
* Diagnosing slow search queries (“Why are my search queries so slow?”).
* Explaining spikes in query latency (“Why was there a spike in my search latency chart?”).

The overarching objective of the Query Insights initiative is to address these issues by building frameworks, APIs, and dashboards that, with minimal performance impact, offer insights, metrics, and recommendations on query execution. These help users better understand search query characteristics, patterns, and system behavior during each stage of query execution. Query Insights will facilitate better detection, diagnosis, and prevention of query performance issues, ultimately improving query processing performance, user experience, and overall system resilience.

Let's discuss the scope and components of the framework!

**Describe the solution you'd like**

As we briefly discussed in [this RFC](https://github.com/opensearch-project/OpenSearch/issues/11186#issuecomment-1815622253), we want to design and build a robust framework that efficiently handles data collection, storage, processing, and export for query insights data. We need to build this framework in a resource-efficient manner to minimize the impact on search performance. We also need to focus on extensibility, so that new metrics and their associated analysis and insights can be added easily.

The framework should have these main components: data collection, data storage and processing, a recommendation engine, and data export.

* Collectors: Within OpenSearch, these components gather performance-related data points at various stages of search query executions.
* Processors: Built in the Query Insights Plugin, these components perform lightweight aggregation and processing on data collected by the collectors.
* Recommendation Engines: These components generate recommendations based on point-in-time query insights data within a cluster.
* Customer experience: Various customer touch points, such as APIs, dashboards, metrics, and exporters, facilitate the presentation of insights and recommendations to customers.
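
To illustrate the extensibility goal, here is a minimal sketch of how processors might be registered so that a new metric can be added without touching the collectors. All names here (`QueryRecord`, `InsightProcessor`, `ProcessorRegistry`, and the two example processors) are hypothetical and are not part of any existing OpenSearch API; the sketch is single-threaded for brevity.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical data point handed from a collector to the processors.
final class QueryRecord {
    final String queryId;
    final long latencyMs;

    QueryRecord(String queryId, long latencyMs) {
        this.queryId = queryId;
        this.latencyMs = latencyMs;
    }
}

// Hypothetical extension point: each processor maintains one lightweight aggregate.
interface InsightProcessor {
    String metricName();
    void process(QueryRecord record);
    long value();
}

// Counts how many query records have been seen.
final class QueryCountProcessor implements InsightProcessor {
    private long count;
    public String metricName() { return "query_count"; }
    public void process(QueryRecord record) { count++; }
    public long value() { return count; }
}

// Tracks the highest latency seen so far.
final class MaxLatencyProcessor implements InsightProcessor {
    private long maxMs;
    public String metricName() { return "max_latency_ms"; }
    public void process(QueryRecord record) { maxMs = Math.max(maxMs, record.latencyMs); }
    public long value() { return maxMs; }
}

// Not thread-safe; a real implementation would need to handle concurrent access.
final class ProcessorRegistry {
    private final Map<String, InsightProcessor> processors = new LinkedHashMap<>();

    // Adding a new metric only requires registering another processor.
    void register(InsightProcessor processor) {
        processors.put(processor.metricName(), processor);
    }

    // Fan each collected record out to every registered processor.
    void dispatch(QueryRecord record) {
        for (InsightProcessor p : processors.values()) {
            p.process(record);
        }
    }

    // Current value of every metric, keyed by metric name.
    Map<String, Long> snapshot() {
        Map<String, Long> out = new LinkedHashMap<>();
        for (InsightProcessor p : processors.values()) {
            out.put(p.metricName(), p.value());
        }
        return out;
    }
}
```

With this shape, supporting a new metric would mean writing one more `InsightProcessor` and calling `register`, while the collection path stays unchanged.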

The interactions between these components are illustrated in the chart below. 

<img width="1059" alt="image" src="https://github.com/opensearch-project/OpenSearch/assets/7891523/4617d926-d895-4958-b285-bdfe0c7965cc">

The data collection workflow, executed by request listeners, span listeners, or other components, channels information to one or more in-memory stores for further analysis and post-processing. Asynchronous processors then analyze the data and generate insights and results, potentially using stored historical data. The query insights dashboard also uses this analyzed and aggregated data to display its charts. Finally, asynchronous exporters send the results to different sinks.
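
The workflow above can be sketched roughly as follows. This is an illustration under stated assumptions, not the proposed implementation: `SearchEvent` and `InsightsPipeline` are hypothetical names, the in-memory store is modeled as a bounded queue that drops data when full (one possible way to protect the search path), the processor computes the top N queries by latency, and the exporter is a plain `Consumer`. In a real deployment `drainAndExport` would be invoked by a background thread; here it is called directly so the example stays deterministic.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

// Hypothetical data point captured when a search request completes.
final class SearchEvent {
    final String query;
    final long latencyMs;

    SearchEvent(String query, long latencyMs) {
        this.query = query;
        this.latencyMs = latencyMs;
    }
}

final class InsightsPipeline {
    private final BlockingQueue<SearchEvent> buffer; // bounded in-memory store
    private final int topN;
    private final Consumer<List<SearchEvent>> exporter; // stands in for a sink

    InsightsPipeline(int capacity, int topN, Consumer<List<SearchEvent>> exporter) {
        this.buffer = new ArrayBlockingQueue<>(capacity);
        this.topN = topN;
        this.exporter = exporter;
    }

    // Collector side, called on the search path: never blocks.
    // Returns false if the buffer is full and the event was dropped.
    boolean onQueryCompleted(String query, long latencyMs) {
        return buffer.offer(new SearchEvent(query, latencyMs));
    }

    // Processor and exporter side, meant to run off the search path.
    List<SearchEvent> drainAndExport() {
        List<SearchEvent> batch = new ArrayList<>();
        buffer.drainTo(batch);

        // Min-heap on latency: keeps only the N slowest events seen so far.
        PriorityQueue<SearchEvent> heap =
            new PriorityQueue<>(Comparator.comparingLong((SearchEvent e) -> e.latencyMs));
        for (SearchEvent e : batch) {
            heap.add(e);
            if (heap.size() > topN) {
                heap.poll(); // evict the fastest of the retained events
            }
        }

        // Order the result from slowest to fastest before exporting.
        List<SearchEvent> top = new ArrayList<>(heap);
        top.sort(Comparator.comparingLong((SearchEvent e) -> e.latencyMs).reversed());
        exporter.accept(top);
        return top;
    }
}
```

Dropping events when the buffer is full trades completeness for a bounded memory footprint and a non-blocking search path; whether that trade-off is acceptable is one of the design questions for this framework.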

**Describe alternatives you've considered**
As discussed in [this comment](https://github.com/opensearch-project/OpenSearch/issues/11186#issuecomment-1820072102) on the Top N query RFC, we can potentially leverage the OPTL collector when it becomes available and migrate certain aggregation logic from the query insights components to OPTL collectors running outside the OpenSearch process. With this approach, we would send traces and spans to the OPTL collectors, which would take responsibility for the necessary calculations, aggregations, and export. This strategy could further reduce the impact on the OpenSearch process.

**Additional context**
There are some interesting discussions on this topic in the comments of https://github.com/opensearch-project/OpenSearch/issues/11186

