[RFC] Query insights framework  #11429

Open
@ansjcy

Description

Is your feature request related to a problem? Please describe.
OpenSearch stands as a versatile, scalable, open-source solution designed for diverse data exploration needs, ranging from interactive log analytics to real-time application monitoring. Despite its capabilities, OpenSearch users and administrators often encounter challenges in ensuring optimal search performance due to limited expertise or OpenSearch's current constraints in providing comprehensive data points on query executions. Common questions include:

  • Identification of top queries within a specific timeframe (“what are the top queries in the last hour?”).
  • Profiling users with the highest search query volumes (“how do I associate queries with users?”).
  • Concerns about slow search queries (“why are my search queries so slow?”).
  • Spikes in query latency (“why was there a spike in my search latency chart?”).

The overarching objective of the Query Insights initiative is to address these issues by building frameworks, APIs, and dashboards, with minimal performance impact, that offer in-depth insights, metrics, and recommendations on query executions. These help users better understand search query characteristics, patterns, and system behavior during the query execution stages. Query Insights will facilitate enhanced detection, diagnosis, and prevention of query performance issues, ultimately improving query processing performance, user experience, and overall system resilience.

Let's discuss the scope and components of the framework!

Describe the solution you'd like

As briefly discussed in this RFC, we want to design and build a robust framework that efficiently handles data collection, storage, processing, and export for query insights data. We need to build this framework in a resource-efficient manner to minimize the impact on search performance. We also need to focus on the extensibility of the framework so that new metrics, and the analysis and insights associated with them, can be added easily.

The framework should cover these main areas: data collection, data storage and processing, recommendation generation, and data export. They map to the following components:

  • Collectors: Within OpenSearch, these components gather performance-related data points at various stages of search query executions.
  • Processors: Built in the Query Insights Plugin, these components perform lightweight aggregation and processing on data collected by the collectors.
  • Recommendation Engines: These components generate recommendations based on point-in-time query insights data within a cluster.
  • Customer experience: Various customer touch points, such as APIs, dashboards, metrics, and exporters, facilitate the presentation of insights and recommendations to customers.
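As a rough illustration of how these components could be separated, here is a minimal Java sketch. Every type and method name in it (`QueryRecord`, `Collector`, `Processor`, `Exporter`, `InMemoryCollector`, `TopNByLatency`) is hypothetical and not taken from the OpenSearch code base; a recommendation engine could be modeled as just another `Processor`. The example processor answers the "top queries" question with a bounded min-heap, which is one possible approach rather than the design this RFC commits to.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: none of these types come from the OpenSearch code base.
final class QueryInsightsComponentsSketch {

    /** One data point captured for a finished search request. */
    static final class QueryRecord {
        final String queryId;
        final long latencyMillis;

        QueryRecord(String queryId, long latencyMillis) {
            this.queryId = queryId;
            this.latencyMillis = latencyMillis;
        }
    }

    /** Collector: gathers data points during query execution. */
    interface Collector {
        void collect(QueryRecord record);
    }

    /** Processor: lightweight aggregation over collected records. */
    interface Processor<R> {
        R process(List<QueryRecord> records);
    }

    /** Exporter: hands a result to a sink (log, index, metrics backend, ...). */
    interface Exporter<R> {
        void export(R result);
    }

    /** Collector that simply buffers records in memory. */
    static final class InMemoryCollector implements Collector {
        private final List<QueryRecord> buffer = new ArrayList<>();

        @Override
        public synchronized void collect(QueryRecord record) {
            buffer.add(record);
        }

        /** Returns the buffered records and clears the buffer. */
        synchronized List<QueryRecord> drain() {
            List<QueryRecord> snapshot = new ArrayList<>(buffer);
            buffer.clear();
            return snapshot;
        }
    }

    /** Processor that keeps the N slowest queries using a bounded min-heap. */
    static final class TopNByLatency implements Processor<List<QueryRecord>> {
        private final int n;

        TopNByLatency(int n) {
            this.n = n;
        }

        @Override
        public List<QueryRecord> process(List<QueryRecord> records) {
            Comparator<QueryRecord> byLatency = Comparator.comparingLong(r -> r.latencyMillis);
            PriorityQueue<QueryRecord> heap = new PriorityQueue<>(byLatency);
            for (QueryRecord r : records) {
                heap.offer(r);
                if (heap.size() > n) {
                    heap.poll(); // evict the fastest record currently held
                }
            }
            List<QueryRecord> top = new ArrayList<>(heap);
            top.sort(byLatency.reversed()); // slowest first
            return top;
        }
    }
}
```

Keeping the heap bounded at N entries means memory use does not grow with query volume, which fits the goal of minimal impact on search performance.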

The interactions between these components are illustrated in the chart below.

[Image: diagram of the interactions between the framework components]

The data collection workflow, executed by request listeners, span listeners, or other components, channels information to one or more in-memory storage units for further analysis and post-processing. Asynchronous processors then analyze the data and generate insights and results, potentially using stored historical data. The query insights dashboard will also use the analyzed and aggregated data to display the query insights charts. Finally, asynchronous exporters send the results to different sinks.
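The workflow described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the plugin's implementation: `QueryInsightsPipelineSketch`, `LatencySummary`, `onQueryFinished`, and `flush` are hypothetical names, the in-memory store is a plain `ConcurrentLinkedQueue`, and exporters are modeled as `Consumer` callbacks. The point is that the search path only enqueues a value, while aggregation and export run on a background thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch of the workflow; names are illustrative, not OpenSearch APIs.
final class QueryInsightsPipelineSketch implements AutoCloseable {

    /** Aggregated result handed to the exporters. */
    static final class LatencySummary {
        final int count;
        final long maxMillis;
        final double meanMillis;

        LatencySummary(int count, long maxMillis, double meanMillis) {
            this.count = count;
            this.maxMillis = maxMillis;
            this.meanMillis = meanMillis;
        }
    }

    // In-memory store shared between the search path and the processor.
    private final Queue<Long> pending = new ConcurrentLinkedQueue<>();
    private final List<Consumer<LatencySummary>> exporters;
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();

    QueryInsightsPipelineSketch(List<Consumer<LatencySummary>> exporters) {
        this.exporters = new ArrayList<>(exporters);
    }

    /** Called on the search path (for example from a request listener): enqueue only. */
    void onQueryFinished(long latencyMillis) {
        pending.add(latencyMillis);
    }

    /** Starts periodic background processing. */
    void start(long periodMillis) {
        scheduler.scheduleAtFixedRate(this::flush, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    /** Drains pending data, aggregates it, and passes the result to every exporter. */
    synchronized void flush() {
        int count = 0;
        long max = 0;
        long sum = 0;
        Long latency;
        while ((latency = pending.poll()) != null) {
            count++;
            sum += latency;
            max = Math.max(max, latency);
        }
        if (count == 0) {
            return; // nothing collected since the last run
        }
        LatencySummary summary = new LatencySummary(count, max, (double) sum / count);
        for (Consumer<LatencySummary> exporter : exporters) {
            exporter.accept(summary);
        }
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

A caller would register one or more exporters, invoke `start(...)` with a processing interval, and call `onQueryFinished(...)` from whatever listener observes query completion. Note that `scheduleAtFixedRate` stops rescheduling a task that throws, so a real exporter would need its own error handling.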

Describe alternatives you've considered
As discussed in this comment on the Top N query RFC, we can potentially leverage the OPTL collector when it becomes available and migrate certain aggregation logic from the query insights components to OPTL collectors outside the OpenSearch process. With this approach, we can send traces/spans to OPTL collectors, which take responsibility for the necessary calculations, aggregations, and export. This strategy could further reduce the impact on the OpenSearch process.

Additional context
There are some interesting discussions on this topic in the comments of #11186.
