ESQL: Add multi field support in match #121525

Open
@ioanatia

We want to extend the match function to be able to receive multiple fields, e.g.:

FROM movies
| WHERE match(title, plot, summary, "Harry Potter")

Lucene query translation:

When a single field is passed to match, we preserve the current behaviour and translate it to a match query.
When multiple fields are passed to match, we will not push the function down to Lucene as a multi_match query.
Instead, we will have a dedicated QueryBuilder that uses a dis_max query wrapping combined_fields queries.
The query builder will rewrite to a query that:

  • groups all fields by their analyzer
  • uses one combined_fields query for each group
  • wraps all combined_fields queries in a single dis_max query

This rewrite will need to happen at the shard level, where we have access to what analyzers are set on the mappings.
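
The rewrite steps above can be sketched in Python as a function that builds the query DSL body. This is only an illustration under stated assumptions, not the actual QueryBuilder: the helper name `build_multi_field_query`, the `field_analyzers` mapping (field name to analyzer name, which in the real rewrite would come from the shard's mappings), and the analyzers assigned to the example fields are all hypothetical.

```python
from collections import defaultdict


def build_multi_field_query(query_text, field_analyzers):
    """Sketch of the proposed rewrite (hypothetical helper, not an Elasticsearch API).

    field_analyzers maps each field name to the name of its analyzer,
    which in the real rewrite would be read from the shard's mappings.
    """
    # Step 1: group all fields by their analyzer.
    groups = defaultdict(list)
    for field, analyzer in field_analyzers.items():
        groups[analyzer].append(field)

    # Step 2: build one combined_fields query per analyzer group.
    # Sorting only keeps the output deterministic for this example.
    combined = [
        {"combined_fields": {"query": query_text, "fields": sorted(fields)}}
        for _, fields in sorted(groups.items())
    ]

    # Step 3: wrap all combined_fields queries in a single dis_max query.
    return {"dis_max": {"queries": combined}}


# Assumed analyzers for the fields used in the ES|QL example above.
query = build_multi_field_query(
    "Harry Potter",
    {"title": "english", "plot": "english", "summary": "standard"},
)
```

With these assumed analyzers, `title` and `plot` end up in one combined_fields query and `summary` in another, and dis_max scores each document by its best-matching group.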

Allowed options when using multiple fields:

We will only allow the options supported by combined_fields, which are a subset of the match query options.

| Option | match | combined_fields |
| --- | --- | --- |
| auto_generate_synonyms_phrase_query | yes | yes |
| operator | yes | yes |
| minimum_should_match | yes | yes |
| zero_terms_query | yes | yes |
| boost | yes | yes |
| fuzziness | yes | no |
| max_expansions | yes | no |
| prefix_length | yes | no |
| fuzzy_transpositions | yes | no |
| fuzzy_rewrite | yes | no |
| lenient | yes | no |

In time we can look into extending support for all match options when querying multiple fields.
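
As a rough sketch of how the option restriction could be enforced, the Python snippet below encodes the table as two sets and rejects options that combined_fields does not accept when more than one field is given. The helper name `check_match_options` and the error message are hypothetical, and the single-field branch only covers the options listed in the table.

```python
# Options accepted by both match and combined_fields, per the table above.
MULTI_FIELD_OPTIONS = frozenset({
    "auto_generate_synonyms_phrase_query",
    "operator",
    "minimum_should_match",
    "zero_terms_query",
    "boost",
})

# Options from the table that only the single-field match query accepts.
SINGLE_FIELD_ONLY_OPTIONS = frozenset({
    "fuzziness",
    "max_expansions",
    "prefix_length",
    "fuzzy_transpositions",
    "fuzzy_rewrite",
    "lenient",
})


def check_match_options(fields, options):
    """Hypothetical validation helper; raises ValueError on unsupported options."""
    if len(fields) > 1:
        allowed = MULTI_FIELD_OPTIONS
    else:
        allowed = MULTI_FIELD_OPTIONS | SINGLE_FIELD_ONLY_OPTIONS
    unsupported = sorted(set(options) - allowed)
    if unsupported:
        raise ValueError(
            f"unsupported options for match on {len(fields)} field(s): {unsupported}"
        )


# Accepted: fuzziness is allowed when a single field is queried.
check_match_options(["title"], {"fuzziness": "AUTO"})
# Accepted: operator is supported by combined_fields.
check_match_options(["title", "plot"], {"operator": "AND"})
```

Calling `check_match_options(["title", "plot"], {"fuzziness": "AUTO"})` would raise, since fuzziness is not a combined_fields option.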

Using match with semantic_text fields

We currently support semantic search in ES|QL through the match function, which accepts a semantic_text field as an argument.
As mentioned above, when a single field is passed to match, we will continue to translate the function to a match query.
Support for semantic search in ES|QL is therefore not broken and should continue to function as before.

However, when multiple fields are passed to the match function and one of them is a semantic_text field, we will return an error.
We do this because combining text and semantic_text fields requires more than pushing a query to Lucene.
We would need to do proper hybrid search (linear combination or RRF) to get accurate scores back, which would require significantly more work.
Therefore we defer support for hybrid search in match to a later stage.
This is in line with what we have in DSL, where the multi_match query does not support querying semantic_text fields. We instead support hybrid search through retrievers, which have a different execution model.
In ES|QL, hybrid search will initially be supported through FORK and a dedicated command for relevance reranking.
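
The semantic_text rule described above could be checked along the following lines. This is a sketch only: the helper name `check_match_fields`, the `field_types` mapping (field name to mapping type), and the error wording are assumptions, not the actual ES|QL verifier.

```python
def check_match_fields(field_types):
    """Hypothetical check; field_types maps field name -> mapping type."""
    semantic = sorted(
        name for name, mapping_type in field_types.items()
        if mapping_type == "semantic_text"
    )
    # A single semantic_text field is still translated to a match query.
    # Mixing it with other fields would require hybrid search, so reject it.
    if len(field_types) > 1 and semantic:
        raise ValueError(
            "match does not support semantic_text fields when multiple "
            f"fields are given: {semantic}"
        )


# Accepted: a single semantic_text field keeps working as before.
check_match_fields({"plot_embedding": "semantic_text"})
# Accepted: multiple text fields.
check_match_fields({"title": "text", "plot": "text"})
```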

Backwards compatibility

Since the behaviour for querying a single field remains the same, we are not breaking backwards compatibility.

Metadata

Labels

:Search Relevance/Search, >feature, ES|QL-ui, Team:Search Relevance, priority:normal
