Description
We want to extend the match function to accept multiple fields, e.g.:
FROM movies
| WHERE match(title, plot, summary, "Harry Potter")
Lucene query translation:
When a single field is passed to match, we preserve the current behaviour and translate it to a match query.
When multiple fields are passed to the match function we will not push it down to Lucene as a multi_match query.
Instead we will have a dedicated QueryBuilder which will use a dis_max query with combined_fields queries.
The query builder will rewrite to a query that:
- groups all fields by their analyzer
- builds one combined_fields query for each group
- wraps all combined_fields queries in a single dis_max query
This rewrite will need to happen at the shard level, where we have access to what analyzers are set on the mappings.
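The rewrite steps above can be sketched roughly as follows. This is only an illustration in Python that emits Query DSL as plain dicts, not the actual QueryBuilder. The function name `rewrite_multi_field_match` and the `field_analyzers` argument (a field name to analyzer name map standing in for the shard-level mapping lookup) are assumptions made for the example.

```python
from collections import defaultdict


def rewrite_multi_field_match(query_text, field_analyzers, options=None):
    """Build a dis_max of combined_fields queries, one per analyzer group.

    field_analyzers is a hypothetical dict of field name -> analyzer name,
    standing in for what the mappings expose at the shard level.
    """
    options = options or {}

    # Step 1: group all fields by their analyzer.
    groups = defaultdict(list)
    for field, analyzer in field_analyzers.items():
        groups[analyzer].append(field)

    # Step 2: one combined_fields query per analyzer group.
    # Sorting keeps the output deterministic for the example.
    clauses = []
    for analyzer in sorted(groups):
        body = {"query": query_text, "fields": sorted(groups[analyzer])}
        body.update(options)
        clauses.append({"combined_fields": body})

    # Step 3: wrap all combined_fields queries in a single dis_max query.
    return {"dis_max": {"queries": clauses}}


query = rewrite_multi_field_match(
    "Harry Potter",
    {"title": "standard", "plot": "english", "summary": "english"},
)
```

With these assumed analyzers, `plot` and `summary` share one combined_fields clause and `title` gets its own, and both clauses sit under the same dis_max query.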
Allowed options when using multiple fields:
We will only allow the options supported by combined_fields, which are a subset of the match query options.
Option | match | combined_fields
---|---|---
auto_generate_synonyms_phrase_query | yes | yes
operator | yes | yes
minimum_should_match | yes | yes
zero_terms_query | yes | yes
boost | yes | yes
fuzziness | yes | no
max_expansions | yes | no
prefix_length | yes | no
fuzzy_transpositions | yes | no
fuzzy_rewrite | yes | no
lenient | yes | no
In time we can look into extending support for all match options when querying multiple fields.
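As a rough sketch of how the option restriction could be checked, the snippet below models only the options listed in the table. The function name `validate_match_options` and the error message are hypothetical. The real validation would live in the ES|QL analysis code and may look quite different.

```python
# Options from the table that both match and combined_fields accept.
COMBINED_FIELDS_OPTIONS = frozenset({
    "auto_generate_synonyms_phrase_query",
    "operator",
    "minimum_should_match",
    "zero_terms_query",
    "boost",
})

# Options from the table that only the match query accepts.
MATCH_ONLY_OPTIONS = frozenset({
    "fuzziness",
    "max_expansions",
    "prefix_length",
    "fuzzy_transpositions",
    "fuzzy_rewrite",
    "lenient",
})


def validate_match_options(fields, options):
    """Return options unchanged, or raise if one is not allowed for these fields."""
    if len(fields) > 1:
        allowed = COMBINED_FIELDS_OPTIONS
    else:
        allowed = COMBINED_FIELDS_OPTIONS | MATCH_ONLY_OPTIONS

    rejected = sorted(set(options) - allowed)
    if rejected:
        raise ValueError(
            "options not supported for %d field(s): %s"
            % (len(fields), ", ".join(rejected))
        )
    return options
```

Under this sketch, `fuzziness` passes for `match(title, ...)` but is rejected as soon as a second field is added.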
Using match with semantic_text fields
We currently support semantic search in ES|QL through the match function which allows semantic_text fields as an argument.
As mentioned above, when a single field is passed to match, we will continue to translate the function to a match query.
Therefore support for semantic search in ES|QL is not broken and continues to work as before.
However, when multiple fields are passed to the match function and one of them is a semantic_text field, we will return an error.
We do this because combining text and semantic_text fields requires more than pushing a query to Lucene.
We would need to do proper hybrid search (linear combination or RRF) to get accurate scores back, which would require significantly more work.
Therefore we defer support for hybrid search in match to a later stage.
This is in line with what we have in DSL, where, for example, the multi_match query does not support querying semantic_text fields. We instead support hybrid search through retrievers, which have a different execution model.
In ES|QL hybrid search will initially be supported through FORK and a dedicated command for relevance reranking.
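The semantic_text rule described in this section could be expressed as a check like the one below. It is a minimal sketch: `check_match_fields`, the `field_types` argument (field name to mapping type) and the error text are all hypothetical and only illustrate the intended behaviour.

```python
def check_match_fields(field_types):
    """Reject semantic_text fields when match receives more than one field.

    field_types is a hypothetical dict of field name -> mapping type.
    """
    if len(field_types) > 1:
        semantic = sorted(
            name for name, type_ in field_types.items()
            if type_ == "semantic_text"
        )
        if semantic:
            raise ValueError(
                "match with multiple fields does not support semantic_text "
                "fields: " + ", ".join(semantic)
            )
    # A single field, semantic_text or not, keeps the current behaviour.
    return list(field_types)
```

A single semantic_text field is accepted unchanged, which mirrors the backwards-compatible single-field path.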
Backwards compatibility
Since the behaviour for querying a single field remains the same, we are not breaking backwards compatibility.