Fix minscore propagation in text similarity reranker #129223

mridula-s109 · 2025-06-10T18:51:43Z

Summary

This PR addresses an issue where the min_score parameter wasn't being properly propagated in the text similarity reranker. The fix ensures that documents are correctly filtered based on the specified minimum score threshold.

Changes

Fixed min_score Validation and Filtering:
- Updated the validation to allow min_score values greater than or equal to zero
- Modified the filtering logic to use inclusive comparison (>=) instead of exclusive (>)
Test Case Updates:
- Added test cases for min_score = 0 to verify edge case handling
- Included tests for high min_score values to ensure proper filtering
- Updated existing test assertions to expect inclusive scoring behavior
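
The validation and filtering rules listed above can be sketched in isolation. This is only an illustration under stated assumptions, not the code from this PR: the class `MinScoreFilterSketch`, the `ScoredDoc` type, and both method names are hypothetical, and the real change is in the retriever builders named in the review summary.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, self-contained sketch; none of these names come from Elasticsearch.
public class MinScoreFilterSketch {

    // Minimal stand-in for a reranked document and its score.
    public static final class ScoredDoc {
        public final String id;
        public final float score;

        public ScoredDoc(String id, float score) {
            this.id = id;
            this.score = score;
        }
    }

    // Validation as described in the PR: zero is accepted, negative values are rejected.
    // A null min_score means "no threshold" (an assumption of this sketch).
    public static Float validateMinScore(Float minScore) {
        if (minScore != null && minScore < 0) {
            throw new IllegalArgumentException("[min_score] must be greater than or equal to 0, was: " + minScore);
        }
        return minScore;
    }

    // Inclusive filtering: a document whose score equals min_score is kept.
    public static List<ScoredDoc> filterByMinScore(List<ScoredDoc> docs, Float minScore) {
        if (minScore == null) {
            return docs;
        }
        List<ScoredDoc> kept = new ArrayList<>();
        for (ScoredDoc doc : docs) {
            if (doc.score >= minScore) { // inclusive (>=), not exclusive (>)
                kept.add(doc);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<ScoredDoc> docs = List.of(new ScoredDoc("a", 0.0f), new ScoredDoc("b", 0.5f));
        System.out.println(filterByMinScore(docs, validateMinScore(0.0f)).size());
        System.out.println(filterByMinScore(docs, validateMinScore(10.0f)).size());
    }
}
```

With an exclusive comparison, `min_score = 0` would drop documents scored exactly 0, which is the edge case the new tests target.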

Bug Fix Verification

The bug has been verified as fixed by running a test query with min_score: 10. The query now correctly returns no results when no documents meet the minimum score threshold:

GET books/_search
{
  "retriever": {
    "text_similarity_reranker": {
      "retriever": {
        "standard": {
          "query": {
            "multi_match": {
              "query": "alien"
            }
          }
        }
      },
      "inference_id": ".rerank-...",
      "field": "synopsis",
      "min_score": 10
    }
  }
}

// Result:
{
  "hits": {
    "hits": []
  }
}

Test Results

All Tests Passing Successfully ✅

All test cases related to the text similarity rank retriever are now passing, including the newly added tests for min_score functionality.

Copilot

Pull Request Overview

This PR fixes an issue where the min_score parameter was not correctly propagated in the text similarity reranker by updating both the filtering logic in the retriever builder and the associated test cases.

Fixed min_score validation and filtering behavior via inclusive comparison
Added new test cases and updated existing ones to verify correct min_score handling
Propagated the min_score parameter across relevant builder components

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Show a summary per file

File	Description
x-pack/plugin/inference/src/yamlRestTest/resources/rest-api-spec/test/inference/70_text_similarity_rank_retriever.yml	Adds new YAML tests to verify min_score functionality (including edge cases)
x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/rank/textsimilarity/TextSimilarityRankRetrieverBuilder.java	Applies filtering based on min_score after reranking and introduces a new node feature for the fix
x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/InferenceFeatures.java	Registers the new node feature for min_score fix
server/src/test/java/org/elasticsearch/search/retriever/RankDocsRetrieverBuilderTests.java	Updates test case factory to include an explicit null minScore parameter
server/src/main/java/org/elasticsearch/search/retriever/RankDocsRetrieverBuilder.java	Introduces a new field and logic to propagate min_score with an inclusive comparison
server/src/main/java/org/elasticsearch/search/retriever/CompoundRetrieverBuilder.java	Propagates the min_score parameter from the compound retriever

server/src/main/java/org/elasticsearch/search/retriever/RankDocsRetrieverBuilder.java

elasticsearchmachine · 2025-06-12T11:27:12Z

Pinging @elastic/es-search (Team:Search)

kderusso

Changes look good, one cleanup comment and waiting on a clean CI build.

kderusso · 2025-06-12T16:55:17Z

@mridula-s109 can you please also backport to 9.0? Thank you

Mikep86 · 2025-06-16T22:02:45Z

If we're backporting to 9.0, this should also go to 8.18, seeing as those are sibling releases

mridula-s109 · 2025-06-17T08:53:41Z

If we're backporting to 9.0, this should also go to 8.18, seeing as those are sibling releases

@kderusso So should I backport these changes to both 9.0 and 8.18?

kderusso · 2025-06-17T12:41:41Z

@mridula-s109 yes, please backport to both as this is a bug fix.

mridula-s109 · 2025-06-17T12:47:47Z

@mridula-s109 yes, please backport to both as this is a bug fix.

Thanks @kathleen, will do the same!

* propgating retrievers to inner retrievers * test feature taken care of * Small changes in concurrent multipart upload interfaces (elastic#128977) Small changes in BlobContainer interface and wrapper. Relates ES-11815 * Unmute FollowingEngineTests#testProcessOnceOnPrimary() test (elastic#129054) The reason the test fails is that operations contained _seq_no field with different doc value types (with no skippers and with skippers) and this isn't allowed, since field types need to be consistent in a Lucene index. The initial operations were generated not knowing about the fact the index mode was set to logsdb or time_series. Causing the operations to not have doc value skippers. However when replaying the operations via following engine, the operations did have doc value skippers. The fix is to set `index.seq_no.index_options` to `points_and_doc_values`, so that the initial operations are indexed without doc value skippers. This test doesn't gain anything from storing seqno with doc value skippers, so there is no loss of testing coverage. Closes elastic#128541 * [Build] Add support for publishing to maven central (elastic#128659) This ensures we package an aggregation zip with all artifacts we want to publish to maven central as part of a release. Running zipAggregation will produce a zip file in the build/nmcp/zip folder. The content of this zip is meant to match the maven artifacts we have currently declared as dra maven artifacts. * ESQL: Check for errors while loading blocks (elastic#129016) Runs a sanity check after loading a block of values. Previously we were doing a quick check if assertions were enabled. Now we do two quick checks all the time. Better - we attach information about how a block was loaded when there's a problem. Relates to elastic#128959 * Make `PhaseCacheManagementTests` project-aware (elastic#129047) The functionality in `PhaseCacheManagement` was already project-aware, but these tests were still using deprecated methods. 
* Vector test tools (elastic#128934) This adds some testing tools for verifying vector recall and latency directly without having to spin up an entire ES node and running a rally track. Its pretty barebones and takes inspiration from lucene-util, but I wanted access to our own formats and tooling to make our lives easier. Here is an example config file. This will build the initial index, run queries at num_candidates: 50, then again at num_candidates 100 (without reindexing, and re-using the cached nearest neighbors). ``` [{ "doc_vectors" : "path", "query_vectors" : "path", "num_docs" : 10000, "num_queries" : 10, "index_type" : "hnsw", "num_candidates" : 50, "k" : 10, "hnsw_m" : 16, "hnsw_ef_construction" : 200, "index_threads" : 4, "reindex" : true, "force_merge" : false, "vector_space" : "maximum_inner_product", "dimensions" : 768 }, { "doc_vectors" : "path", "query_vectors" : "path", "num_docs" : 10000, "num_queries" : 10, "index_type" : "hnsw", "num_candidates" : 100, "k" : 10, "hnsw_m" : 16, "hnsw_ef_construction" : 200, "vector_space" : "maximum_inner_product", "dimensions" : 768 } ] ``` To execute: ``` ./gradlew :qa:vector:checkVec --args="/Path/to/knn_tester_config.json" ``` Calling `./gradlew :qa:vector:checkVecHelp` gives some guidance on how to use it, additionally providing a way to run it via java directly (useful to bypass gradlew guff). * ES|QL: refactor generative tests (elastic#129028) * Add a test of LOOKUP JOIN against a time series index (elastic#129007) Add a spec test of `LOOKUP JOIN` against a time series index. * Make ILM `ClusterStateWaitStep` project-aware (elastic#129042) This is part of an iterative process to make ILM project-aware. * Mute org.elasticsearch.xpack.esql.qa.mixed.MixedClusterEsqlSpecIT test {lookup-join.LookupJoinOnTimeSeriesIndex ASYNC} elastic#129078 * Remove `ClusterState` param from ILM `AsyncBranchingStep` (elastic#129076) The `ClusterState` parameter of the `asyncPredicate` is not used anywhere. 
* Mute org.elasticsearch.xpack.esql.qa.mixed.MixedClusterEsqlSpecIT test {lookup-join.LookupJoinOnTimeSeriesIndex SYNC} elastic#129082 * Mute org.elasticsearch.upgrades.UpgradeClusterClientYamlTestSuiteIT test {p0=upgraded_cluster/70_ilm/Test Lifecycle Still There And Indices Are Still Managed} elastic#129097 * Mute org.elasticsearch.upgrades.UpgradeClusterClientYamlTestSuiteIT test {p0=upgraded_cluster/90_ml_data_frame_analytics_crud/Get mixed cluster outlier_detection job} elastic#129098 * Mute org.elasticsearch.packaging.test.DockerTests test081SymlinksAreFollowedWithEnvironmentVariableFiles elastic#128867 * Threadpool merge executor is aware of available disk space (elastic#127613) This PR introduces 3 new settings: indices.merge.disk.check_interval, indices.merge.disk.watermark.high, and indices.merge.disk.watermark.high.max_headroom that control if the threadpool merge executor starts executing new merges when the disk space is getting low. The intent of this change is to avoid the situation where in-progress merges exhaust the available disk space on the node's local filesystem. To this end, the thread pool merge executor periodically monitors the available disk space, as well as the current disk space estimates required by all in-progress (currently running) merges on the node, and will NOT schedule any new merges if the disk space is getting low (by default below the 5% limit of the total disk space, or 100 GB, whichever is smaller (same as the disk allocation flood stage level)). * Add option to include or exclude vectors from _source retrieval (elastic#128735) This PR introduces a new include_vectors option to the _source retrieval context. When set to false, vectors are excluded from the returned _source. This is especially efficient when used with synthetic source, as it avoids loading vector fields entirely. By default, vectors remain included unless explicitly excluded. 
* Remove direct minScore propagation to inner retrievers * cleaned up skip * Mute org.elasticsearch.index.engine.ThreadPoolMergeExecutorServiceDiskSpaceTests testAvailableDiskSpaceMonitorWhenFileSystemStatErrors elastic#129149 * Add transport version for ML inference Mistral chat completion (elastic#129033) * Add transport version for ML inference Mistral chat completion * Add changelog for Mistral Chat Completion version fix * Revert "Add changelog for Mistral Chat Completion version fix" This reverts commit 7a57416. * Correct index path validation (elastic#129144) All we care about is if reindex is true or false. We shouldn't worry about force merge. Because if reindex is true, we will create the directory, if its false, we won't. * Mute org.elasticsearch.index.engine.ThreadPoolMergeExecutorServiceDiskSpaceTests testUnavailableBudgetBlocksNewMergeTasksFromStartingExecution elastic#129148 * Implemented completion task for Google VertexAI (elastic#128694) * Google Vertex AI completion model, response entity and tests * Fixed GoogleVertexAiServiceTest for Service configuration * Changelog * Removed downcasting and using `moveToFirstToken` * Create GoogleVertexAiChatCompletionResponseHandler for streaming and non streaming responses * Added unit tests * PR feedback * Removed googlevertexaicompletion model. Using just GoogleVertexAiChatCompletionModel for completion and chat completion * Renamed uri -> nonStreamingUri. 
Added streamingUri and getters in GoogleVertexAiChatCompletionModel * Moved rateLimitGroupHashing to subclasses of GoogleVertexAiModel * Fixed rate limit has of GoogleVertexAiRerankModel and refactored uri for GoogleVertexAiUnifiedChatCompletionRequest --------- Co-authored-by: lhoet-google <[email protected]> Co-authored-by: Jonathan Buttner <[email protected]> * Fixing minscore filtering in the text similarity reranker * ES|QL - kNN function initial support (elastic#127322) * Remove optional seed from ES|QL SAMPLE (elastic#128887) * Remove optional seed from ES|QL SAMPLE * make it clear that seed is for testing * [Inference API] Add "rerank" task type to "elastic" provider (elastic#126022) * Rename target destination for microbenchmarks (elastic#128878) * Include direct memory and non-heap memory in ML memory calculations (take elastic#2) (elastic#128742) * Include direct memory and non-heap memory in ML memory calculations. * Reduce ML_ONLY heap size, so that direct memory is accounted for. * [CI] Auto commit changes from spotless * changelog * improve docs * Reuse direct memory to heap factor * feature flag --------- Co-authored-by: elasticsearchmachine <[email protected]> * Throw better exception for unsupported aggregations over shape fields (elastic#129139) * Update Test Framework To Handle Query Rewrites That Rely on Non-Null Searchers (elastic#129160) * Update ReproduceInfoPrinter to correctly print a reproduction line for Lucene & build candidate upgrade tests (elastic#129044) * Increment inference stats counter for shard bulk inference calls (elastic#129140) This change updates the inference stats counter to include chunked inference calls performed by the shard bulk inference filter on all semantic text fields. It ensures that usage of inference on semantic text fields is properly recorded in the stats. * Synthetic source: avoid storing multi fields of type text and match_only_text by default. 
(elastic#129126) Don't store text and match_only_text field by default when source mode is synthetic and a field is a multi field or when there is a suitable multi field. Without this change, ES would store field otherwise twice in a multi-field configuration. For example: ``` ... "os": { "properties": { "name": { "ignore_above": 1024, "type": "keyword", "fields": { "text": { "type": "match_only_text" } } } ... ``` In this case, two stored fields were added, one in case for the `name` field and one for `name.text` multi-field. This change prevents this, and would never store a stored field when text or match_only_text field is a multi-field. * Adding `scheduled_report_id` field to kibana reporting template (elastic#127827) * Adding scheduled_report_id field to kibana reporting template * Incrementing stack template registry version * ES|QL: Add FORK generative tests (elastic#129135) * ES|QL Completion command syntax change (elastic#129189) * propagated minscore to rankdsocsretrieverbuilder * Modified the file to include minscore and the test case to verify it * Revert "Use IndexOrDocValuesQuery in NumberFieldType#termQuery implementations (elastic#128293)" (elastic#129206) This reverts commit de7c91c. 
* Fixed the rankdocsretriever builder * Update docs/changelog/129223.yaml * Update 129223.yaml * trying to introduce cluster featureS * included cluster features in the test * Fixed the merge issue * [CI] Auto commit changes from spotless * Removed local variable from RankDocsRetrieverBuilder * Update RankDocsRetrieverBuilder.java --------- Co-authored-by: Tanguy Leroux <[email protected]> Co-authored-by: Martijn van Groningen <[email protected]> Co-authored-by: Rene Groeschke <[email protected]> Co-authored-by: Nik Everett <[email protected]> Co-authored-by: Niels Bauman <[email protected]> Co-authored-by: Benjamin Trent <[email protected]> Co-authored-by: Luigi Dell'Aquila <[email protected]> Co-authored-by: Bogdan Pintea <[email protected]> Co-authored-by: elasticsearchmachine <[email protected]> Co-authored-by: Albert Zaharovits <[email protected]> Co-authored-by: Jim Ferenczi <[email protected]> Co-authored-by: Jan-Kazlouski-elastic <[email protected]> Co-authored-by: Leonardo Hoet <[email protected]> Co-authored-by: lhoet-google <[email protected]> Co-authored-by: Jonathan Buttner <[email protected]> Co-authored-by: Carlos Delgado <[email protected]> Co-authored-by: Jan Kuipers <[email protected]> Co-authored-by: Tim Grein <[email protected]> Co-authored-by: Ievgen Degtiarenko <[email protected]> Co-authored-by: elasticsearchmachine <[email protected]> Co-authored-by: Ignacio Vera <[email protected]> Co-authored-by: Mike Pellegrini <[email protected]> Co-authored-by: Moritz Mack <[email protected]> Co-authored-by: Ying Mao <[email protected]> Co-authored-by: Ioana Tagirta <[email protected]> Co-authored-by: Aurélien FOUCRET <[email protected]>

mridula-s109 and others added 30 commits June 6, 2025 11:37

propgating retrievers to inner retrievers

12fb2fa

test feature taken care of

81e99b6

Merge branch 'elastic:main' into main

05fb0ab

Small changes in concurrent multipart upload interfaces (elastic#128977)

605c035

Small changes in BlobContainer interface and wrapper. Relates ES-11815

Make PhaseCacheManagementTests project-aware (elastic#129047)

aec1688

The functionality in `PhaseCacheManagement` was already project-aware, but these tests were still using deprecated methods.

ES|QL: refactor generative tests (elastic#129028)

df3ef0d

Add a test of LOOKUP JOIN against a time series index (elastic#129007)

0eebc8c

Add a spec test of `LOOKUP JOIN` against a time series index.

Make ILM ClusterStateWaitStep project-aware (elastic#129042)

b1e15f0

This is part of an iterative process to make ILM project-aware.

Mute org.elasticsearch.xpack.esql.qa.mixed.MixedClusterEsqlSpecIT test {lookup-join.LookupJoinOnTimeSeriesIndex ASYNC} elastic#129078

846b09a

Remove ClusterState param from ILM AsyncBranchingStep (elastic#129076)

a97d582

The `ClusterState` parameter of the `asyncPredicate` is not used anywhere.

Mute org.elasticsearch.xpack.esql.qa.mixed.MixedClusterEsqlSpecIT test {lookup-join.LookupJoinOnTimeSeriesIndex SYNC} elastic#129082

763b502

Mute org.elasticsearch.upgrades.UpgradeClusterClientYamlTestSuiteIT test {p0=upgraded_cluster/70_ilm/Test Lifecycle Still There And Indices Are Still Managed} elastic#129097

8a660c8

Mute org.elasticsearch.upgrades.UpgradeClusterClientYamlTestSuiteIT test {p0=upgraded_cluster/90_ml_data_frame_analytics_crud/Get mixed cluster outlier_detection job} elastic#129098

aa16175

Mute org.elasticsearch.packaging.test.DockerTests test081SymlinksAreFollowedWithEnvironmentVariableFiles elastic#128867

6e58b1e

Remove direct minScore propagation to inner retrievers

0776562

cleaned up skip

f145d26

Mute org.elasticsearch.index.engine.ThreadPoolMergeExecutorServiceDiskSpaceTests testAvailableDiskSpaceMonitorWhenFileSystemStatErrors elastic#129149

d8b6897

Correct index path validation (elastic#129144)

eca383d

All we care about is whether reindex is true or false; force merge is irrelevant here. If reindex is true, we create the directory; if it's false, we don't.

Mute org.elasticsearch.index.engine.ThreadPoolMergeExecutorServiceDiskSpaceTests testUnavailableBudgetBlocksNewMergeTasksFromStartingExecution elastic#129148

fb6ec9a

Merge remote-tracking branch 'upstream/main'

0ef36a1

Merge remote-tracking branch 'upstream/main'

ece13d9

Fixing minscore filtering in the text similarity reranker

2a7fb18
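
The PR summary describes this fix as two rules: `min_score` values greater than or equal to zero are accepted, and filtering uses an inclusive comparison (`>=`) rather than an exclusive one (`>`). The following is a minimal, self-contained sketch of those two rules only. It is not the code from this commit: `MinScoreFilterSketch`, `ScoredDoc`, `validateMinScore`, and `filterByMinScore` are hypothetical names chosen for illustration and are not Elasticsearch classes or methods.

```java
import java.util.ArrayList;
import java.util.List;

public class MinScoreFilterSketch {

    /** Hypothetical stand-in for a reranked hit; not an Elasticsearch type. */
    public record ScoredDoc(String id, float score) {}

    /** Accepts zero and positive thresholds, rejects negative ones. */
    public static void validateMinScore(float minScore) {
        if (minScore < 0) {
            throw new IllegalArgumentException(
                "min_score must be greater than or equal to 0, was: " + minScore);
        }
    }

    /** Keeps every document whose score is at least minScore (inclusive). */
    public static List<ScoredDoc> filterByMinScore(List<ScoredDoc> docs, float minScore) {
        validateMinScore(minScore);
        List<ScoredDoc> kept = new ArrayList<>();
        for (ScoredDoc doc : docs) {
            // Inclusive comparison: a score exactly equal to minScore is kept.
            if (doc.score() >= minScore) {
                kept.add(doc);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<ScoredDoc> docs = List.of(
            new ScoredDoc("a", 0.2f),
            new ScoredDoc("b", 0.5f),
            new ScoredDoc("c", 0.9f));

        // Threshold equal to an existing score: that document is retained.
        System.out.println(filterByMinScore(docs, 0.5f));
        // Threshold above every score: nothing is retained.
        System.out.println(filterByMinScore(docs, 10f));
    }
}
```

With a threshold of 10 and no score that high, the sketch returns an empty list, which mirrors the empty `hits` array shown in the PR's verification query.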

mridula-s109 requested a review from pmpailis June 12, 2025 11:24

[CI] Auto commit changes from spotless

4bbf087

mridula-s109 requested review from Mikep86, kderusso and a team June 12, 2025 11:24

Copilot AI reviewed Jun 12, 2025

server/src/main/java/org/elasticsearch/search/retriever/RankDocsRetrieverBuilder.java (resolved)

mridula-s109 marked this pull request as ready for review June 12, 2025 11:26

kderusso reviewed Jun 12, 2025

server/src/main/java/org/elasticsearch/search/retriever/RankDocsRetrieverBuilder.java (outdated, resolved)

mridula-s109 added 2 commits June 12, 2025 15:29

Removed local variable from RankDocsRetrieverBuilder

0ff2bed

Merge branch 'main' into SEARCH-1006-text-similarity-reranker-does-not-propagate-min-score-correctly

ff01899

mridula-s109 requested a review from kderusso June 12, 2025 14:31

kderusso approved these changes Jun 12, 2025

server/src/main/java/org/elasticsearch/search/retriever/RankDocsRetrieverBuilder.java (outdated, resolved)

mridula-s109 added 2 commits June 12, 2025 16:24

Update RankDocsRetrieverBuilder.java

8acd715

Merge branch 'main' into SEARCH-1006-text-similarity-reranker-does-not-propagate-min-score-correctly

acd676e

mridula-s109 enabled auto-merge (squash) June 12, 2025 15:25

mridula-s109 merged commit 03ba5b1 into elastic:main Jun 12, 2025
17 of 18 checks passed

mridula-s109 deleted the SEARCH-1006-text-similarity-reranker-does-not-propagate-min-score-correctly branch June 12, 2025 16:38

elasticsearchmachine added the backport pending label Jun 12, 2025

mridula-s109 mentioned this pull request Jun 12, 2025

Add min score linear retriever #129359

Merged

mridula-s109 added the v9.0.0 label Jun 16, 2025

This was referenced Jun 19, 2025

[9.0] Backporting fix minscore propagation in text similarity reranker #129700

Merged

[8.18] Backporting fix minscore propagation in text similarity reranker #129701

Merged

mridula-s109 commented Jun 10, 2025 (edited)

Copilot AI left a comment

elasticsearchmachine commented Jun 12, 2025

kderusso left a comment

kderusso commented Jun 12, 2025

Mikep86 commented Jun 16, 2025

mridula-s109 commented Jun 17, 2025

kderusso commented Jun 17, 2025

mridula-s109 commented Jun 17, 2025
