Fix minscore propagation in text similarity reranker #129223

Merged
65 commits
12fb2fa
propgating retrievers to inner retrievers
mridula-s109 Jun 2, 2025
81e99b6
test feature taken care of
mridula-s109 Jun 6, 2025
05fb0ab
Merge branch 'elastic:main' into main
mridula-s109 Jun 6, 2025
605c035
Small changes in concurrent multipart upload interfaces (#128977)
tlrx Jun 6, 2025
2dca633
Unmute FollowingEngineTests#testProcessOnceOnPrimary() test (#129054)
martijnvg Jun 6, 2025
4c0e3c9
[Build] Add support for publishing to maven central (#128659)
breskeby Jun 6, 2025
e2189e6
ESQL: Check for errors while loading blocks (#129016)
nik9000 Jun 6, 2025
aec1688
Make `PhaseCacheManagementTests` project-aware (#129047)
nielsbauman Jun 6, 2025
8c423ce
Vector test tools (#128934)
benwtrent Jun 6, 2025
df3ef0d
ES|QL: refactor generative tests (#129028)
luigidellaquila Jun 6, 2025
0eebc8c
Add a test of LOOKUP JOIN against a time series index (#129007)
bpintea Jun 6, 2025
b1e15f0
Make ILM `ClusterStateWaitStep` project-aware (#129042)
nielsbauman Jun 6, 2025
846b09a
Mute org.elasticsearch.xpack.esql.qa.mixed.MixedClusterEsqlSpecIT tes…
elasticsearchmachine Jun 6, 2025
a97d582
Remove `ClusterState` param from ILM `AsyncBranchingStep` (#129076)
nielsbauman Jun 6, 2025
763b502
Mute org.elasticsearch.xpack.esql.qa.mixed.MixedClusterEsqlSpecIT tes…
elasticsearchmachine Jun 6, 2025
8a660c8
Mute org.elasticsearch.upgrades.UpgradeClusterClientYamlTestSuiteIT t…
elasticsearchmachine Jun 6, 2025
aa16175
Mute org.elasticsearch.upgrades.UpgradeClusterClientYamlTestSuiteIT t…
elasticsearchmachine Jun 6, 2025
6e58b1e
Mute org.elasticsearch.packaging.test.DockerTests test081SymlinksAreF…
elasticsearchmachine Jun 7, 2025
05f70f0
Threadpool merge executor is aware of available disk space (#127613)
albertzaharovits Jun 8, 2025
713ab42
Add option to include or exclude vectors from _source retrieval (#128…
jimczi Jun 9, 2025
0776562
Remove direct minScore propagation to inner retrievers
mridula-s109 Jun 9, 2025
f145d26
cleaned up skip
mridula-s109 Jun 9, 2025
d8b6897
Mute org.elasticsearch.index.engine.ThreadPoolMergeExecutorServiceDis…
elasticsearchmachine Jun 9, 2025
82c7ab1
Add transport version for ML inference Mistral chat completion (#129033)
Jan-Kazlouski-elastic Jun 9, 2025
eca383d
Correct index path validation (#129144)
benwtrent Jun 9, 2025
fb6ec9a
Mute org.elasticsearch.index.engine.ThreadPoolMergeExecutorServiceDis…
elasticsearchmachine Jun 9, 2025
6806b24
Implemented completion task for Google VertexAI (#128694)
leo-hoet Jun 9, 2025
0ef36a1
Merge remote-tracking branch 'upstream/main'
mridula-s109 Jun 9, 2025
ece13d9
Merge remote-tracking branch 'upstream/main'
mridula-s109 Jun 9, 2025
2a7fb18
Fixing minscore filtering in the text similarity reranker
mridula-s109 Jun 9, 2025
36cd91e
Merge remote-tracking branch 'upstream/main'
mridula-s109 Jun 10, 2025
74b431d
ES|QL - kNN function initial support (#127322)
carlosdelest Jun 10, 2025
c678ebd
Remove optional seed from ES|QL SAMPLE (#128887)
jan-elastic Jun 10, 2025
7d37afa
[Inference API] Add "rerank" task type to "elastic" provider (#126022)
timgrein Jun 10, 2025
eed00f4
Rename target destination for microbenchmarks (#128878)
idegtiarenko Jun 10, 2025
f768664
Include direct memory and non-heap memory in ML memory calculations (…
jan-elastic Jun 10, 2025
2d605ee
Throw better exception for unsupported aggregations over shape fields…
iverase Jun 10, 2025
b68ddd1
Update Test Framework To Handle Query Rewrites That Rely on Non-Null …
Mikep86 Jun 10, 2025
f1bf18e
Update ReproduceInfoPrinter to correctly print a reproduction line fo…
mosche Jun 10, 2025
9abfe1d
Increment inference stats counter for shard bulk inference calls (#12…
jimczi Jun 10, 2025
2fa185a
Synthetic source: avoid storing multi fields of type text and match_o…
martijnvg Jun 10, 2025
ac213d5
Adding `scheduled_report_id` field to kibana reporting template (#127…
ymao1 Jun 10, 2025
01de61e
ES|QL: Add FORK generative tests (#129135)
ioanatia Jun 10, 2025
f48c383
ES|QL Completion command syntax change (#129189)
afoucret Jun 10, 2025
ecb9ac1
Merge remote-tracking branch 'origin/main' into SEARCH-1006-text-simi…
mridula-s109 Jun 10, 2025
e865ca7
Merge remote-tracking branch 'upstream/main' into SEARCH-1006-text-si…
mridula-s109 Jun 10, 2025
920c402
propagated minscore to rankdsocsretrieverbuilder
mridula-s109 Jun 10, 2025
18066d8
Merge remote-tracking branch 'upstream' into SEARCH-1006-text-similar…
mridula-s109 Jun 11, 2025
e5f30a2
Modified the file to include minscore and the test case to verify it
mridula-s109 Jun 12, 2025
e425094
Merge branch 'main' into SEARCH-1006-text-similarity-reranker-does-no…
mridula-s109 Jun 12, 2025
76e9165
Revert "Use IndexOrDocValuesQuery in NumberFieldType#termQuery implem…
iverase Jun 12, 2025
fbfe2c4
Merge branch 'main' into SEARCH-1006-text-similarity-reranker-does-no…
mridula-s109 Jun 12, 2025
14c7709
Fixed the rankdocsretriever builder
mridula-s109 Jun 12, 2025
5aded65
Merge branch 'main' into SEARCH-1006-text-similarity-reranker-does-no…
mridula-s109 Jun 12, 2025
ed743e5
Update docs/changelog/129223.yaml
mridula-s109 Jun 12, 2025
d189cbf
Update 129223.yaml
mridula-s109 Jun 12, 2025
db97b02
trying to introduce cluster featureS
mridula-s109 Jun 12, 2025
1928f7e
included cluster features in the test
mridula-s109 Jun 12, 2025
2eafa67
Fixed the merge issue
mridula-s109 Jun 12, 2025
2c02e80
Merge branch 'main' into SEARCH-1006-text-similarity-reranker-does-no…
mridula-s109 Jun 12, 2025
4bbf087
[CI] Auto commit changes from spotless
elasticsearchmachine Jun 12, 2025
0ff2bed
Removed local variable from RankDocsRetrieverBuilder
mridula-s109 Jun 12, 2025
ff01899
Merge branch 'main' into SEARCH-1006-text-similarity-reranker-does-no…
mridula-s109 Jun 12, 2025
8acd715
Update RankDocsRetrieverBuilder.java
mridula-s109 Jun 12, 2025
acd676e
Merge branch 'main' into SEARCH-1006-text-similarity-reranker-does-no…
mridula-s109 Jun 12, 2025
5 changes: 5 additions & 0 deletions docs/changelog/129223.yaml
@@ -0,0 +1,5 @@
+pr: 129223
+summary: Fix text similarity reranker does not propagate min score correctly
+area: Search
+type: bug
+issues: []
@@ -195,7 +195,8 @@ public void onFailure(Exception e) {
             RankDocsRetrieverBuilder rankDocsRetrieverBuilder = new RankDocsRetrieverBuilder(
                 rankWindowSize,
                 newRetrievers.stream().map(s -> s.retriever).toList(),
-                results::get
+                results::get,
+                this.minScore
             );
             rankDocsRetrieverBuilder.retrieverName(retrieverName());
             return rankDocsRetrieverBuilder;
@@ -32,14 +32,16 @@ public class RankDocsRetrieverBuilder extends RetrieverBuilder {
     final int rankWindowSize;
     final List<RetrieverBuilder> sources;
     final Supplier<RankDoc[]> rankDocs;
+    final Float minScore;

-    public RankDocsRetrieverBuilder(int rankWindowSize, List<RetrieverBuilder> sources, Supplier<RankDoc[]> rankDocs) {
+    public RankDocsRetrieverBuilder(int rankWindowSize, List<RetrieverBuilder> sources, Supplier<RankDoc[]> rankDocs, Float minScore) {
         this.rankWindowSize = rankWindowSize;
         this.rankDocs = rankDocs;
         if (sources == null || sources.isEmpty()) {
             throw new IllegalArgumentException("sources must not be null or empty");
         }
         this.sources = sources;
+        this.minScore = minScore;
     }

     @Override
@@ -48,7 +50,7 @@ public String getName() {
     }

     private boolean sourceHasMinScore() {
-        return minScore != null || sources.stream().anyMatch(x -> x.minScore() != null);
+        return this.minScore != null || sources.stream().anyMatch(x -> x.minScore() != null);
     }

     private boolean sourceShouldRewrite(QueryRewriteContext ctx) throws IOException {
@@ -132,7 +134,7 @@ public void extractToSearchSourceBuilder(SearchSourceBuilder searchSourceBuilder
             searchSourceBuilder.size(rankWindowSize);
         }
         if (sourceHasMinScore()) {
-            searchSourceBuilder.minScore(this.minScore() == null ? Float.MIN_VALUE : this.minScore());
+            searchSourceBuilder.minScore(this.minScore == null ? Float.MIN_VALUE : this.minScore);
         }
         if (searchSourceBuilder.size() + searchSourceBuilder.from() > rankDocResults.length) {
             searchSourceBuilder.size(Math.max(0, rankDocResults.length - searchSourceBuilder.from()));
@@ -160,16 +162,21 @@ protected boolean doEquals(Object o) {
         RankDocsRetrieverBuilder other = (RankDocsRetrieverBuilder) o;
         return rankWindowSize == other.rankWindowSize
             && Arrays.equals(rankDocs.get(), other.rankDocs.get())
-            && sources.equals(other.sources);
+            && sources.equals(other.sources)
+            && Objects.equals(minScore, other.minScore);
     }

     @Override
     protected int doHashCode() {
-        return Objects.hash(super.hashCode(), rankWindowSize, Arrays.hashCode(rankDocs.get()), sources);
+        return Objects.hash(super.hashCode(), rankWindowSize, Arrays.hashCode(rankDocs.get()), sources, minScore);
     }

     @Override
     protected void doToXContent(XContentBuilder builder, Params params) throws IOException {
         throw new UnsupportedOperationException("toXContent() is not supported for " + this.getClass());
     }
+
+    public Float minScore() {
+        return minScore;
+    }
 }
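A note on the `Float.MIN_VALUE` fallback in `extractToSearchSourceBuilder` above: in Java, `Float.MIN_VALUE` is the smallest positive float (about 1.4e-45), not the most negative one, so when only an inner source carries a `min_score` the threshold passed on would presumably exclude only documents that score exactly zero. The following is a minimal standalone sketch of that resolution logic, not the Elasticsearch code itself; `EffectiveMinScoreSketch` and `effectiveMinScore` are hypothetical names introduced purely for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.Optional;

// Hypothetical illustration class; it does not exist in Elasticsearch.
public class EffectiveMinScoreSketch {

    /**
     * Mirrors the decision shown in the diff: a min score is set only when this
     * builder or any of its sources declares one, and it falls back to
     * Float.MIN_VALUE when the builder's own value is absent.
     */
    static Optional<Float> effectiveMinScore(Float ownMinScore, List<Float> sourceMinScores) {
        boolean anySourceHasMinScore = sourceMinScores.stream().anyMatch(Objects::nonNull);
        if (ownMinScore == null && !anySourceHasMinScore) {
            return Optional.empty(); // nothing to set on the search source
        }
        return Optional.of(ownMinScore == null ? Float.MIN_VALUE : ownMinScore);
    }

    public static void main(String[] args) {
        // No min score anywhere: nothing is set.
        System.out.println(effectiveMinScore(null, List.of()));
        // The builder's own min score wins when present.
        System.out.println(effectiveMinScore(2.5f, List.of()));
        // Only a source has one: fall back to the smallest positive float.
        System.out.println(effectiveMinScore(null, Arrays.asList(1.0f, null)));
    }
}
```

The sketch returns an `Optional` only to make the "not set" case explicit; the real builder simply skips the `searchSourceBuilder.minScore(...)` call.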
@@ -97,7 +97,7 @@ private List<QueryBuilder> preFilters(QueryRewriteContext queryRewriteContext) t
     }

     private RankDocsRetrieverBuilder createRandomRankDocsRetrieverBuilder(QueryRewriteContext queryRewriteContext) throws IOException {
-        return new RankDocsRetrieverBuilder(randomIntBetween(1, 100), innerRetrievers(queryRewriteContext), rankDocsSupplier());
+        return new RankDocsRetrieverBuilder(randomIntBetween(1, 100), innerRetrievers(queryRewriteContext), rankDocsSupplier(), null);
     }

     public void testExtractToSearchSourceBuilder() throws IOException {
@@ -52,6 +52,7 @@ public Set<NodeFeature> getTestFeatures() {
             SemanticInferenceMetadataFieldsMapper.EXPLICIT_NULL_FIXES,
             SEMANTIC_KNN_VECTOR_QUERY_REWRITE_INTERCEPTION_SUPPORTED,
             TextSimilarityRankRetrieverBuilder.TEXT_SIMILARITY_RERANKER_ALIAS_HANDLING_FIX,
+            TextSimilarityRankRetrieverBuilder.TEXT_SIMILARITY_RERANKER_MINSCORE_FIX,
             SemanticInferenceMetadataFieldsMapper.INFERENCE_METADATA_FIELDS_ENABLED_BY_DEFAULT,
             SEMANTIC_TEXT_HIGHLIGHTER_DEFAULT,
             SEMANTIC_KNN_FILTER_FIX,
@@ -23,6 +23,7 @@
 import org.elasticsearch.xcontent.XContentParser;

 import java.io.IOException;
+import java.util.ArrayList;
 import java.util.List;
 import java.util.Objects;

@@ -39,6 +40,9 @@ public class TextSimilarityRankRetrieverBuilder extends CompoundRetrieverBuilder
     public static final NodeFeature TEXT_SIMILARITY_RERANKER_ALIAS_HANDLING_FIX = new NodeFeature(
         "text_similarity_reranker_alias_handling_fix"
     );
+    public static final NodeFeature TEXT_SIMILARITY_RERANKER_MINSCORE_FIX = new NodeFeature(
+        "text_similarity_reranker_minscore_fix"
+    );

     public static final ParseField RETRIEVER_FIELD = new ParseField("retriever");
     public static final ParseField INFERENCE_ID_FIELD = new ParseField("inference_id");
@@ -157,23 +161,21 @@ protected TextSimilarityRankRetrieverBuilder clone(
     protected RankDoc[] combineInnerRetrieverResults(List<ScoreDoc[]> rankResults, boolean explain) {
         assert rankResults.size() == 1;
         ScoreDoc[] scoreDocs = rankResults.getFirst();
-        TextSimilarityRankDoc[] textSimilarityRankDocs = new TextSimilarityRankDoc[scoreDocs.length];
+        List<TextSimilarityRankDoc> filteredDocs = new ArrayList<>();
+        // Filtering by min_score must be done here, after reranking.
+        // Applying min_score in the child retriever could prematurely exclude documents that would receive high scores from the reranker.
         for (int i = 0; i < scoreDocs.length; i++) {
             ScoreDoc scoreDoc = scoreDocs[i];
             assert scoreDoc.score >= 0;
-            if (explain) {
-                textSimilarityRankDocs[i] = new TextSimilarityRankDoc(
-                    scoreDoc.doc,
-                    scoreDoc.score,
-                    scoreDoc.shardIndex,
-                    inferenceId,
-                    field
-                );
-            } else {
-                textSimilarityRankDocs[i] = new TextSimilarityRankDoc(scoreDoc.doc, scoreDoc.score, scoreDoc.shardIndex);
+            if (minScore == null || scoreDoc.score >= minScore) {
+                if (explain) {
+                    filteredDocs.add(new TextSimilarityRankDoc(scoreDoc.doc, scoreDoc.score, scoreDoc.shardIndex, inferenceId, field));
+                } else {
+                    filteredDocs.add(new TextSimilarityRankDoc(scoreDoc.doc, scoreDoc.score, scoreDoc.shardIndex));
+                }
             }
         }
-        return textSimilarityRankDocs;
+        return filteredDocs.toArray(new TextSimilarityRankDoc[0]);
     }

     @Override
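The change to `combineInnerRetrieverResults` above applies `min_score` to the reranked scores rather than to the child retriever's scores. As a rough standalone sketch of that filtering step (not the actual Elasticsearch implementation), the example below uses a hypothetical `ScoredDoc` record in place of Lucene's `ScoreDoc`; the class name `MinScoreFilterSketch` and the method `filterByMinScore` are likewise assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; it does not exist in Elasticsearch.
public class MinScoreFilterSketch {

    // Simplified stand-in for a reranked hit (assumption, not Lucene's ScoreDoc).
    record ScoredDoc(int doc, float score) {}

    /**
     * Keeps only documents whose reranked score is at least minScore.
     * A null minScore means no threshold was requested, so every document is kept.
     */
    static List<ScoredDoc> filterByMinScore(List<ScoredDoc> reranked, Float minScore) {
        List<ScoredDoc> kept = new ArrayList<>();
        for (ScoredDoc d : reranked) {
            if (minScore == null || d.score() >= minScore) {
                kept.add(d);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<ScoredDoc> reranked = List.of(
            new ScoredDoc(1, 10.0f),
            new ScoredDoc(2, 3.5f),
            new ScoredDoc(3, 0.0f)
        );
        System.out.println(filterByMinScore(reranked, null).size());    // no threshold
        System.out.println(filterByMinScore(reranked, 0.0f).size());    // zero threshold
        System.out.println(filterByMinScore(reranked, 5.0f).size());    // moderate threshold
        System.out.println(filterByMinScore(reranked, 1000.0f).size()); // very high threshold
    }
}
```

Collecting into a list and converting at the end, as the PR does, avoids leaving null slots in a pre-sized array when some documents fall below the threshold.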
@@ -379,3 +379,111 @@ setup:
   - match: { hits.total.value: 1 }
   - length: { hits.hits: 1 }
   - match: { hits.hits.0._id: "doc_1" }
+
+---
+"Text similarity reranker respects min_score":
+
+  - requires:
+      cluster_features: "text_similarity_reranker_minscore_fix"
+      reason: test min score functionality
+
+  - do:
+      index:
+        index: test-index
+        id: doc_2
+        body:
+          text: "The phases of the Moon come from the position of the Moon relative to the Earth and Sun."
+          topic: [ "science" ]
+          subtopic: [ "astronomy" ]
+          inference_text_field: "10"
+        refresh: true
+
+  - do:
+      search:
+        index: test-index
+        body:
+          track_total_hits: true
+          fields: [ "text", "topic" ]
+          retriever:
+            text_similarity_reranker:
+              retriever:
+                standard:
+                  query:
+                    bool:
+                      should:
+                        - constant_score:
+                            filter:
+                              term: { subtopic: "technology" }
+                            boost: 10
+                        - constant_score:
+                            filter:
+                              term: { subtopic: "astronomy" }
+                            boost: 1
+              rank_window_size: 10
+              inference_id: my-rerank-model
+              inference_text: "How often does the moon hide the sun?"
+              field: inference_text_field
+              min_score: 10
+          size: 10
+
+  - match: { hits.total.value: 1 }
+  - length: { hits.hits: 1 }
+  - match: { hits.hits.0._id: "doc_2" }
+
+---
+"Text similarity reranker with min_score zero includes all docs":
+
+  - requires:
+      cluster_features: "text_similarity_reranker_minscore_fix"
+      reason: test min score functionality
+
+  - do:
+      search:
+        index: test-index
+        body:
+          track_total_hits: true
+          fields: [ "text", "topic" ]
+          retriever:
+            text_similarity_reranker:
+              retriever:
+                standard:
+                  query:
+                    match_all: {}
+              rank_window_size: 10
+              inference_id: my-rerank-model
+              inference_text: "How often does the moon hide the sun?"
+              field: inference_text_field
+              min_score: 0
+          size: 10
+
+  - match: { hits.total.value: 3 }
+  - length: { hits.hits: 3 }
+
+---
+"Text similarity reranker with high min_score excludes all docs":
+
+  - requires:
+      cluster_features: "text_similarity_reranker_minscore_fix"
+      reason: test min score functionality
+
+  - do:
+      search:
+        index: test-index
+        body:
+          track_total_hits: true
+          fields: [ "text", "topic" ]
+          retriever:
+            text_similarity_reranker:
+              retriever:
+                standard:
+                  query:
+                    match_all: {}
+              rank_window_size: 10
+              inference_id: my-rerank-model
+              inference_text: "How often does the moon hide the sun?"
+              field: inference_text_field
+              min_score: 1000
+          size: 10
+
+  - match: { hits.total.value: 0 }
+  - length: { hits.hits: 0 }