Skip to content

Sort And Relevance Cuffoff Part 3 - Add Relevance Cutoff Feature #1246

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 85 commits into
base: mainline
Choose a base branch
from

Conversation

wanliAlex
Copy link
Collaborator

@wanliAlex wanliAlex commented Jun 23, 2025

Change Summary

This PR implements the relevance-cutoff functionality in Marqo Python and Java. Relevance Cutoff can be used to stub out irrelevant documents in the retrieval stage. Detailed changes are shown as below:

Sorting Enhancements:

  • Renamed sortCandidates to minSortCandidates across models, validation logic, and query construction for better clarity. (src/marqo/tensor_search/models/sort_by_model.py,
  • Updated all related test cases to use minSortCandidates instead of sortCandidates.

Relevance Cutoff Implementation:

  • Implement 3 different relevance cutoff methods in Java, namely "relative_max_score", "gap_detection", and "mean_std_dev",

    1. Relative Max Score (relative_max_score)
      - Algorithm: Uses a dynamic threshold based on the top search result's score
      - Implementation:
      • Takes the highest scoring result's score
      • Multiplies it by a relativeScoreFactor parameter (0-1 range)
      • Keeps all results with scores ≥ this threshold
    2. Gap Detection (gap_detection)
      - Algorithm: Finds the largest score gap between consecutive results (elbow detection)
      - Implementation:
      • Iterates through sorted results comparing adjacent scores
      • Finds the position with the maximum score difference (delta)
      • Cuts off at the position after the largest gap
        - Parameter: None required (no cutoff parameter)
    3. Mean Standard Deviation (mean_std_dev)
      - Algorithm: Statistical approach using mean and standard deviation
      - Implementation:
      • Calculates mean score across all probe results
      • Calculates population standard deviation
      • Sets threshold as: mean + (stdDev × stdDevFactor)
      • Keeps results with scores ≥ threshold
  • New Response Metadata
    All relevance cutoff searches return additional metadata:

    • _relevantCandidates: Number of results passing relevance filter
    • _probeCandidates: Total candidates analyzed during probe phase
  • New API Parameter: relevanceCutoff

    API Structure

 {
   "relevanceCutoff": {
     "method": "relative_max_score|gap_detection|mean_std_dev",
     "parameters": {
       "relativeScoreFactor": 0.6,  # for relative_max_score
       "stdDevFactor": 0.3         # for mean_std_dev
     }
   }
 }
  • Example Usage
  # Just relevance cut-off
  response = client.index("my-index").search(
      q="machine learning AI",
      search_method="HYBRID",
      relevance_cutoff={
          "method": "relative_max_score",
          "parameters": {"relativeScoreFactor": 0.7}
      },
      limit=10
  )
  # Search with both relevance cut-off and sorting
  response = client.index("my-index").search(
      q="machine learning artificial intelligence",
      search_method="HYBRID",
      relevance_cutoff={
          "method": "relative_max_score",
          "parameters": {"relativeScoreFactor": 0.7}
      },
      sort_by={
          "fields": [
              {"fieldName": "sort_value", "order": "desc", "missing": "last"}
          ],
          "minSortCandidates": 10  # Optional: override relevance cutoff if needed
      },
  )

Related Jira Ticket

ticket link

Checklist

  • [*] Tests have been added for changes
  • [n/a] Documentation has been updated
  • [n/a] Breaking changes are clearly identified
  • [ n/a Python client changes linked or N/A

For new field types:

  • Tests cover score modifier usage of this new type
  • Test indexes updated to cover the new type for all APIs (add docs, search, partial update, etc.)

Comment on lines 84 to 85
self.assertIn("h1", result_ids)
self.assertIn("h2", result_ids)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about other records? do we expect them in the result? why don't we just assert the whole result_ids list?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, we assert on the exact order of the response documents ids.

Comment on lines 211 to 217
self.assertNotIn("l1", ids_with_cutoff) # Should be filtered out
self.assertNotIn("l2", ids_with_cutoff) # Should be filtered out

# Should still include high relevance docs
self.assertIn("h1", ids_with_cutoff)
self.assertIn("h2", ids_with_cutoff)
self.assertIn("h3", ids_with_cutoff)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit. simple way is to assert the ids_with_cutoff equals to the expected

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

{"_id": "l2", "content": "Bright morning sunlight streams through", "sort_value": 9.5},
# High sort, low relevance
]
self.client.index(self.structured_index_name).add_documents(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit. no need to populate docs

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


# Should include low relevance docs due to override
ids = [hit["_id"] for hit in response["hits"]]
self.assertIn("l1", ids) # Should be included due to minSortCandidates override
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we verify the order of the ids list instead?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

)
ids_no_cutoff = [hit["_id"] for hit in response_no_cutoff["hits"]]
# Should include low relevance docs with high sort values
self.assertIn("l1", ids_no_cutoff[:3]) # Should be in top 3 due to high sort value
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why isn't it the top 1?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

if (relevance_cutoff and isinstance(marqo_index, UnstructuredMarqoIndex) and
not isinstance(marqo_index, SemiStructuredMarqoIndex)):
# Legacy unstructured indexes do not support relevance cutoff
raise core_exceptions.UnsupportedFeatureError(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't erroring out for structured indexes due the second condition

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the condition. Relevance cut-off is only available to unstructured indexes now

@@ -7,6 +7,8 @@
class RootFields(BaseModel):
total_count: Optional[int] = Field(None, alias='totalCount')
sort_candidates: Optional[int] = Field(None, alias='marqo__sortCandidates')
relevant_candidates: Optional[int] = Field(None, alias='marqo__relevantCandidates')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the Vespa documentation have an opinion on this?



class TestSearchRelevanceCutoffFeatureSort1Field(MarqoTestCase):
class TestSearchRelevanceCutoffFeature(MarqoTestCase):
Copy link
Collaborator

@papa99do papa99do Jul 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the search result (order and score of each doc) deterministic? what about with multi-shards and multi-replica?
Also, why do we use greaterThan and lessThan to verify expected candidates? Shouldn't we verify with a specific number if the result is deterministic?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made the tests more restrictive

def tearDown(self):
"""Ensure documents are not changed after each test."""
for index_name in [self.unstructured_index_name, self.structured_index_name]:
if 30 !=self.monitoring.get_index_stats_by_name(index_name).number_of_documents:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get this, why do we need to verify the doc number in setup and tearDown?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a pass for setUp

}

for _ in range(10):
# We run it several times to ensure that the results are consistent
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to run multiple times? What could cause the result to be different?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

)

r = self.index.to_vespa_query(self.hybrid_query)
self.assertAlmostEqual(0.001,
Copy link
Collaborator

@papa99do papa99do Jul 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0.001 is not too small, will we encounter precision issue when use assertEqual? I think we should use assertEqual if it can pass.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use self.assertEqual

* @return The number of relevant results to keep
*/
@VisibleForTesting
Integer detectCutoffCount(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit. return a int instead, so the caller will not need to test if the result is null

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 304 to 307
assertThat(result).contains("targetHits: 200");
assertThat(result).doesNotContain("targetHits: 100");
assertThat(result).contains("hnsw.exploreAdditionalHits: 1800");
assertThat(result).doesNotContain("hnsw.exploreAdditionalHits: 1900");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a better way is to just assertThat(result).isEqualTo(expectedResult) to verify it's exactly what we want it to be. Same for the following test cases

Copy link
Collaborator Author

@wanliAlex wanliAlex Jul 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

String yql3 = "{targetHits:100, hnsw.exploreAdditionalHits:900}"; // efSearch = 1000
String result3 = callOverwriteTargetHits(yql3, 1000);
assertThat(result3).contains("targetHits:1000");
assertThat(result3).contains("hnsw.exploreAdditionalHits:0"); // 1000 - 1000 = 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a test case that override is larger than efsearch?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

Copy link
Collaborator

@papa99do papa99do Jul 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit. I feel like there are duplicate test cases in this tests, (might be auto generated by AI), maybe try ask AI to identify what test cases are duplicate and consolidate them a bit. Also consider using parameterised test to further reduce the size of the test class for easier reading.

}

@Test
void shouldUpdateHitsWhenSortByCandidatesLowerThanLimitOffset() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should not?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}

@Test
void shouldHandleNullParametersCorrectly() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this a duplicate test?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deleted

class IntegrationRelevanceCutoffAndSortByTest {

@Test
void shouldUseMaxCandidatesWhenBothFeaturesEnabledRelevanceHigher() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dup with shouldUpdateHitsWhenBothFeaturesEnabled?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deleted

}

@Test
void shouldUseMaxCandidatesWhenBothFeaturesEnabledSortByHigher() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dup with shouldUpdateHitsWhenBothFeaturesEnabledReversed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deleted

}

@Test
void shouldValidateTensorTargetHitsBeforeProcessingWithBothFeatures() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dup with shouldThrowExceptionWhenTensorTargetHitsLowerThanLimitOffset?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deleted

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants