Sort And Relevance Cuffoff Part 3 - Add Relevance Cutoff Feature #1246

Open
wants to merge 89 commits into base: mainline

89 commits
ea647b0
Finish API implementation
wanliAlex Jun 5, 2025
c898a8a
Finish unit tests API for sort
wanliAlex Jun 6, 2025
c4c7e8d
unit tests for query generation
wanliAlex Jun 10, 2025
22c732f
Add single field tests for java
wanliAlex Jun 11, 2025
2814ed7
Adding integ tests now
wanliAlex Jun 13, 2025
d044f6a
Need to solve inconsistent results issue
wanliAlex Jun 15, 2025
126c8e1
Add more tests
wanliAlex Jun 15, 2025
2a884c3
Add tests for 3 fields
wanliAlex Jun 16, 2025
14dc7e4
Add tests for 3 fields
wanliAlex Jun 16, 2025
3f155ea
Add tests for 3 fields
wanliAlex Jun 16, 2025
2340ba0
Fix schemas
wanliAlex Jun 16, 2025
fb5f0de
Merge branch 'mainline' into li/sort-and-relevance-cuffoff-python-api
wanliAlex Jun 16, 2025
9ffc9d3
Revert the vespa change
wanliAlex Jun 16, 2025
26350ec
delete the integration tests for sort by
wanliAlex Jun 16, 2025
c635205
Fix tests
wanliAlex Jun 16, 2025
5c690c4
Fix tests
wanliAlex Jun 16, 2025
fc2acf4
Add tests for sort
wanliAlex Jun 16, 2025
d1fc064
Add tests for sort
wanliAlex Jun 16, 2025
38b27db
Add tests for sort
wanliAlex Jun 16, 2025
8d3f06b
Restore pre-216 schemas
wanliAlex Jun 17, 2025
9159206
Fix tests
wanliAlex Jun 17, 2025
4b6270a
Fix tests
wanliAlex Jun 17, 2025
1255318
Fix tests
wanliAlex Jun 17, 2025
b36264c
Fix tests
wanliAlex Jun 18, 2025
4a949e1
Fix tests
wanliAlex Jun 18, 2025
f0bc99c
Finish Vespa implementation and add tests
wanliAlex Jun 18, 2025
dd34142
Catch mainline
wanliAlex Jun 18, 2025
a606837
Add default tests
wanliAlex Jun 18, 2025
cde30f1
Fix tests
wanliAlex Jun 18, 2025
3bb47eb
Fix tests
wanliAlex Jun 18, 2025
4edd848
Fix tests
wanliAlex Jun 19, 2025
c1349d3
Fix java part
wanliAlex Jun 19, 2025
4e8e0b9
Add index type tests
wanliAlex Jun 19, 2025
df1eb9a
Fix java code
wanliAlex Jun 19, 2025
23e199b
Add tests for java
wanliAlex Jun 19, 2025
87a2653
Fix python tests
wanliAlex Jun 19, 2025
b5d15bd
Add tests for java
wanliAlex Jun 19, 2025
52189b3
Add java tests
wanliAlex Jun 19, 2025
1304c30
Fix typo
wanliAlex Jun 19, 2025
3fb7af1
Fix typo
wanliAlex Jun 19, 2025
0228f5f
Add API tests
wanliAlex Jun 19, 2025
2a7929f
Fix tests
wanliAlex Jun 19, 2025
6e9009c
Add fields to query
wanliAlex Jun 20, 2025
51dd511
Finish implementation, now start tests
wanliAlex Jun 23, 2025
1522981
Finish relevance cut-off tests in Java
wanliAlex Jun 23, 2025
946d2cc
Finish integraiton tests. Left work: verif these integration tests, a…
wanliAlex Jun 23, 2025
8fe9e03
Catch mainline
wanliAlex Jun 24, 2025
f320605
Fix existing tests
wanliAlex Jun 24, 2025
e9f1cc3
Need to work on the targetHits for tensor part
wanliAlex Jun 25, 2025
7321bf7
Add targethits
wanliAlex Jun 26, 2025
5916a7c
Use regular expression for targetHits replacement
wanliAlex Jun 26, 2025
c83de51
Fix relevance-cut off tests
wanliAlex Jun 26, 2025
952b3c7
Add tests for java code
wanliAlex Jun 26, 2025
b891e1a
Add more tests for relvance cut-off
wanliAlex Jun 26, 2025
cf7dd00
merge two regular expression into 1
wanliAlex Jun 27, 2025
07f3beb
Change error message for multiple targetHits
wanliAlex Jun 27, 2025
ad5fc9d
Add tests for lexical/tensor hybrid searchs
wanliAlex Jun 27, 2025
2e2930f
Remove accidental draft file
wanliAlex Jun 27, 2025
65e228f
Fix unit tests
wanliAlex Jun 27, 2025
4c90fa0
Merge branch 'mainline' into li/sort-and-relevance-cuffoff-part-3
wanliAlex Jun 27, 2025
62f1a66
Use enum for relevanceCutoff method
wanliAlex Jun 27, 2025
22779e2
Added api tests
wanliAlex Jun 27, 2025
150daaf
Added api tests
wanliAlex Jun 27, 2025
7b1b3d0
Add some fashion documents and queries
wanliAlex Jul 1, 2025
934a03f
Fix some tests
wanliAlex Jul 1, 2025
ed971f1
Merge remote-tracking branch 'origin/mainline' into li/sort-and-relev…
wanliAlex Jul 2, 2025
7d4cfc3
Responded to Python comments and added tests for unstructured indexes…
wanliAlex Jul 2, 2025
7326a82
Respond to Java comments
wanliAlex Jul 2, 2025
3dee50a
Fix all java tests
wanliAlex Jul 2, 2025
defe405
Fix tests
wanliAlex Jul 2, 2025
d469d52
Fix tests
wanliAlex Jul 2, 2025
546150a
Fix java tests
wanliAlex Jul 2, 2025
991e564
Merge branch 'mainline' into li/sort-and-relevance-cuffoff-part-3
wanliAlex Jul 3, 2025
21acba9
Fix vespa custom searcher typo
wanliAlex Jul 3, 2025
f96d8a5
Avoid calling private methods during unit tests
wanliAlex Jul 3, 2025
c8043dd
Rename PATTERN to DOC_ID_PATTERN
wanliAlex Jul 3, 2025
08b7e2f
Rename PATTERN to DOC_ID_PATTERN
wanliAlex Jul 3, 2025
83a58ce
Add docs strings for java methods
wanliAlex Jul 3, 2025
0658b07
Fix format issue
wanliAlex Jul 3, 2025
6135f43
Fix API tests
wanliAlex Jul 4, 2025
a059be5
Update integration tests so the we are more restrictive on the result…
wanliAlex Jul 4, 2025
331f080
Fix all python tests
wanliAlex Jul 4, 2025
a9a5bbc
Merge remote-tracking branch 'origin/mainline' into li/sort-and-relev…
wanliAlex Jul 4, 2025
a005409
Respond to java comments
wanliAlex Jul 4, 2025
8975cd1
Fix java tests
wanliAlex Jul 4, 2025
472d272
Add marqo__fields.* prefix to medadata from Vespa
wanliAlex Jul 7, 2025
3bdb29a
Merge branch 'mainline' into li/sort-and-relevance-cuffoff-part-3
wanliAlex Jul 7, 2025
1b2c5f4
Fix tests
wanliAlex Jul 7, 2025
137107e
Merge branch 'li/sort-and-relevance-cuffoff-part-3' of https://github…
wanliAlex Jul 7, 2025
3 changes: 2 additions & 1 deletion src/marqo/core/constants.py
@@ -21,9 +21,10 @@
 MARQO_STRUCTURED_HYBRID_SEARCH_MINIMUM_VERSION = semver.VersionInfo.parse('2.10.0')
 MARQO_UNSTRUCTURED_HYBRID_SEARCH_MINIMUM_VERSION = semver.VersionInfo.parse('2.11.0')
 MARQO_CUSTOM_VECTOR_NORMALIZATION_MINIMUM_VERSION = semver.VersionInfo.parse('2.13.0')
+MARQO_SEMI_UNSTRUCTURED_INDEX_VERSION = semver.VersionInfo.parse('2.13.0')
 MARQO_GLOBAL_SCORE_MODIFIERS_MINIMUM_VERSION = semver.VersionInfo.parse('2.15.0')
 MARQO_RERANK_DEPTH_MINIMUM_VERSION = semver.VersionInfo.parse('2.15.0')
-MARQO_SORT_BY_MINIMUM_VERSION = semver.VersionInfo.parse('2.21.0')
+MARQO_SORT_BY_MINIMUM_VERSION = semver.VersionInfo.parse('2.22.0')
 MARQO_LANGUAGE_MINIMUM_VERSION = semver.VersionInfo.parse('2.16.0')
 MARQO_PARTIAL_UPDATE_MINIMUM_VERSION = semver.VersionInfo.parse('2.16.0')

2 changes: 1 addition & 1 deletion src/marqo/core/models/marqo_index.py
@@ -640,7 +640,7 @@ def index_supports_language(self) -> bool:
     @property
     def index_supports_sorty_by(self) -> bool:
         """
-        Check if the index supports sort by.
+        Check if the index supports sort by or relevance cutoff.
         """
         return self._cache_or_get(
             'index_supports_sort_by',
24 changes: 23 additions & 1 deletion src/marqo/core/search/hybrid_search.py
@@ -9,7 +9,8 @@
 from marqo.core import exceptions as core_exceptions
 from marqo.core.models.facets_parameters import FacetsParameters
 from marqo.core.models.hybrid_parameters import HybridParameters, RetrievalMethod, RankingMethod
-from marqo.core.models.marqo_index import UnstructuredMarqoIndex, StructuredMarqoIndex, SemiStructuredMarqoIndex
+from marqo.core.models.marqo_index import UnstructuredMarqoIndex, StructuredMarqoIndex, SemiStructuredMarqoIndex, \
+    IndexType
 from marqo.core.models.marqo_query import MarqoHybridQuery
 from marqo.core.semi_structured_vespa_index.semi_structured_vespa_index import SemiStructuredVespaIndex
 from marqo.core.vespa_index.vespa_index import for_marqo_index as vespa_index_factory
@@ -152,6 +153,21 @@ def search(
                 "'hybridParameters.queryLexical' is provided"
             )

+        if sort_by and (
+                marqo_index_version < constants.MARQO_SORT_BY_MINIMUM_VERSION or
+                not marqo_index.type == IndexType.SemiStructured
+        ):
+            raise core_exceptions.UnsupportedFeatureError(
+                f"The 'sortBy' features is only supported for unstructured indexes created "
+                f"with Marqo version {constants.MARQO_SORT_BY_MINIMUM_VERSION} or later "
+            )
+
+        if (relevance_cutoff and not marqo_index.type == IndexType.SemiStructured):
+            # Legacy unstructured indexes and structured indexes do not support relevance cutoff
+            raise core_exceptions.UnsupportedFeatureError(
+                f"The 'relevanceCutoff' feature is only supported for unstructured indexes created "
+                f"with Marqo version {constants.MARQO_SEMI_UNSTRUCTURED_INDEX_VERSION} or later "
+            )

         # Determine the text query prefix
         text_query_prefix = marqo_index.model.get_text_query_prefix(text_query_prefix)
@@ -319,7 +335,13 @@ def search(
                 f"{total_results} results from Vespa."
             )

+        # Collect metadata for sort by
         if sort_by is not None:
             gathered_results["_sortCandidates"] = responses.root.fields.sort_candidates

+        # Collect metadata for relevance cutoff
+        if relevance_cutoff is not None:
+            gathered_results["_relevantCandidates"] = responses.root.fields.relevant_candidates
+            gathered_results["_probeCandidates"] = responses.root.fields.probe_candidates
+
         return gathered_results
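
The feature gate added to `search()` in this file combines an index-type check with a minimum-version check. A minimal sketch of that rule in isolation follows. It is not Marqo's code: `Version`, `check_sort_by_supported`, the string index types, and the local `UnsupportedFeatureError` are hypothetical stand-ins for `semver.VersionInfo`, `IndexType`, and `core_exceptions.UnsupportedFeatureError`.

```python
from typing import NamedTuple


class Version(NamedTuple):
    """Hypothetical stand-in for semver.VersionInfo (major, minor, patch only)."""
    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text: str) -> "Version":
        major, minor, patch = (int(part) for part in text.split("."))
        return cls(major, minor, patch)


class UnsupportedFeatureError(Exception):
    """Stand-in for core_exceptions.UnsupportedFeatureError."""


# Value taken from the constants.py hunk in this PR.
SORT_BY_MINIMUM_VERSION = Version.parse("2.22.0")


def check_sort_by_supported(index_type: str, index_version: Version) -> None:
    """Raise unless the index is semi-structured AND at least the minimum version."""
    # Tuples compare element by element, so (2, 21, 0) < (2, 22, 0).
    too_old = index_version < SORT_BY_MINIMUM_VERSION
    wrong_type = index_type != "semi-structured"  # assumed label for IndexType.SemiStructured
    if too_old or wrong_type:
        raise UnsupportedFeatureError(
            f"'sortBy' needs a semi-structured index created with version "
            f"{'.'.join(map(str, SORT_BY_MINIMUM_VERSION))} or later"
        )
```

Either failing condition is enough to reject the request, which matches the `or` inside the parenthesised condition in the hunk.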
@@ -665,7 +665,7 @@ def _to_vespa_hybrid_query(self, marqo_query: MarqoHybridQuery) -> Dict[str, Any
                 query["marqo__hybrid.relevanceCutoff.parameters.relativeScoreFactor"] = \
                     marqo_query.relevance_cutoff.parameters.relative_score_factor
             elif marqo_query.relevance_cutoff.method == RelevanceCutoffMethod.MeanStdDev:
-                query["marqo__hybrid.relevanceCutoff.parameters.meanStdDevFactor"] = \
+                query["marqo__hybrid.relevanceCutoff.parameters.stdDevFactor"] = \
                     marqo_query.relevance_cutoff.parameters.std_dev_factor
             else:
                 # No parameters for other methods
@@ -676,7 +676,7 @@ def _to_vespa_hybrid_query(self, marqo_query: MarqoHybridQuery) -> Dict[str, Any
         if marqo_query.sort_by:
             query["marqo__hybrid.sortBy.fields"] = [field.dict() for field in marqo_query.sort_by.fields]
             query["marqo__hybrid.sortBy.sortDepth"] = marqo_query.sort_by.sort_depth
-            query["marqo__hybrid.sortBy.sortCandidates"] = marqo_query.sort_by.sort_candidates
+            query["marqo__hybrid.sortBy.minSortCandidates"] = marqo_query.sort_by.min_sort_candidates

             query["query_features"]["marqo__sort_field_weights_0"] = {}
             query["query_features"]["marqo__sort_field_weights_1"] = {}
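
The two hunks above rename query keys sent to Vespa (`meanStdDevFactor` to `stdDevFactor`, `sortCandidates` to `minSortCandidates`). A small sketch of the method-dependent key selection follows, assuming a hypothetical helper `relevance_cutoff_query_params`. The enum member names come from the diff, but their string values here are invented for illustration.

```python
from enum import Enum
from typing import Any, Dict, Optional


class RelevanceCutoffMethod(str, Enum):
    # Member names follow the diff; the string values are assumptions.
    RelativeMaxScore = "relative_max_score"
    MeanStdDev = "mean_std_dev"


# Key prefix as it appears in the hunk above.
PREFIX = "marqo__hybrid.relevanceCutoff.parameters."


def relevance_cutoff_query_params(
    method: RelevanceCutoffMethod,
    relative_score_factor: Optional[float] = None,
    std_dev_factor: Optional[float] = None,
) -> Dict[str, Any]:
    """Return only the parameter key that matches the chosen method."""
    if method == RelevanceCutoffMethod.RelativeMaxScore:
        return {PREFIX + "relativeScoreFactor": relative_score_factor}
    if method == RelevanceCutoffMethod.MeanStdDev:
        return {PREFIX + "stdDevFactor": std_dev_factor}
    # Any other method carries no parameters.
    return {}
```

Keeping the key name next to the method that owns it makes a rename such as the one in this PR a one-line change.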
23 changes: 12 additions & 11 deletions src/marqo/tensor_search/models/api_models.py
@@ -355,32 +355,33 @@ def _validate_sort_by_cannot_be_used_with_global_score_modifiers(cls, values):
         return values

     @root_validator(pre=False)
-    def _set_sort_by_sortCandidates_parameters(cls, values):
-        """Set the value for sortCandidates in sortBy if it is not provided.
+    def _validate_and_set_sort_by_min_sort_candidates_parameters(cls, values):
+        """validate the value for min_sort_candidates in sortBy.
+        If it is not provided and relevanceCutoff is None, this function will set it to a default value.

         Logics:
-            - If relevanceCutoff is provided, do not set sortCandidates, otherwise:
-                - If sortBy.sortCandidates is None, set it to the maximum of:
+            - If relevanceCutoff is provided, do not set min_sort_candidates, otherwise:
+                - If sortBy.min_sort_candidates is None, set it to the maximum of:
                     - _DEFAULT_SORT_CANDIDATES_MULTIPLIER * limit
                     - offset + limit
-                - If sortBy.sortCandidates is provided, ensure it is at least as large as offset + limit.
+                - If sortBy.min_sort_candidates is provided, ensure it is at least as large as offset + limit.
         """
         sort_by = values.get('sort_by')
         relevance_cutoff = values.get('relevance_cutoff')
         if sort_by is None or relevance_cutoff is not None:
             return values

-        if sort_by.sort_candidates is None:
-            sort_by.sort_candidates = max(
+        if sort_by.min_sort_candidates is None:
+            sort_by.min_sort_candidates = max(
                 cls._DEFAULT_SORT_CANDIDATES_MULTIPLIER * values.get('limit'),
                 values.get('offset') + values.get('limit')
             )
         else:
-            # If sortCandidates is provided, ensure it is at least as large as offset + limit
-            if sort_by.sort_candidates < (values.get('offset') + values.get('limit')):
+            # If min_sort_candidates is provided, ensure it is at least as large as offset + limit
+            if sort_by.min_sort_candidates < (values.get('offset') + values.get('limit')):
                 raise ValueError(
-                    f"sortCandidates must be at least as large as offset + limit. Received "
-                    f" sortCandidates={sort_by.sort_candidates}, limit={values.get('limit')}, "
+                    f" minSortCandidates must be at least as large as offset + limit. Received "
+                    f" minSortCandidates={sort_by.min_sort_candidates}, limit={values.get('limit')}, "
                     f" offset={values.get('offset')} "
                 )
         return values
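
The defaulting rule in the validator's docstring can be worked through on plain integers. The sketch below is a standalone restatement, not the validator itself: `resolve_min_sort_candidates` is a hypothetical name, and the multiplier value `3` is an assumption because the real `_DEFAULT_SORT_CANDIDATES_MULTIPLIER` is not shown in this diff.

```python
from typing import Optional

# Assumed value for illustration; the real constant is not visible in this diff.
DEFAULT_SORT_CANDIDATES_MULTIPLIER = 3


def resolve_min_sort_candidates(
    min_sort_candidates: Optional[int],
    limit: int,
    offset: int,
    relevance_cutoff_set: bool,
) -> Optional[int]:
    """Mirror the validator's branches on plain values."""
    if relevance_cutoff_set:
        # With relevanceCutoff present the value is left untouched.
        return min_sort_candidates
    if min_sort_candidates is None:
        # Default: whichever is larger, multiplier * limit or offset + limit.
        return max(DEFAULT_SORT_CANDIDATES_MULTIPLIER * limit, offset + limit)
    if min_sort_candidates < offset + limit:
        raise ValueError(
            f"minSortCandidates must be at least offset + limit; got "
            f"minSortCandidates={min_sort_candidates}, limit={limit}, offset={offset}"
        )
    return min_sort_candidates
```

With the assumed multiplier, `limit=10, offset=0` defaults to 30, while `limit=10, offset=50` defaults to 60 because the requested page would otherwise fall outside the candidate set.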
8 changes: 4 additions & 4 deletions src/marqo/tensor_search/models/relevance_cutoff_model.py
@@ -12,11 +12,11 @@ class RelevanceCutoffMethod(str, Enum):


 class RelativeMaxScoreParameters(StrictBaseModel):
-    relative_score_factor: float = Field(..., gt=0, le=1, alias="relativeScoreFactor")
+    relative_score_factor: float = Field(..., ge=0, le=1, alias="relativeScoreFactor")


 class MeanStdParameters(StrictBaseModel):
-    std_dev_factor: float = Field(..., gt=0, alias="stdDevFactor")
+    std_dev_factor: float = Field(..., alias="stdDevFactor")


 class RelevanceCutoffModel(StrictBaseModel):
@@ -26,11 +26,11 @@ class RelevanceCutoffModel(StrictBaseModel):
     Attributes:
         method (RelevanceCutoffMethod): The method to use for relevance cutoff.
         probe_depth (int): The number of documents to probe for relevance cutoff. Defaults to 1000. We use
-            a lexical search as a probe search. Check Vespa Customer Searcher for more details.
+            a lexical search as a probe search. Check Vespa Custom Searcher for more details.
         parameters (Union[RelativeMaxScoreParameters, MeanStdParameters]): The parameters for the relevance cutoff method.
             If the method is RelativeMaxScore, you must provide 'relativeScoreFactor' as a parameter.
             If the method is MeanStd, you must provide 'stdDevFactor' as a parameter.
-            Check Vespa Customer Searcher for more details.
+            Check Vespa Custom Searcher for more details.
     """
     method: RelevanceCutoffMethod
     probe_depth: int = Field(1000, ge=1, alias="probeDepth")
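
The docstring defers the actual cutoff rule to the Vespa custom searcher, which is not part of this diff. As a rough illustration only, the sketch below shows one plausible reading of the two methods on a list of probe scores. The threshold formulas (`factor * max` and `mean + factor * std`), the use of population standard deviation, and both function names are assumptions. The searcher may compute the cutoff differently.

```python
import statistics
from typing import List


def relative_max_score_cutoff(scores: List[float], relative_score_factor: float) -> List[float]:
    """Keep scores at or above a fraction of the best score (assumed rule)."""
    # Bounds match the updated model: 0 <= relativeScoreFactor <= 1.
    if not 0 <= relative_score_factor <= 1:
        raise ValueError("relativeScoreFactor must be in [0, 1]")
    if not scores:
        return []
    # Assumes non-negative scores; a negative maximum would invert the rule.
    threshold = relative_score_factor * max(scores)
    return [score for score in scores if score >= threshold]


def mean_std_dev_cutoff(scores: List[float], std_dev_factor: float) -> List[float]:
    """Keep scores at or above mean + factor * std (assumed rule)."""
    if not scores:
        return []
    mean = statistics.fmean(scores)
    std = statistics.pstdev(scores)  # population standard deviation, an assumption
    # stdDevFactor is unconstrained in the updated model, so it may be negative,
    # which under this reading lowers the threshold below the mean.
    threshold = mean + std_dev_factor * std
    return [score for score in scores if score >= threshold]
```

Under this reading, a `relativeScoreFactor` of 0 keeps every non-negative score, which is consistent with the bound being relaxed from `gt=0` to `ge=0`.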
8 changes: 4 additions & 4 deletions src/marqo/tensor_search/models/sort_by_model.py
@@ -49,10 +49,10 @@ class SortByModel(StrictBaseModel):
             Note that the order of fields in this list determines the order of sorting. Fields presented later will
             be used as tiebreakers for fields presented earlier.
         sort_depth (Optional[int]): The depth of sorting at the global phase.
-            Check Vespa Customer Searcher for more details.
-        sort_candidates (Optional[int]): The minimum number of candidates to be retrieved.
-            Check Vespa Customer Searcher for more details.
+            Check Vespa Custom Searcher for more details.
+        min_sort_candidates (Optional[int]): The minimum number of candidates to be retrieved.
+            Check Vespa Custom Searcher for more details.
     """
     fields: List[SortByField] = Field(..., min_items=1, max_items=3)
     sort_depth: Optional[int] = Field(None, ge=1, alias="sortDepth")
-    sort_candidates: Optional[int] = Field(None, ge=1, alias="sortCandidates")
+    min_sort_candidates: Optional[int] = Field(None, ge=1, alias="minSortCandidates")
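
The docstring says later fields act as tiebreakers for earlier ones. A minimal client-side sketch of that ordering follows. It illustrates the semantics only and says nothing about how Vespa performs the sort. `sort_documents` and the `(field_name, "asc" | "desc")` pair format are hypothetical, since the attributes of `SortByField` are not shown in this diff.

```python
from typing import Any, Dict, List, Tuple


def sort_documents(
    docs: List[Dict[str, Any]],
    sort_fields: List[Tuple[str, str]],
) -> List[Dict[str, Any]]:
    """Sort by (field_name, 'asc' | 'desc') pairs, most significant first."""
    # The model allows between 1 and 3 sort fields.
    if not 1 <= len(sort_fields) <= 3:
        raise ValueError("between 1 and 3 sort fields are required")
    result = list(docs)
    # list.sort is stable, so sorting by the least significant field first
    # leaves it acting as the tiebreaker for the fields sorted after it.
    for field_name, order in reversed(sort_fields):
        result.sort(key=lambda doc: doc[field_name], reverse=(order == "desc"))
    return result


docs = [
    {"id": "a", "price": 10, "rating": 3},
    {"id": "b", "price": 5, "rating": 4},
    {"id": "c", "price": 10, "rating": 5},
]
ordered = sort_documents(docs, [("price", "desc"), ("rating", "desc")])
print([doc["id"] for doc in ordered])
```

Documents `a` and `c` tie on `price`, so `rating` decides their relative order.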
15 changes: 0 additions & 15 deletions src/marqo/tensor_search/tensor_search.py
@@ -422,21 +422,6 @@ def search(config: Config, index_name: str, text: Optional[Union[str, dict, Cust
             f"{str(constants.MARQO_RERANK_DEPTH_MINIMUM_VERSION)} or later. "
             f"This index was created with Marqo {marqo_index_version}."
         )
-
-    if sort_by:
-        if not isinstance(marqo_index, SemiStructuredMarqoIndex):
-            raise core_exceptions.UnsupportedFeatureError(
-                f"The 'sortBy' feature is only supported for unstructured indexes created with Marqo version "
-                f"{constants.MARQO_SORT_BY_MINIMUM_VERSION} or later. "
-                f"Your index is either a structured index or an old unstructured index"
-            )
-        if not marqo_index.index_supports_sorty_by:
-            raise core_exceptions.UnsupportedFeatureError(
-                f"The 'sortBy' feature is only supported for unstructured indexes created with Marqo version "
-                f"{constants.MARQO_SORT_BY_MINIMUM_VERSION} or later. "
-                f"This unstructured index was created with Marqo {marqo_index_version} "
-            )
-
     if search_method.upper() in {SearchMethod.TENSOR, SearchMethod.HYBRID}:
         # Default approximate and efSearch -- we can't set these at API-level since they're not a valid args
         # for lexical search
2 changes: 1 addition & 1 deletion src/marqo/version.py
@@ -1,4 +1,4 @@
-__version__ = "2.21.0"
+__version__ = "2.22.0"

 def get_version() -> str:
     return f"{__version__}"
4 changes: 3 additions & 1 deletion src/marqo/vespa/models/query_result.py
@@ -6,7 +6,9 @@
 # See https://docs.vespa.ai/en/reference/default-result-format.html
 class RootFields(BaseModel):
     total_count: Optional[int] = Field(None, alias='totalCount')
-    sort_candidates: Optional[int] = Field(None, alias='marqo__sortCandidates')
+    sort_candidates: Optional[int] = Field(None, alias='marqo__fields.sortCandidates')
+    relevant_candidates: Optional[int] = Field(None, alias='marqo__fields.relevantCandidates')
+    probe_candidates: Optional[int] = Field(None, alias='marqo__fields.probeCandidates')


 class Degraded(BaseModel):
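
Taken together, the `RootFields` aliases and the metadata collection in `hybrid_search.py` map prefixed Vespa root fields onto underscore-prefixed keys in the search result. A dictionary-only sketch of that mapping follows. `extract_candidate_metadata` is a hypothetical helper and the plain `dict` input stands in for the parsed Vespa response. The key names themselves are taken from the diff.

```python
from typing import Any, Dict


def extract_candidate_metadata(
    root_fields: Dict[str, Any],
    sort_by_requested: bool,
    relevance_cutoff_requested: bool,
) -> Dict[str, Any]:
    """Copy candidate counts from Vespa root fields into result metadata."""
    gathered: Dict[str, Any] = {}
    if sort_by_requested:
        gathered["_sortCandidates"] = root_fields.get("marqo__fields.sortCandidates")
    if relevance_cutoff_requested:
        gathered["_relevantCandidates"] = root_fields.get("marqo__fields.relevantCandidates")
        gathered["_probeCandidates"] = root_fields.get("marqo__fields.probeCandidates")
    return gathered
```

A field missing from the response yields `None`, matching the `Optional[int]` defaults declared on `RootFields`.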