Context Search With Documents (Personalization) - Revised #1254
Open
vicilliar wants to merge 54 commits into mainline from joshua/personalization-with-context
Updated version of the personalization PR intended for 2.21.0, which was reverted due to an issue with the Vespa client.
Differences from the old personalization PR:
- Use a persistent SSL context instead of a persistent transport, because the persistent transport caused hanging during concurrent requests. Performance gains are comparable.
- Added an integration test for concurrent get-documents requests to ensure no hanging.
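A minimal sketch of how a no-hang check for concurrent requests could be structured, using only the standard library. `fetch_document` is a hypothetical stand-in for a real get-documents call, and the timeout value is an assumption; this is not the integration test added in the PR.

```python
import asyncio


async def fetch_document(doc_id: str) -> dict:
    """Hypothetical stand-in for a get-documents request."""
    await asyncio.sleep(0.01)  # simulate a short network round trip
    return {"_id": doc_id}


async def fetch_concurrently(doc_ids, timeout_s: float = 5.0):
    # wait_for raises asyncio.TimeoutError if the gathered requests
    # do not all finish within timeout_s, which is how a hang would surface.
    return await asyncio.wait_for(
        asyncio.gather(*(fetch_document(d) for d in doc_ids)),
        timeout=timeout_s,
    )


results = asyncio.run(fetch_concurrently([f"doc{i}" for i in range(20)]))
```

Bounding the whole batch with one timeout turns a hang into a test failure instead of a stalled test run.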
Change Summary
Supports a new `documents` key in the `context` parameter of search. This allows the user to provide a dict of doc_id:weight pairs; the vectors of those documents are extracted and interpolated for tensor search, or for hybrid search with tensor retrieval or ranking.
Updated example context object:
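As a rough sketch of what such a context object could look like, assuming `documents` maps document IDs directly to weights as described above (the actual schema may nest these differently) and assuming `tensor` entries use a vector/weight shape. `build_context` is a hypothetical helper, not part of Marqo.

```python
import json


def build_context(doc_weights, tensor_entries=None):
    """Hypothetical helper that assembles a search `context` dict."""
    for doc_id, weight in doc_weights.items():
        if not isinstance(doc_id, str):
            raise TypeError(f"document ID must be a string, got {doc_id!r}")
        # bool is a subclass of int, so reject it explicitly
        if isinstance(weight, bool) or not isinstance(weight, (int, float)):
            raise TypeError(f"weight for {doc_id!r} must be a number")

    # Assumed shape: doc_id -> weight pairs under the `documents` key
    context = {"documents": dict(doc_weights)}
    if tensor_entries:
        # Assumed shape for tensor entries: {"vector": [...], "weight": ...}
        context["tensor"] = list(tensor_entries)
    return context


context = build_context(
    {"doc1": 0.7, "doc2": -0.3},
    tensor_entries=[{"vector": [0.1, 0.2, 0.3], "weight": 0.5}],
)
print(json.dumps(context, indent=2))
```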
- Interpolation is now done on all vectors (query, context.tensor, context.document) instead of the previous weighted-average behavior.
- The absolute value of each weight is now used when summing vector weights in interpolation. This makes ZeroSumWeightsError obsolete; AllZeroWeightsError is used instead.
- Interpolation logic is implemented in numpy.
- Fixed some error messages for hybrid / pydantic model validation.
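One plausible reading of the interpolation change, sketched in numpy: the weighted sum of the vectors is divided by the sum of the absolute weights, so the divisor is zero only when every weight is zero. The exact formula is an assumption. `interpolate_vectors` is a hypothetical name, and `AllZeroWeightsError` is defined locally here as a stand-in for the error named above.

```python
import numpy as np


class AllZeroWeightsError(ValueError):
    """Local stand-in for the error named in the change summary."""


def interpolate_vectors(vectors, weights):
    """Combine vectors as sum(w_i * v_i) / sum(|w_i|) (assumed formula)."""
    vecs = np.asarray(vectors, dtype=float)  # shape (n, dim)
    w = np.asarray(weights, dtype=float)     # shape (n,)
    if vecs.ndim != 2 or w.shape != (vecs.shape[0],):
        raise ValueError("expected n vectors of equal length and n weights")

    total = np.abs(w).sum()
    if total == 0:
        # Only reachable when every weight is exactly zero
        raise AllZeroWeightsError("all weights are zero")
    return (w @ vecs) / total


# Weights 1 and -1 have a signed sum of zero but an absolute sum of 2,
# so this combination is well defined under the assumed formula.
combined = interpolate_vectors([[1.0, 0.0], [0.0, 1.0]], [1.0, -1.0])
```

Under this reading, opposing weights no longer produce a zero divisor, which is consistent with the zero-sum error becoming obsolete.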
New Env Vars:
- `MARQO_MAX_SEARCH_CONTEXT_DOCS`: Maximum number of documents that can be used as context documents in a search request.
The following optimizations have been implemented:
- Getting doc vectors fetches only the essential embeddings fields from Vespa.
- A persistent SSL context is created upon Vespa client initialization and is used for the `verify` param of every async client instantiation. This saves time spent creating new contexts.
- The index is fetched from cache instead of Vespa.
- Fetched doc vectors are not formatted into document form with Pydantic; the vectors are used directly.
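The persistent SSL context optimization can be sketched with the standard library alone. `VespaClientSketch` and `async_client_kwargs` are hypothetical names, and the use of httpx as the async client (shown only in a comment) is an assumption based on the `verify` parameter mentioned above.

```python
import ssl


class VespaClientSketch:
    """Hypothetical client that builds its SSL context once."""

    def __init__(self):
        # Created a single time at initialization; loading the CA store
        # is the work that is avoided on later client instantiations.
        self._ssl_context = ssl.create_default_context()

    def async_client_kwargs(self) -> dict:
        # Every async client receives the same context object via `verify`.
        return {"verify": self._ssl_context}


client = VespaClientSketch()
first = client.async_client_kwargs()
second = client.async_client_kwargs()

# With httpx installed, a short-lived client per request could then be:
#     async with httpx.AsyncClient(**client.async_client_kwargs()) as http:
#         ...
```

Sharing the context rather than the transport keeps each async client short-lived while avoiding repeated context construction.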
Related Jira Ticket
https://s2search.atlassian.net/jira/polaris/projects/MOSD/ideas/view/4863474?selectedIssue=MOSD-23&issueViewLayout=sidebar&issueViewSection=overview&atlOrigin=eyJpIjoiNzcyNDBkNzVjM2JhNDE0Y2I4ODUzNDY3YTc2MmE0ZmUiLCJwIjoiaiJ9
Checklist
py-marqo changes: Support for context documents in search (Personalization) py-marqo#287
For new field types: