Description
Currently on version 2.19.X
What is the bug?
Null Pointer Exception here:
https://github.com/opensearch-project/neural-search/blob/2.19.2.0/src/main/java/org/opensearch/neuralsearch/processor/NormalizationProcessorWorkflow.java#L324
This code path seems to be reached only with a single shard. The code implies that with multiple shards fetchSearchResultOptional is empty and we return early:
https://github.com/opensearch-project/neural-search/blob/2.19.2.0/src/main/java/org/opensearch/neuralsearch/processor/NormalizationProcessorWorkflow.java#L282-L283
https://github.com/opensearch-project/neural-search/blob/2.19.2.0/src/main/java/org/opensearch/neuralsearch/processor/NormalizationProcessorWorkflow.java#L111-L114
I am not exactly sure how the issue occurs, but you can tell that the docMap created from the hit documents:
https://github.com/opensearch-project/neural-search/blob/2.19.2.0/src/main/java/org/opensearch/neuralsearch/processor/NormalizationProcessorWorkflow.java#L300-L308
doesn't contain the document that was scored:
https://github.com/opensearch-project/neural-search/blob/2.19.2.0/src/main/java/org/opensearch/neuralsearch/processor/NormalizationProcessorWorkflow.java#L320-L322
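To make the failure mode concrete, here is a minimal sketch of how a map lookup for a scored document that has no fetched hit would end in an NPE. It is not the plugin's code: `DocMapLookupSketch`, `Hit`, `buildDocMap`, and `applyScore` are hypothetical names, and plain integers stand in for Lucene doc IDs.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DocMapLookupSketch {
    // Hypothetical stand-in for a fetched search hit; not the OpenSearch SearchHit class.
    static final class Hit {
        final int docId;
        float score;

        Hit(int docId) {
            this.docId = docId;
        }
    }

    // Build a doc ID -> hit map from the hits that the fetch phase returned.
    static Map<Integer, Hit> buildDocMap(List<Hit> fetchedHits) {
        Map<Integer, Hit> docMap = new HashMap<>();
        for (Hit hit : fetchedHits) {
            docMap.put(hit.docId, hit);
        }
        return docMap;
    }

    // Write a normalized score onto the hit for a scored document.
    // If the scored doc ID has no entry, get() returns null and the field write throws an NPE.
    static void applyScore(Map<Integer, Hit> docMap, int scoredDocId, float score) {
        Hit hit = docMap.get(scoredDocId);
        hit.score = score;
    }

    public static void main(String[] args) {
        Map<Integer, Hit> docMap = buildDocMap(List.of(new Hit(0), new Hit(1)));
        applyScore(docMap, 1, 0.7f); // doc 1 was fetched, so this succeeds
        try {
            applyScore(docMap, 2, 0.5f); // doc 2 was scored but never fetched
        } catch (NullPointerException e) {
            System.out.println("NPE: no fetched hit for scored doc 2");
        }
    }
}
```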
I am new to hybrid search and have not run the code, but after reading it carefully, this is my guess as to what is going on:
- We create unProcessedDocs from the topDocs:
  https://github.com/opensearch-project/neural-search/blob/2.19.2.0/src/main/java/org/opensearch/neuralsearch/processor/NormalizationProcessorWorkflow.java#L359-L366
- We construct a map of docs, the key being from unProcessedDocs and the value being from the hit docs:
  https://github.com/opensearch-project/neural-search/blob/2.19.2.0/src/main/java/org/opensearch/neuralsearch/processor/NormalizationProcessorWorkflow.java#L298C38-L308
- Similar to how we constructed unProcessedDocs, we go over the topDocs for each hit recorded. Importantly, we don't get the docs from `docIds` but from `querySearchResult`:
  https://github.com/opensearch-project/neural-search/blob/2.19.2.0/src/main/java/org/opensearch/neuralsearch/processor/NormalizationProcessorWorkflow.java#L310-L324
- unProcessedDocs was created early on:
  https://github.com/opensearch-project/neural-search/blob/2.19.2.0/src/main/java/org/opensearch/neuralsearch/processor/NormalizationProcessorWorkflow.java#L62-L65
- Before we call `updateOriginalFetchResults` we call `updateOriginalQueryResults`, and in `updateOriginalQueryResults` we mutate the queryResults object with a different topDocs value:
  https://github.com/opensearch-project/neural-search/blob/2.19.2.0/src/main/java/org/opensearch/neuralsearch/processor/NormalizationProcessorWorkflow.java#L97C9-L103
  https://github.com/opensearch-project/neural-search/blob/2.19.2.0/src/main/java/org/opensearch/neuralsearch/processor/NormalizationProcessorWorkflow.java#L212
- So I think that in `updateOriginalFetchResults` the topDocs used in the trimmedLengthOfSearchHits for loop are different from the unProcessedDocs ones we used to build the map:
  https://github.com/opensearch-project/neural-search/blob/2.19.2.0/src/main/java/org/opensearch/neuralsearch/processor/NormalizationProcessorWorkflow.java#L310C47-L311
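The suspected sequence can be sketched in isolation as follows. This only illustrates the hypothesis above and has not been checked against the plugin: `StaleDocIdSnapshotSketch`, `QueryResult`, and the helper methods are hypothetical, the doc IDs are made up, and strings stand in for fetched hits.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class StaleDocIdSnapshotSketch {
    // Hypothetical holder for the doc IDs in a query result's topDocs.
    static final class QueryResult {
        int[] topDocIds;

        QueryResult(int... topDocIds) {
            this.topDocIds = topDocIds;
        }
    }

    // Step 1: capture the doc IDs early, before any re-ranking (the "unprocessed" IDs).
    static List<Integer> snapshotDocIds(QueryResult result) {
        List<Integer> ids = new ArrayList<>();
        for (int id : result.topDocIds) {
            ids.add(id);
        }
        return ids;
    }

    // Step 2: pair each captured ID with the fetched hit at the same position.
    static Map<Integer, String> buildDocMap(List<Integer> snapshot, List<String> fetchedHits) {
        Map<Integer, String> docMap = new HashMap<>();
        for (int i = 0; i < fetchedHits.size(); i++) {
            docMap.put(snapshot.get(i), fetchedHits.get(i));
        }
        return docMap;
    }

    // Step 3: walk the current topDocs and collect IDs that have no entry in the map.
    static List<Integer> docsWithoutHit(QueryResult result, Map<Integer, String> docMap) {
        List<Integer> missing = new ArrayList<>();
        for (int id : result.topDocIds) {
            if (!docMap.containsKey(id)) {
                missing.add(id);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        QueryResult result = new QueryResult(0, 1);
        List<Integer> snapshot = snapshotDocIds(result);

        // Assumed for illustration: re-ranking replaces topDocs with a different set of docs.
        result.topDocIds = new int[] {1, 2};

        Map<Integer, String> docMap = buildDocMap(snapshot, List.of("hit-a", "hit-b"));
        System.out.println("Docs with no fetched hit: " + docsWithoutHit(result, docMap));
    }
}
```

In this sketch doc 2 appears in the replaced topDocs but was never in the earlier snapshot, so a lookup for it would return null.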
So maybe the fix is to not pass through `unprocessedDocIds` and instead update `updateOriginalFetchResults` to use the new re-ranked docs throughout the whole function, assuming my diagnosis is correct. I am new to OpenSearch, so I am not exactly sure how it is supposed to work.
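One way to picture that suggestion is a rough sketch in which the re-ranked doc IDs are the only list the function relies on, and a missing hit produces a descriptive error instead of an NPE. This is not a patch for the plugin: `RerankedFetchSketch`, `Hit`, and both methods are hypothetical, and the sketch assumes each fetched hit carries its own doc ID.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RerankedFetchSketch {
    // Hypothetical stand-in for a fetched hit that knows its own doc ID (an assumption).
    static final class Hit {
        final int docId;
        final String source;

        Hit(int docId, String source) {
            this.docId = docId;
            this.source = source;
        }
    }

    // Key the map by each hit's own doc ID, so no separately captured ID list is needed.
    static Map<Integer, Hit> indexByDocId(List<Hit> fetchedHits) {
        Map<Integer, Hit> byId = new HashMap<>();
        for (Hit hit : fetchedHits) {
            byId.put(hit.docId, hit);
        }
        return byId;
    }

    // Order the fetched hits by the re-ranked doc IDs and fail loudly on a mismatch.
    static List<Hit> orderByRerankedDocs(List<Integer> rerankedDocIds, List<Hit> fetchedHits) {
        Map<Integer, Hit> byId = indexByDocId(fetchedHits);
        List<Hit> ordered = new ArrayList<>();
        for (int docId : rerankedDocIds) {
            Hit hit = byId.get(docId);
            if (hit == null) {
                throw new IllegalStateException("No fetched hit for re-ranked doc " + docId);
            }
            ordered.add(hit);
        }
        return ordered;
    }

    public static void main(String[] args) {
        List<Hit> fetched = List.of(new Hit(0, "a"), new Hit(1, "b"));
        for (Hit hit : orderByRerankedDocs(List.of(1, 0), fetched)) {
            System.out.println(hit.docId + " -> " + hit.source);
        }
    }
}
```

Whether the real fetch results can be keyed this way depends on details of the plugin that I have not verified.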
How can one reproduce the bug?
Create an instance with a single shard, add 3 documents, and run a hybrid search query that returns 2 documents. Sometimes it passes, but most of the time I get the NPE, which is why I described it as non-deterministic.
What is the expected behavior?
The query should return the hit documents and not throw an NPE. We see this expected behavior when we use 3 shards.
What is your host/environment?
OpenSearch 2.19. I believe it is AWS OpenSearch. A teammate of mine owns the OpenSearch service, so I do not have all the details here.