[BUG] [Hybrid Search] Non-deterministic NullPointerException when using hybrid search with a single shard #1415

Currently on version 2.19.X

### What is the bug?
_A clear and concise description of the bug._

A NullPointerException is thrown here:
https://github.com/opensearch-project/neural-search/blob/2.19.2.0/src/main/java/org/opensearch/neuralsearch/processor/NormalizationProcessorWorkflow.java#L324

This code path appears to run only with a single shard. With multiple shards, `fetchSearchResultOptional` seems to be empty and we return early:
https://github.com/opensearch-project/neural-search/blob/2.19.2.0/src/main/java/org/opensearch/neuralsearch/processor/NormalizationProcessorWorkflow.java#L282-L283
https://github.com/opensearch-project/neural-search/blob/2.19.2.0/src/main/java/org/opensearch/neuralsearch/processor/NormalizationProcessorWorkflow.java#L111-L114

I am not exactly sure how the issue occurs, but it looks like the `docMap` created from the hit documents:
https://github.com/opensearch-project/neural-search/blob/2.19.2.0/src/main/java/org/opensearch/neuralsearch/processor/NormalizationProcessorWorkflow.java#L300-L308

doesn't contain the document that was scored:
https://github.com/opensearch-project/neural-search/blob/2.19.2.0/src/main/java/org/opensearch/neuralsearch/processor/NormalizationProcessorWorkflow.java#L320-L322

I am new to hybrid search and have not run the code, but after reading it closely, this is my guess at what is going on:
1. We create unProcessedDocs from the topDocs.
https://github.com/opensearch-project/neural-search/blob/2.19.2.0/src/main/java/org/opensearch/neuralsearch/processor/NormalizationProcessorWorkflow.java#L359-L366
2. We construct a map of docs, the key being from unProcessedDocs and the value being from the hit docs.
https://github.com/opensearch-project/neural-search/blob/2.19.2.0/src/main/java/org/opensearch/neuralsearch/processor/NormalizationProcessorWorkflow.java#L298C38-L308
3. Similar to how we constructed unProcessedDocs, we iterate over the topDocs for each hit recorded. Importantly, we don't get the docs from `docIds` but from `querySearchResult`:
https://github.com/opensearch-project/neural-search/blob/2.19.2.0/src/main/java/org/opensearch/neuralsearch/processor/NormalizationProcessorWorkflow.java#L310-L324
4. The unProcessedDocs list was created early on:
https://github.com/opensearch-project/neural-search/blob/2.19.2.0/src/main/java/org/opensearch/neuralsearch/processor/NormalizationProcessorWorkflow.java#L62-L65
5. Before we call `updateOriginalFetchResults`, we call `updateOriginalQueryResults`, and in `updateOriginalQueryResults` we mutate the queryResults object with a different topDocs value:
https://github.com/opensearch-project/neural-search/blob/2.19.2.0/src/main/java/org/opensearch/neuralsearch/processor/NormalizationProcessorWorkflow.java#L97C9-L103
https://github.com/opensearch-project/neural-search/blob/2.19.2.0/src/main/java/org/opensearch/neuralsearch/processor/NormalizationProcessorWorkflow.java#L212
6. So now I think that in `updateOriginalFetchResults`, the topDocs used in the trimmedLengthOfSearchHits for loop are different from the unProcessedDocs we used to build the map.
https://github.com/opensearch-project/neural-search/blob/2.19.2.0/src/main/java/org/opensearch/neuralsearch/processor/NormalizationProcessorWorkflow.java#L310C47-L311
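
To make the suspected mechanism in steps 1 to 6 concrete, here is a minimal, self-contained Java sketch. It does not use the plugin's classes: `StaleDocIdSketch`, `buildDocMap`, and `lookupReranked` are hypothetical names, hits are plain strings, and the doc IDs are made up. It only illustrates how a map keyed by doc IDs captured before re-ranking can miss an ID that shows up after re-ranking.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the suspected failure.
// None of these names are the plugin's real types or methods.
final class StaleDocIdSketch {

    // Pairs the i-th doc ID captured BEFORE re-ranking with the i-th fetched hit,
    // mirroring how the issue describes docMap being built.
    public static Map<Integer, String> buildDocMap(List<Integer> docIdsBeforeRerank, List<String> fetchedHits) {
        Map<Integer, String> docMap = new HashMap<>();
        for (int i = 0; i < fetchedHits.size(); i++) {
            docMap.put(docIdsBeforeRerank.get(i), fetchedHits.get(i));
        }
        return docMap;
    }

    // Looks up each doc ID taken AFTER re-ranking. A miss yields null, and
    // dereferencing that null later is where an NPE would be thrown.
    public static List<String> lookupReranked(Map<Integer, String> docMap, List<Integer> docIdsAfterRerank) {
        List<String> hits = new ArrayList<>();
        for (int docId : docIdsAfterRerank) {
            hits.add(docMap.get(docId));
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<Integer, String> docMap = buildDocMap(List.of(0, 1), List.of("hit-a", "hit-b"));
        // Assumed scenario: re-ranking promoted doc 2, which was never put in the map.
        List<String> hits = lookupReranked(docMap, List.of(2, 0));
        System.out.println("first hit is null: " + (hits.get(0) == null));
    }
}
```

If the real lists diverge in the same way, whether a given request fails would depend on which doc IDs end up in the re-ranked topDocs, which could explain why the NPE does not happen every time.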

So maybe the fix is to stop passing `unprocessedDocIds` through and have `updateOriginalFetchResults` use the new re-ranked docs throughout the whole function, assuming my diagnosis is correct. I am new to OpenSearch, so I am not exactly sure how it is supposed to work.
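
One possible reading of that idea, sketched in plain Java under stated assumptions: the lookup is keyed by each hit's own doc ID, and the re-ranked doc IDs are the only list used for ordering, so nothing is paired positionally with a list captured before re-ranking. `SingleSourceDocIdSketch`, `Hit`, and `orderByRerankedIds` are hypothetical names, and skipping IDs that have no fetched hit is my assumption, not the plugin's behavior or its actual fix.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of one possible direction; not the plugin's code.
final class SingleSourceDocIdSketch {

    // Stand-in for a fetched hit that knows its own doc ID.
    static final class Hit {
        final int docId;
        final String source;

        Hit(int docId, String source) {
            this.docId = docId;
            this.source = source;
        }
    }

    // Orders fetched hits by the re-ranked doc IDs. Map keys come from the hits
    // themselves, so no list captured before re-ranking is involved.
    public static List<Hit> orderByRerankedIds(List<Hit> fetchedHits, List<Integer> rerankedDocIds) {
        Map<Integer, Hit> byDocId = new HashMap<>();
        for (Hit hit : fetchedHits) {
            byDocId.put(hit.docId, hit);
        }
        List<Hit> ordered = new ArrayList<>();
        for (int docId : rerankedDocIds) {
            Hit hit = byDocId.get(docId);
            if (hit != null) { // assumption: skip IDs with no fetched hit instead of dereferencing null
                ordered.add(hit);
            }
        }
        return ordered;
    }

    public static void main(String[] args) {
        List<Hit> fetched = List.of(new Hit(0, "a"), new Hit(1, "b"));
        List<Hit> ordered = orderByRerankedIds(fetched, List.of(2, 1, 0));
        System.out.println("hits returned: " + ordered.size());
    }
}
```

Silently skipping a missing ID hides the mismatch rather than explaining it, so a maintainer may prefer to log or fail with a descriptive error at that point instead.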

 

### How can one reproduce the bug?
_Steps to reproduce the behavior._

Create an instance with a single shard, add 3 documents, and run a hybrid search query that returns 2 documents. Sometimes it succeeds, but most of the time I get the NPE, which is why I call it non-deterministic.

### What is the expected behavior?
_A clear and concise description of what you expected to happen._

Return the hit documents without throwing an NPE. We see this correct behavior when we use 3 shards.

### What is your host/environment?
_Operating system, version._

OpenSearch 2.19. I believe it is AWS OpenSearch. A teammate of mine owns the OpenSearch service, so I do not have all the details here.

### Do you have any screenshots?
_If applicable, add screenshots to help explain your problem._

<img width="1278" alt="Image" src="https://github.com/user-attachments/assets/3a8dfd47-7042-45da-8138-555bbfb1766f" />

### Do you have any additional context?
_Add any other context about the problem._

