[BUG] Hybrid Search not working with Document Level Security

Hybrid Search queries fail when authenticating with an internal user that has Document Level Security enabled.

Steps to reproduce the behaviour:

_this is a rather contrived example that I have used for testing purposes_

Create a role with a DLS filter:

```
{"term": { "divisionId": "448385216"}}
```
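For anyone reproducing this, here is a minimal sketch of a role body that carries the filter above. It assumes the role is created through the Security plugin REST API (`PUT _plugins/_security/api/roles/<role_name>`), where `dls` is passed as a JSON-encoded string. The index pattern and the `read` action group are my assumptions for illustration, not necessarily what the original role used.

```python
import json

# DLS filter from the report. The Security plugin expects it as a JSON string.
dls_filter = {"term": {"divisionId": "448385216"}}


def build_dls_role(index_pattern, dls_query):
    """Return a body for PUT _plugins/_security/api/roles/<role_name>."""
    return {
        "index_permissions": [
            {
                "index_patterns": [index_pattern],   # assumed index pattern
                "dls": json.dumps(dls_query),        # DLS query, string-encoded
                "allowed_actions": ["read"],         # assumed action group
            }
        ]
    }


role_body = build_dls_role("my_test_index", dls_filter)
print(json.dumps(role_body, indent=2))
```

The printed body can be sent as-is with any HTTP client; the role then still needs to be mapped to the internal user.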

Authenticate as an internal user assigned to that role, then run a hybrid search query:


```
GET my_test_index/_search?search_pipeline=hybrid-search-pipeline
{
  "query": {
    "hybrid": {
      "queries": [
        {
          "bool": {
            "filter": [
              {
                "term": {
                  "organisationId": 1
                }
              }
            ]
          }
        },
        {
          "neural": {
            "label_embedding.paraphrase-multilingual-MiniLM-L12-v2": {
              "query_text": "document",
              "model_id": "Cf9VfZYBPcPsA6UT332K",
              "k": "5",
              "filter": {
                "term": {
                  "organisationId": 1
                }
              }
            }
          }
        }
      ]
    }
  },
  "_source": ["label", "organisationId", "esType", "status", "divisionId"]
}

```
This returns an error:

```
{
  "error": {
    "root_cause": [
      {
        "type": "class_cast_exception",
        "reason": "class org.apache.lucene.search.ConstantScoreQuery cannot be cast to class org.opensearch.neuralsearch.query.HybridQuery (org.apache.lucene.search.ConstantScoreQuery is in unnamed module of loader 'app'; org.opensearch.neuralsearch.query.HybridQuery is in unnamed module of loader java.net.FactoryURLClassLoader @57435801)"
      },
      {
        "type": "illegal_argument_exception",
        "reason": "hybrid query must be a top level query and cannot be wrapped into other queries"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "my_test_index",
        "node": "j0xoYCCsTh2Jjoe_SNqmdQ",
        "reason": {
          "type": "class_cast_exception",
          "reason": "class org.apache.lucene.search.ConstantScoreQuery cannot be cast to class org.opensearch.neuralsearch.query.HybridQuery (org.apache.lucene.search.ConstantScoreQuery is in unnamed module of loader 'app'; org.opensearch.neuralsearch.query.HybridQuery is in unnamed module of loader java.net.FactoryURLClassLoader @57435801)"
        }
      },
      {
        "shard": 4,
        "index": "my_test_index",
        "node": "j0xoYCCsTh2Jjoe_SNqmdQ",
        "reason": {
          "type": "illegal_argument_exception",
          "reason": "hybrid query must be a top level query and cannot be wrapped into other queries"
        }
      }
    ],
    "caused_by": {
      "type": "class_cast_exception",
      "reason": "class org.apache.lucene.search.ConstantScoreQuery cannot be cast to class org.opensearch.neuralsearch.query.HybridQuery (org.apache.lucene.search.ConstantScoreQuery is in unnamed module of loader 'app'; org.opensearch.neuralsearch.query.HybridQuery is in unnamed module of loader java.net.FactoryURLClassLoader @57435801)",
      "caused_by": {
        "type": "class_cast_exception",
        "reason": "class org.apache.lucene.search.ConstantScoreQuery cannot be cast to class org.opensearch.neuralsearch.query.HybridQuery (org.apache.lucene.search.ConstantScoreQuery is in unnamed module of loader 'app'; org.opensearch.neuralsearch.query.HybridQuery is in unnamed module of loader java.net.FactoryURLClassLoader @57435801)"
      }
    }
  },
  "status": 500
}

```
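Note that the shards do not all fail the same way: shard 0 raises a `class_cast_exception` while shard 4 raises an `illegal_argument_exception`. A small helper like the one below (a sketch; the function name is hypothetical and the sample is a trimmed copy of the response above) groups failed shards by exception type, which makes it easier to compare runs.

```python
from collections import defaultdict


def failures_by_type(response):
    """Map each exception type to the sorted shard numbers that raised it."""
    grouped = defaultdict(list)
    for failure in response.get("error", {}).get("failed_shards", []):
        grouped[failure["reason"]["type"]].append(failure["shard"])
    return {exc_type: sorted(shards) for exc_type, shards in grouped.items()}


# Trimmed copy of the error response above (reason strings omitted).
sample_response = {
    "error": {
        "failed_shards": [
            {"shard": 0, "index": "my_test_index",
             "reason": {"type": "class_cast_exception"}},
            {"shard": 4, "index": "my_test_index",
             "reason": {"type": "illegal_argument_exception"}},
        ]
    },
    "status": 500,
}

print(failures_by_type(sample_response))
```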

Curiously, if I add a `suggest` block to the query, I get the desired results (and the DLS filter is also applied).

```
GET my_test_index/_search?search_pipeline=hybrid-search-pipeline
{
  "query": {
    "hybrid": {
      "queries": [
        {
          "bool": {
            "filter": [
              {
                "term": {
                  "organisationId": 1
                }
              }
            ]
          }
        },
        {
          "neural": {
            "label_embedding.paraphrase-multilingual-MiniLM-L12-v2": {
              "query_text": "document",
              "model_id": "Cf9VfZYBPcPsA6UT332K",
              "k": "5",
              "filter": {
                "term": {
                  "organisationId": 1
                }
              }
            }
          }
        }
      ]
    }
  },
  "suggest": {
    "text": "document",
    "label-suggest": {
      "term": {
        "field": "label"
      }
    }
  },
  "_source": ["label", "organisationId", "esType", "status", "divisionId"]
}
```

This returns results:

```
{
  "took": 1176,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.2,
    "hits": [
      {
        "_index": "my_test_index",
        "_id": "1_325861482",
        "_score": 0.2,
        "_routing": "1",
        "_source": {
          "organisationId": 1,
          "divisionId": 448385216,
          "label": "English Document",
          "status": "active"
        }
      }
    ]
  },
  "suggest": {
    "label-suggest": [
      {
        "text": "document",
        "offset": 0,
        "length": 8,
        "options": []
      }
    ]
  }
}
```
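Based purely on the observation above, a client-side stopgap could attach a term suggester to every hybrid request. This is a sketch of that idea, not a verified fix: the helper name is hypothetical, the search body is shortened to one sub-query, and I have not confirmed why the `suggest` block changes the behaviour.

```python
import copy


def add_term_suggest(search_body, text, field, name="label-suggest"):
    """Return a copy of search_body with a term suggester added."""
    patched = copy.deepcopy(search_body)      # leave the caller's body untouched
    suggest = patched.setdefault("suggest", {})
    suggest.setdefault("text", text)          # keep an existing global text
    suggest[name] = {"term": {"field": field}}
    return patched


# Shortened version of the hybrid query from the report.
search_body = {
    "query": {
        "hybrid": {
            "queries": [
                {"bool": {"filter": [{"term": {"organisationId": 1}}]}}
            ]
        }
    },
    "_source": ["label", "organisationId", "esType", "status", "divisionId"],
}

patched = add_term_suggest(search_body, text="document", field="label")
print(sorted(patched.keys()))
```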




I have tested this on our AWS OpenSearch managed service, which is on version 2.17.1, and also locally on 2.19.1 with the following Docker configuration. (Note: I have a separate script to register and deploy the embedding model and to create the ingest and search pipelines, etc.)


```
version: "3.7"
services:
  opensearch:
    build: .
    container_name: opensearch
    environment:
      - discovery.type=single-node
      - plugins.security.disabled=false
      - plugins.security.ssl.http.enabled=false
      - plugins.security.ssl.transport.enforce_hostname_verification=false
      - OPENSEARCH_INITIAL_ADMIN_PASSWORD=somestrongpassword
      - OPENSEARCH_JAVA_OPTS=-Xms6g -Xmx6g
    ports:
      - "9200:9200" # OpenSearch HTTP
      - "9600:9600" # Performance Analyzer
    ulimits:
      memlock:
        soft: -1
        hard: -1
    mem_limit: 8g
    healthcheck:
      test: ["CMD-SHELL", "curl -s http://localhost:9200 || exit 1"]
      interval: 30s
      timeout: 10s
      retries: 5

  dashboards:
    image: opensearchproject/opensearch-dashboards:2.19.1
    container_name: dashboards
    environment:
      - OPENSEARCH_HOSTS=["http://opensearch:9200"]
      - OPENSEARCH_USERNAME=admin
      - OPENSEARCH_PASSWORD=somestrongpassword
      - SERVER_SSL_ENABLED=false
    ports:
      - "5601:5601"
    depends_on:
      opensearch:
        condition: service_healthy
```
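The setup script itself is not included here. For reproduction purposes, a search pipeline named `hybrid-search-pipeline` could look roughly like the sketch below, which uses the `normalization-processor` from the neural-search plugin. The `min_max` and `arithmetic_mean` techniques are assumptions chosen for illustration; the pipeline actually used for the results above may differ.

```python
import json


def build_hybrid_pipeline():
    """Return a body for PUT /_search/pipeline/hybrid-search-pipeline."""
    return {
        "description": "Post-processor for hybrid search (illustrative)",
        "phase_results_processors": [
            {
                "normalization-processor": {
                    # Assumed techniques; not taken from the original setup.
                    "normalization": {"technique": "min_max"},
                    "combination": {"technique": "arithmetic_mean"},
                }
            }
        ],
    }


pipeline_body = build_hybrid_pipeline()
print("PUT /_search/pipeline/hybrid-search-pipeline")
print(json.dumps(pipeline_body, indent=2))
```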



