[BUG] Hybrid Search with sort and rrf-pipeline corrupts scores #1274

Closed
@kenmasumitsu

Description

Describe the bug

When I run a Hybrid Search query through an RRF pipeline and sort the results by _id, the scores of the hits change.

GET my-test/_search?search_pipeline=rrf-pipeline
{
  "query": {
    "hybrid": {
      "queries": [
        {
          "match": {
            "content_text": "zoo"
          }
        },
        {
          "knn": {
            "content_vector": {
              "vector": [0.1, 0.2],
              "k": 10
            }
          }
        }
      ]
    }
  },
  "sort": [
    "_id"
  ]
}

Related component

Search

To Reproduce

  1. Create index
PUT my-test
{
  "settings": {
    "index": {
      "knn": true
    }
  },  
  "mappings": {
    "properties": {
      "content_text": {
        "type": "text"
      },
      "content_vector": {
        "type": "knn_vector",
        "dimension": 2,
        "method": {       
          "name": "hnsw",
          "space_type": "cosinesimil", 
          "engine": "nmslib"  
        }
      }
    }
  }
}
  2. Add data
POST my-test/_doc/1
{
   "content_text":"foo",
   "content_vector":[0.12, -0.34]
}

POST my-test/_doc/2
{
   "content_text":"bar",
   "content_vector":[0.13, 0.45]
}
  3. Put RRF pipeline
PUT /_search/pipeline/rrf-pipeline
{
  "description": "Post processor for hybrid RRF search",
  "phase_results_processors": [
    {
      "score-ranker-processor": {
        "combination": {
          "technique": "rrf"
        }
      }
    }
  ]
}
  4. Run hybrid search with the sort option.
GET my-test/_search?search_pipeline=rrf-pipeline
{
  "query": {
    "hybrid": {
      "queries": [
        {
          "match": {
            "content_text": "zoo"
          }
        },
        {
          "knn": {
            "content_vector": {
              "vector": [0.1, 0.2],
              "k": 10
            }
          }
        }
      ]
    }
  },
  "sort": [
    "_id"
  ]
}

The result is

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 0.016393442,
    "hits": [
      {
        "_index": "my-test",
        "_id": "1",
        "_score": 0.016393442,
        "sort": [
          "1"
        ]
      },
      {
        "_index": "my-test",
        "_id": "2",
        "_score": 0.016129032,
        "sort": [
          "2"
        ]
      }
    ]
  }
}

But if the query does not have "sort":

GET my-test/_search?search_pipeline=rrf-pipeline
{
  "_source": false, 
  "query": {
    "hybrid": {
      "queries": [
        {
          "match": {
            "content_text": "zoo"
          }
        },
        {
          "knn": {
            "content_vector": {
              "vector": [0.1, 0.2],
              "k": 10
            }
          }
        }
      ]
    }
  }
}

The result is

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 0.016393442,
    "hits": [
      {
        "_index": "my-test",
        "_id": "2",
        "_score": 0.016393442
      },
      {
        "_index": "my-test",
        "_id": "1",
        "_score": 0.016129032
      }
    ]
  }
}
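
To compare the two responses independently of hit order, the scores can be keyed by _id. This is only a small sketch over the two responses pasted above (trimmed to the fields that matter); `scores_by_id` and `score_diffs` are helper names made up for this illustration, not part of any OpenSearch client.

```python
# Hypothetical helpers for comparing two search responses by document id.

def scores_by_id(response: dict) -> dict:
    """Map each hit's _id to its _score."""
    return {hit["_id"]: hit["_score"] for hit in response["hits"]["hits"]}


def score_diffs(a: dict, b: dict) -> dict:
    """Return {_id: (score_in_a, score_in_b)} for ids whose scores differ."""
    sa, sb = scores_by_id(a), scores_by_id(b)
    return {
        doc_id: (sa[doc_id], sb[doc_id])
        for doc_id in sa
        if doc_id in sb and sa[doc_id] != sb[doc_id]
    }


# Responses from this report, reduced to _id and _score.
sorted_resp = {"hits": {"hits": [
    {"_id": "1", "_score": 0.016393442},
    {"_id": "2", "_score": 0.016129032},
]}}
unsorted_resp = {"hits": {"hits": [
    {"_id": "2", "_score": 0.016393442},
    {"_id": "1", "_score": 0.016129032},
]}}

diffs = score_diffs(sorted_resp, unsorted_resp)
for doc_id, (with_sort, without_sort) in sorted(diffs.items()):
    print(f"id={doc_id}: {with_sort} (with sort) vs {without_sort} (without sort)")
```

If sorting only reordered the hits, `diffs` would be empty.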

I expected the only difference to be the order of the hits array.
But the scores were different as well.

  • id=1 : 0.016393442 (with sort) -> 0.016129032 (without sort)
  • id=2 : 0.016129032 (with sort) -> 0.016393442 (without sort)
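
Both values look like reciprocal rank fusion contributions, 1 / (rank_constant + rank), at ranks 1 and 2. The sketch below assumes a rank constant of 60 (it is not set in the pipeline above, so this is a guess at the default) and that only the knn sub-query contributes, since "zoo" should not match either document's content_text. `rrf_score` is a name made up for this illustration.

```python
# Reciprocal rank fusion contribution of a single sub-query.
# rank_constant=60 is an assumption; the pipeline above does not set it.
def rrf_score(rank: int, rank_constant: int = 60) -> float:
    return 1.0 / (rank_constant + rank)


# The two score values that appear in both responses.
observed_high = 0.016393442
observed_low = 0.016129032

# Compare each observed value with the RRF contribution at ranks 1 and 2.
for rank, observed in ((1, observed_high), (2, observed_low)):
    expected = rrf_score(rank)
    print(f"rank {rank}: rrf={expected:.9f} observed={observed} "
          f"match={abs(expected - observed) < 1e-8}")
```

If that assumption holds, the fused values themselves are unchanged and the sort seems to affect which document is treated as rank 1.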

Expected behavior

The scores should not change when the sort option is added.
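
As a sanity check on which of the two results is the expected one, plain cosine similarity between the query vector and the two indexed vectors can be computed by hand. This is a sketch only: the score OpenSearch reports for cosinesimil is a transformation of this value, and I am assuming that transformation preserves the ordering.

```python
import math


def cosine(a, b):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


query = [0.1, 0.2]
docs = {
    "1": [0.12, -0.34],  # content_vector of _doc/1
    "2": [0.13, 0.45],   # content_vector of _doc/2
}

# Rank document ids from most to least similar to the query vector.
ranking = sorted(docs, key=lambda doc_id: cosine(query, docs[doc_id]), reverse=True)
print(ranking)
```

Document 2 comes out closer to the query vector, which agrees with the response without "sort", where id=2 has the higher score.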

Additional Details

Plugins

7982a63674f5 analysis-sudachi                     3.3.0
7982a63674f5 opensearch-alerting                  2.19.1.0
7982a63674f5 opensearch-anomaly-detection         2.19.1.0
7982a63674f5 opensearch-asynchronous-search       2.19.1.0
7982a63674f5 opensearch-cross-cluster-replication 2.19.1.0
7982a63674f5 opensearch-custom-codecs             2.19.1.0
7982a63674f5 opensearch-flow-framework            2.19.1.0
7982a63674f5 opensearch-geospatial                2.19.1.0
7982a63674f5 opensearch-index-management          2.19.1.0
7982a63674f5 opensearch-job-scheduler             2.19.1.0
7982a63674f5 opensearch-knn                       2.19.1.0
7982a63674f5 opensearch-ltr                       2.19.1.0
7982a63674f5 opensearch-ml                        2.19.1.0
7982a63674f5 opensearch-neural-search             2.19.1.0
7982a63674f5 opensearch-notifications             2.19.1.0
7982a63674f5 opensearch-notifications-core        2.19.1.0
7982a63674f5 opensearch-observability             2.19.1.0
7982a63674f5 opensearch-performance-analyzer      2.19.1.0
7982a63674f5 opensearch-reports-scheduler         2.19.1.0
7982a63674f5 opensearch-security                  2.19.1.0
7982a63674f5 opensearch-security-analytics        2.19.1.0
7982a63674f5 opensearch-skills                    2.19.1.0
7982a63674f5 opensearch-sql                       2.19.1.0
7982a63674f5 opensearch-system-templates          2.19.1.0
7982a63674f5 query-insights                       2.19.1.0

Screenshots
No

Host/Environment (please complete the following information):

  • OS: Ubuntu 22.04
  • Version: OpenSearch 2.19.1

Additional context
No
