[FEATURE][BuildTime Reduction V1] Add the feature to Remove _recovery_source for KNN field. #1719

Closed
@navneet1v

Description

Is your feature request related to a problem?
During indexing, all fields ingested into OpenSearch are stored in _source. Users can exclude individual fields from _source or disable _source entirely. However, when they do, a _recovery_source field is added instead (ref), and it is only removed later on.

So overall, the whole payload is still written as a StoredField, which impacts indexing time. The impact is especially high when one of the fields is a vector field. In my experiments with a 768D, 1M dataset, I saw a 50% reduction in indexing latency at the p90 level.
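For context, here is a minimal sketch of the existing _source option referred to above, written as a Python helper that builds a create-index request body. The field names `my_vector` and `title` are made up for illustration; `_source.excludes`, `knn_vector`, `dimension`, and `index.knn` are the standard mapping and setting names.

```python
import json

# Hypothetical field name, chosen only for illustration.
VECTOR_FIELD = "my_vector"


def build_index_body(exclude_vector_from_source: bool) -> dict:
    """Build a create-index request body for an index with one knn_vector field."""
    body = {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                VECTOR_FIELD: {"type": "knn_vector", "dimension": 768},
                "title": {"type": "text"},
            }
        },
    }
    if exclude_vector_from_source:
        # Existing _source option: drop only the vector field from _source.
        body["mappings"]["_source"] = {"excludes": [VECTOR_FIELD]}
    return body


if __name__ == "__main__":
    # Print the JSON body that would be sent when creating the index.
    print(json.dumps(build_index_body(True), indent=2))
```

Even with a mapping like this, the behavior described above still applies: the full payload is kept as _recovery_source, so the vector is still written as a stored field.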

What solution would you like?
Just as users can specify which fields are included in or excluded from _source, or disable _source completely, I was thinking of adding the same capability for _recovery_source. This would let users remove their fields from _recovery_source if required.
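To illustrate the intended effect (not the OpenSearch implementation), the sketch below shows what an exclude list would do to the copy of a document kept for recovery. The function `strip_fields` and the field names are hypothetical, and it only handles exact top-level field names, whereas real _source filtering also supports patterns and nested paths.

```python
def strip_fields(document: dict, excludes: list[str]) -> dict:
    """Return a copy of `document` without the top-level fields named in `excludes`."""
    return {key: value for key, value in document.items() if key not in excludes}


# Hypothetical document with a 768-dimensional vector field.
doc = {"title": "example", "my_vector": [0.1] * 768}

# With the vector excluded, the recovery copy keeps only the small fields.
recovery_copy = strip_fields(doc, excludes=["my_vector"])
print(sorted(recovery_copy))  # ['title']
```

The point of the proposal is that the large vector payload would no longer need to be written as a stored field just for recovery purposes.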

What alternatives have you considered?
NA

Do you have any additional context?
This capability needs to be added in OpenSearch core. I have already created a GitHub issue there: opensearch-project/OpenSearch#13490

Metadata

Labels

enhancement, indexing-improvements (this label should be attached to all the GitHub issues which will help improve the indexing time), v2.15.0

Projects

Status

✅ Done

Status

2.15.0 (Release window opens on June 10th, 2024 and closes on June 25th, 2024)
