Description
Is your feature request related to a problem?
During indexing, all fields ingested into OpenSearch are stored in _source. Users can exclude individual fields from _source or disable _source entirely. But when they do, a _recovery_source field gets added (ref), which is only removed later on.
So overall the whole payload is still written as a StoredField, which impacts indexing time. The impact is high if one of the fields is a vector field. In my experiments with a 768D, 1M-vector dataset I can see a 50% reduction in indexing latency at the p90 level.
What solution would you like?
Just as with _source, where we can specify which fields are included in or excluded from _source, or disable _source completely, I was thinking of having the same capability for _recovery_source. This would let users remove their fields from _recovery_source if required.
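To make the request concrete, here is a rough sketch of what the mapping could look like, written as a small Python helper that builds the index mapping body. The `_source` section with `excludes` / `enabled` is the existing syntax. The `_recovery_source` section is purely hypothetical: no such mapping option exists today, and its name and shape are my assumption of how the proposal might be exposed. The field name `my_vector` and the helper names are illustrative only.

```python
import json


def source_mapping(excludes=None, enabled=True):
    """Build a `_source`-style mapping section (existing `_source` syntax)."""
    if not enabled:
        return {"enabled": False}
    return {"excludes": list(excludes or [])}


def build_mappings(vector_field, dimension, exclude_from_recovery_source=False):
    """Build index mappings that drop a vector field from `_source`.

    When `exclude_from_recovery_source` is True, a HYPOTHETICAL
    `_recovery_source` section is added as well. This key is NOT an
    existing OpenSearch mapping option; it only illustrates the request.
    """
    mappings = {
        # Existing behavior: keep the vector out of the stored `_source`.
        "_source": source_mapping(excludes=[vector_field]),
        "properties": {
            vector_field: {"type": "knn_vector", "dimension": dimension},
        },
    }
    if exclude_from_recovery_source:
        # Hypothetical: same include/exclude semantics as `_source`.
        mappings["_recovery_source"] = source_mapping(excludes=[vector_field])
    return mappings


if __name__ == "__main__":
    body = {"mappings": build_mappings("my_vector", 768, True)}
    print(json.dumps(body, indent=2))
```

With the hypothetical section in place, the vector field would not be written as a StoredField at all, which is where the indexing-time saving would come from.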
What alternatives have you considered?
NA
Do you have any additional context?
The capability needs to be added in OpenSearch core. I have already created a GitHub issue there: opensearch-project/OpenSearch#13490