[Feature Request] Multi-threaded writes in pull-based ingestion #17875

Closed
@varunbharadwaj

Description

Is your feature request related to a problem? Please describe

The current pull-based ingestion design decouples the poller from the writer for better performance, but it uses a single writer thread. Supporting multi-threaded writes can improve performance further.

Describe the solution you'd like

Multi-threaded writes will be supported as follows:

  1. The internal blocking queue will be divided into partitions, with one partition (its own blocking queue) per writer thread.
  2. The poller will use the ID field to map each incoming message to a partition.
  3. Each writer thread will consume from its own in-memory queue partition and write to the index.

This solution guarantees that updates to the same document are processed sequentially, in the order in which the consumer sees them. Versioning will be supported for cases where the underlying streaming source does not provide ordering guarantees.
Note that a message without an ID field will get an auto-generated ID at runtime and can therefore be mapped to different partitions on retries. This should not affect the correctness of the data, however, because subsequent updates to the same document must provide the ID field.
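
The ID-based routing described above could look roughly like the following. This is a minimal sketch, not the actual OpenSearch implementation: `PartitionedIngestionQueue`, `Message`, `partitionFor`, and `enqueue` are hypothetical names, and hashing the ID with `String.hashCode()` is an assumption about how the mapping might be done.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: names and the hashing scheme are illustrative assumptions.
final class PartitionedIngestionQueue {

    // Minimal stand-in for a polled message.
    static final class Message {
        final String id;
        final String payload;

        Message(String id, String payload) {
            this.id = id;
            this.payload = payload;
        }
    }

    // One bounded blocking queue per writer thread.
    private final List<BlockingQueue<Message>> partitions = new ArrayList<>();

    PartitionedIngestionQueue(int writerThreads, int capacityPerPartition) {
        for (int i = 0; i < writerThreads; i++) {
            partitions.add(new ArrayBlockingQueue<>(capacityPerPartition));
        }
    }

    // The same ID always maps to the same partition, so a single writer
    // thread sees every update for a given document, in arrival order.
    int partitionFor(String id) {
        return Math.floorMod(id.hashCode(), partitions.size());
    }

    // Called by the poller; blocks while the target partition is full.
    void enqueue(Message message) {
        try {
            partitions.get(partitionFor(message.id)).put(message);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while enqueueing", e);
        }
    }

    // Each writer thread would consume only from its own partition.
    BlockingQueue<Message> partition(int writerIndex) {
        return partitions.get(writerIndex);
    }
}
```

Because each partition is a FIFO queue drained by exactly one writer, two updates for the same ID cannot be reordered between the poller and the index.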

Shard recovery will be handled in a multi-threaded write scenario as follows:

  1. Each processor/writer thread will track the shard pointer it is currently processing.
  2. Each commit will include the minimum shard pointer across all writer threads; this is the batch start pointer.
  3. On shard recovery, the poller will resume polling from the batch start pointer persisted with the commit.
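
The pointer tracking in the steps above can be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: `BatchStartTracker` and its methods are hypothetical names, and shard pointers are modeled as `long` values (for example, offsets), which may not hold for every streaming source.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical sketch: assumes shard pointers can be represented as ordered longs.
final class BatchStartTracker {

    // Slot i holds the pointer that writer thread i is currently processing.
    private final AtomicLongArray currentPointers;

    BatchStartTracker(int writerThreads, long initialPointer) {
        currentPointers = new AtomicLongArray(writerThreads);
        for (int i = 0; i < writerThreads; i++) {
            currentPointers.set(i, initialPointer);
        }
    }

    // Called by a writer thread as it picks up each message.
    void update(int writerIndex, long pointer) {
        currentPointers.set(writerIndex, pointer);
    }

    // The value to persist with a commit: the smallest pointer any writer
    // is still working on. Everything before it has been handed to the index.
    long batchStartPointer() {
        long min = Long.MAX_VALUE;
        for (int i = 0; i < currentPointers.length(); i++) {
            min = Math.min(min, currentPointers.get(i));
        }
        return min;
    }
}
```

Resuming from the minimum means that writers which were ahead of it at commit time may see some messages again after recovery, so replayed messages would need to be safe to reapply.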

Related component

Indexing

Describe alternatives you've considered

No response

Additional context

No response

Metadata

Labels: Indexing, enhancement, untriaged