
Keep ingester without Kafka #6870

Closed
pavolloffay wants to merge 1 commit into grafana:main from pavolloffay:no-kafka

Conversation

@pavolloffay
Contributor

What this PR does:

This PR brings back the ingester-based write path, allowing Tempo 3.0 to be deployed without Kafka. This enables two deployment models:

  1. Kafka mode (default for distributed): Distributor -> Kafka -> BlockBuilder/LiveStore -> Object Storage
  2. Kafkaless mode: Distributor -> Ingester (WAL) -> Object Storage

Motivation

  • Not all deployments require Kafka's complexity and operational overhead
  • Smaller installations benefit from the simpler ingester-based architecture
  • Provides a migration path for existing Tempo 2.x users
  • Enables local development and testing without Kafka dependencies

Alternatives

An alternative would be to make the Tempo architecture pluggable so that the ingester could be maintained separately (e.g. in a Grafana contrib repository?).

Which issue(s) this PR fixes:
Related to #6073
Related to #4077

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Signed-off-by: Pavol Loffay <p.loffay@gmail.com>

Copilot AI left a comment


Pull request overview

This PR reintroduces an ingester-based write path to support “kafkaless” Tempo deployments (Distributor → Ingester/WAL → object storage) while keeping the existing Kafka-based architecture for distributed setups.

Changes:

  • Add an ingester write path in the distributor (ring-based fanout using PushBytesV2) and supporting discard/error accounting.
  • Allow the querier to query a non-partition ring directly when the partition ring is not configured (kafkaless mode).
  • Update app wiring/config to introduce ingester module/ring usage, tweak single-binary defaults, and add runtime override validation + tests.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 13 comments.

Summary per file:

  • modules/querier/querier.go: Adds a kafkaless fallback path that queries a plain ring when the partition ring is absent.
  • modules/distributor/receiver/shim.go: Updates help text for distributor_push_duration_seconds.
  • modules/distributor/distributor.go: Implements the ingester write path via ring + pool, adds new metrics and discard accounting, and adjusts Kafka timeout usage.
  • modules/distributor/distributor_test.go: Adds tests and test scaffolding for the ingester write path and discard logging; updates helpers to return ingester mocks.
  • modules/distributor/config.go: Adds new config toggles for write-path and write-extension behavior.
  • cmd/tempo/main.go: Forces ingester ring defaults in single-binary mode.
  • cmd/tempo/app/overrides_validation.go: Adds validation tying tenant shard size to the ingester replication factor.
  • cmd/tempo/app/overrides_validation_test.go: Adds unit tests for the new runtime override validation.
  • cmd/tempo/app/modules.go: Wires in the ingester module and ingester ring; defers distributor/querier init until rings are available; adjusts single-binary composition.
  • cmd/tempo/app/config.go: Adds ingester and ingester-client config to the top-level config; updates the warning text for complete block timeout.
  • cmd/tempo/app/config_test.go: Updates config warning test data after the warning source moved from live-store to ingester.
  • cmd/tempo/app/app.go: Adds an ingester readiness check to /ready.

Comment on lines +377 to +416
// forReplicationSet runs f for all instances in a single replication set (kafkaless mode).
func forReplicationSet[R any, TClient any](ctx context.Context, q *Querier, rs ring.ReplicationSet, f func(context.Context, TClient) (R, error)) ([]R, error) {
	var results []R
	var mu sync.Mutex
	var wg sync.WaitGroup
	var firstErr error

	for _, instance := range rs.Instances {
		instance := instance
		wg.Add(1)
		go func() {
			defer wg.Done()
			client, err := q.liveStorePool.GetClientForInstance(instance)
			if err != nil {
				mu.Lock()
				if firstErr == nil {
					firstErr = err
				}
				mu.Unlock()
				return
			}
			result, err := f(ctx, client.(TClient))
			if err != nil {
				mu.Lock()
				if firstErr == nil {
					firstErr = err
				}
				mu.Unlock()
				return
			}
			mu.Lock()
			results = append(results, result)
			mu.Unlock()
		}()
	}
	wg.Wait()

	if firstErr != nil && len(results) == 0 {
		return nil, firstErr
	}

Copilot AI Apr 7, 2026


In kafkaless mode this queries all instances in the replication set concurrently and then drops errors as long as at least one result succeeded. That’s a different correctness/perf model than the partition-ring path (which uses ring.DoUntilQuorum + minimize/hedging). Could we reuse ring.DoUntilQuorum (or concurrency.ForEachJobMergeResults over a single set) here so we stop at quorum and return errors consistently?

Suggested change
// forReplicationSet runs f for a single replication set (kafkaless mode) and returns
// once the replication-set quorum has been satisfied. If quorum cannot be reached,
// it returns the aggregated errors from the failed replicas.
func forReplicationSet[R any, TClient any](ctx context.Context, q *Querier, rs ring.ReplicationSet, f func(context.Context, TClient) (R, error)) ([]R, error) {
	if len(rs.Instances) == 0 {
		return nil, nil
	}
	requiredResults := len(rs.Instances) - rs.MaxErrors
	if requiredResults < 1 {
		requiredResults = 1
	}
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()
	results := make([]R, 0, requiredResults)
	var mu sync.Mutex
	var wg sync.WaitGroup
	var errs error
	for _, instance := range rs.Instances {
		instance := instance
		wg.Add(1)
		go func() {
			defer wg.Done()
			client, err := q.liveStorePool.GetClientForInstance(instance)
			if err != nil {
				mu.Lock()
				errs = multierr.Append(errs, err)
				mu.Unlock()
				return
			}
			result, err := f(ctx, client.(TClient))
			if err != nil {
				mu.Lock()
				quorumReached := len(results) >= requiredResults
				if !(quorumReached && (errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded))) {
					errs = multierr.Append(errs, err)
				}
				mu.Unlock()
				return
			}
			mu.Lock()
			results = append(results, result)
			if len(results) >= requiredResults {
				cancel()
			}
			mu.Unlock()
		}()
	}
	wg.Wait()
	if len(results) < requiredResults {
		if errs != nil {
			return nil, errs
		}
		return nil, fmt.Errorf("failed to reach replication-set quorum: got %d/%d successful results", len(results), requiredResults)
	}
	return results, nil
}

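For illustration, here is a minimal sketch of the ring.DoUntilQuorum approach suggested above. forReplicationSetQuorum is a hypothetical name, and the DoUntilQuorumConfig fields are based on my reading of recent dskit versions, so treat the exact signature as an assumption:

	// Hypothetical sketch: delegate the fanout to dskit's ring.DoUntilQuorum so
	// the query stops once quorum is reached and errors are aggregated the same
	// way as on the partition-ring path. Config fields may vary across dskit versions.
	func forReplicationSetQuorum[R any, TClient any](ctx context.Context, q *Querier, rs ring.ReplicationSet, f func(context.Context, TClient) (R, error)) ([]R, error) {
		cfg := ring.DoUntilQuorumConfig{
			// Contact only as many instances as quorum requires; hedge the rest.
			MinimizeRequests: true,
		}
		return ring.DoUntilQuorum(ctx, rs, cfg, func(ctx context.Context, instance *ring.InstanceDesc) (R, error) {
			var zero R
			client, err := q.liveStorePool.GetClientForInstance(*instance)
			if err != nil {
				return zero, err
			}
			return f(ctx, client.(TClient))
		}, func(R) {
			// No per-result cleanup is needed for these read-only responses.
		})
	}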
	Namespace: "tempo",
	Name:      "distributor_push_duration_seconds",
	Help:      "Records the amount of time to process and route a batch through the distributor.",
	Help:      "Records the amount of time to push a batch to the ingester.",

Copilot AI Apr 7, 2026


This metric is emitted for all distributor pushes (Kafka, local live-store, ingester write path). Updating the help text to "push a batch to the ingester" seems inaccurate/misleading in Kafka mode; consider making it path-agnostic again (e.g., "process and route"), or introducing a separate metric for ingester pushes.

Suggested change
Help: "Records the amount of time to push a batch to the ingester.",
Help: "Records the amount of time to process and route a pushed batch.",

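If the separate-metric option is preferred, a minimal sketch using prometheus/promauto; the metric name distributor_ingester_push_duration_seconds is illustrative, not an existing Tempo metric:

	// Hypothetical sketch: keep distributor_push_duration_seconds path-agnostic and
	// record ingester pushes in their own histogram. The metric name is illustrative.
	var ingesterPushDuration = promauto.NewHistogram(prometheus.HistogramOpts{
		Namespace: "tempo",
		Name:      "distributor_ingester_push_duration_seconds",
		Help:      "Records the amount of time to push a batch to the ingester.",
		Buckets:   prometheus.DefBuckets,
	})

	// Usage at the end of the ingester push path:
	// ingesterPushDuration.Observe(time.Since(start).Seconds())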
Comment on lines +47 to +48
	// KafkaWritePathEnabled enables the Kafka write path for Kafka-based deployments.
	KafkaWritePathEnabled bool `yaml:"kafka_write_path_enabled"`

Copilot AI Apr 7, 2026


KafkaWritePathEnabled is a new YAML-exposed config field but it isn't referenced anywhere in the codebase (and no flag is registered). Shipping an unused user-facing knob is likely to confuse operators; could we either wire it into the write-path selection logic or remove it before release?

Suggested change
// KafkaWritePathEnabled enables the Kafka write path for Kafka-based deployments.
KafkaWritePathEnabled bool `yaml:"kafka_write_path_enabled"`
// KafkaWritePathEnabled is reserved for internal use until it is wired into write-path selection.
// Keep this out of YAML so we don't ship an operator-facing knob that has no effect.
KafkaWritePathEnabled bool `yaml:"-"`

	}
	return ring.DoBatchWithOptions(ctx, ring.Write, ring.NewActivePartitionBatchRing(partitionRing), keys, func(partition ring.InstanceDesc, indexes []int) error {
		localCtx, cancel := context.WithTimeout(ctx, d.cfg.KafkaConfig.WriteTimeout)
		localCtx, cancel := context.WithTimeout(ctx, d.clientCfg.RemoteTimeout)

Copilot AI Apr 7, 2026


This changes the Kafka produce timeout from KafkaConfig.WriteTimeout to IngesterClient.RemoteTimeout. Those are unrelated knobs (gRPC client timeout vs Kafka write timeout), and it likely makes Kafka latency tuning confusing. Should this stay on the Kafka config timeout instead?

Suggested change
localCtx, cancel := context.WithTimeout(ctx, d.clientCfg.RemoteTimeout)
localCtx, cancel := context.WithTimeout(ctx, d.cfg.KafkaConfig.WriteTimeout)

Comment on lines +971 to +977
	if len(pushResponse.ErrorsByTrace) == 0 {
		for _, reqBatchIndex := range indexes {
			if reqBatchIndex > numOfTraces {
				level.Warn(d.logger).Log("msg", fmt.Sprintf("batch index %d out of bound for length %d", reqBatchIndex, numOfTraces))
				continue
			}
			numSuccessByTraceIndex[reqBatchIndex]++

Copilot AI Apr 7, 2026


In the no-error response path, this bounds check should be >= numOfTraces (not >) to avoid a panic when reqBatchIndex == numOfTraces (slice is 0..numOfTraces-1).

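A sketch of the corrected check, grounded in the excerpt above:

	// With numOfTraces == 3 the valid indexes are 0, 1, and 2, so an index equal
	// to numOfTraces must also be rejected before it is used to address the slice.
	if reqBatchIndex >= numOfTraces { // was: reqBatchIndex > numOfTraces
		level.Warn(d.logger).Log("msg", fmt.Sprintf("batch index %d out of bound for length %d", reqBatchIndex, numOfTraces))
		continue
	}
	numSuccessByTraceIndex[reqBatchIndex]++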
	if config.Ingestion.TenantShardSize != 0 {
		ingesterReplicationFactor := r.cfg.Ingester.LifecyclerConfig.RingConfig.ReplicationFactor
		if config.Ingestion.TenantShardSize < ingesterReplicationFactor {
			return warnings, fmt.Errorf("ingester.tenant.shard_size is lower than replication factor (%d < %d)", config.Ingestion.TenantShardSize, ingesterReplicationFactor)

Copilot AI Apr 7, 2026


This introduces a hard validation error for per-tenant runtime overrides. Per Tempo’s fail-open approach for multi-tenant overrides, a bad tenant override shouldn’t prevent Tempo from starting (or loading other tenants’ overrides). Could this be downgraded to a warning (or clamp/ignore the value) instead of returning an error?

Suggested change
return warnings, fmt.Errorf("ingester.tenant.shard_size is lower than replication factor (%d < %d)", config.Ingestion.TenantShardSize, ingesterReplicationFactor)
warnings = append(warnings, fmt.Errorf("ingester.tenant.shard_size is lower than replication factor (%d < %d)", config.Ingestion.TenantShardSize, ingesterReplicationFactor))

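If clamping is preferred to warning alone, a sketch of that fail-open variant; mutating the override inside validation is an assumption about where clamping would live:

	// Hypothetical fail-open variant: warn and clamp the tenant override to the
	// replication factor instead of failing the whole overrides load.
	if config.Ingestion.TenantShardSize < ingesterReplicationFactor {
		warnings = append(warnings, fmt.Errorf("ingester.tenant.shard_size is lower than replication factor (%d < %d); clamping to %d",
			config.Ingestion.TenantShardSize, ingesterReplicationFactor, ingesterReplicationFactor))
		config.Ingestion.TenantShardSize = ingesterReplicationFactor
	}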
Comment thread cmd/tempo/app/modules.go
Comment on lines +349 to +357
		if err := d.StartAsync(ctx); err != nil {
			return err
		}
		return d.AwaitRunning(ctx)
	},
	// running
	func(ctx context.Context) error {
		<-ctx.Done()
		return nil

Copilot AI Apr 7, 2026


These wrappers start a child service (ingesterRing/distributor/querier) but the running function only waits on ctx.Done(), so if the child service fails/stops unexpectedly the module manager won’t observe it. Could we also select on the child service’s failure channel (or otherwise propagate failures) so the process can fail fast and readiness reflects reality? (Same pattern appears in initIngesterRing and initQuerier.)

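One way to do this, sketched with dskit's services.Service.AwaitTerminated (which returns the failure case once the service reaches a terminal state); runningFn is a hypothetical helper, and fmt/errors imports are assumed:

	// Hypothetical helper: a running function that returns when either the module
	// is asked to stop or the wrapped child service terminates on its own.
	func runningFn(child services.Service) func(context.Context) error {
		return func(ctx context.Context) error {
			terminated := make(chan error, 1)
			go func() {
				// AwaitTerminated blocks until the child reaches Terminated or
				// Failed, returning the failure reason in the Failed case.
				terminated <- child.AwaitTerminated(context.Background())
			}()
			select {
			case <-ctx.Done():
				return nil
			case err := <-terminated:
				if err != nil {
					return fmt.Errorf("child service failed unexpectedly: %w", err)
				}
				return errors.New("child service stopped unexpectedly")
			}
		}
	}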
Comment thread cmd/tempo/app/modules.go
Comment on lines +523 to +533
	// Check if Kafka is configured to determine which ring to use
	var liveStoreRing ring.ReadRing
	kafkaConfigured := t.cfg.Ingest.Kafka.Topic != ""
	if kafkaConfigured && t.readRings[ringLiveStore] != nil {
		liveStoreRing = t.readRings[ringLiveStore]
	} else {
		liveStoreRing = t.readRings[ringIngester]
	}
	if liveStoreRing == nil {
		return fmt.Errorf("live-store or ingester ring not initialized")
	}

Copilot AI Apr 7, 2026


In Kafka mode, falling back to the ingester ring when the live-store ring isn’t initialized will make “recent” queries silently look at the wrong component and return incomplete results. If Kafka is configured, should this return an error unless the live-store ring is available (rather than falling back)?

Suggested change
	var liveStoreRing ring.ReadRing
	kafkaConfigured := t.cfg.Ingest.Kafka.Topic != ""
	if kafkaConfigured {
		liveStoreRing = t.readRings[ringLiveStore]
		if liveStoreRing == nil {
			return fmt.Errorf("live-store ring not initialized for Kafka mode")
		}
	} else {
		liveStoreRing = t.readRings[ringIngester]
		if liveStoreRing == nil {
			return fmt.Errorf("ingester ring not initialized")
		}
	}

Comment thread cmd/tempo/app/modules.go

	// composite targets
	SingleBinary: {BackendScheduler, BackendWorker, QueryFrontend, Querier, Distributor, MetricsGenerator, BlockBuilder, LiveStore},
	SingleBinary: {QueryFrontend, Querier, Ingester, Distributor, MetricsGenerator},

Copilot AI Apr 7, 2026


-target=all no longer includes backend-scheduler/backend-worker/block-builder/live-store, but other code paths (and comments above) assume backend-worker will handle store polling in single-binary mode. As-is, single-binary may stop polling blocklists entirely, breaking search/sharding behavior. Should single-binary keep backend-worker (and likely scheduler) or explicitly enable polling elsewhere when running as all?

Suggested change
SingleBinary: {QueryFrontend, Querier, Ingester, Distributor, MetricsGenerator},
// Keep backend scheduler/worker in single-binary mode so `-target=all` continues to
// run the backend polling path used for blocklist updates/search sharding.
SingleBinary: {QueryFrontend, Querier, Ingester, Distributor, MetricsGenerator, BackendScheduler, BackendWorker},

Comment thread cmd/tempo/app/modules.go
Comment on lines 290 to +299
	t.cfg.Distributor.KafkaConfig = t.cfg.Ingest.Kafka
	t.cfg.Distributor.PushSpansToKafka = true
	// Enable Kafka write path when Kafka is configured (topic is set) and we're not in single binary mode
	kafkaConfigured := t.cfg.Ingest.Kafka.Topic != ""
	if kafkaConfigured && !singleBinary {
		t.cfg.Distributor.PushSpansToKafka = true
		t.cfg.Distributor.IngesterWritePathEnabled = false
	} else {
		t.cfg.Distributor.PushSpansToKafka = false
		t.cfg.Distributor.IngesterWritePathEnabled = true
	}

Copilot AI Apr 7, 2026


This PR reintroduces a user-facing deployment mode/config surface (kafkaless ingester write path), but there’s no CHANGELOG.md entry in the diff. Could we add a changelog entry so operators discover the new mode and any config changes?

@javiermolinar
Contributor

javiermolinar commented Apr 8, 2026

Thank you for the PR @pavolloffay, but we are committed to deleting all of the ingester code.

That said, we agree that Kafka as a requirement for single-binary mode is not sustainable. Because of that, we have been working toward this kafkaless mode. See #6729 and #6618.

The only missing part is flushing to backend storage, which will come next. We want to complete all of that work before the official release of 3.0.

