Skip to content

[metrics]: enable native histograms for Tempo's promauto-registered metrics#6910

Merged
zalegrala merged 4 commits intografana:mainfrom
zalegrala:nativeHistogramsAddition
Apr 9, 2026
Merged

[metrics]: enable native histograms for Tempo's promauto-registered metrics#6910
zalegrala merged 4 commits intografana:mainfrom
zalegrala:nativeHistogramsAddition

Conversation

@zalegrala
Copy link
Copy Markdown
Contributor

What this PR does:

Enables native (exponential) histogram emission alongside classic (fixed-bucket) histograms for all of Tempo's promauto-registered metrics.

Two changes:

  1. Adds NativeHistogramBucketFactor: 1.1, NativeHistogramMaxBucketNumber: 100, and NativeHistogramMinResetDuration: 1h to histogram registrations that were missing these fields across modules/frontend, modules/ingester, modules/livestore, modules/distributor, pkg/dataquality, and pkg/drain.

  2. Sets MetricsNativeHistogramFactor: 1.1 on the dskit server config in initServer(), which enables native histograms for tempo_request_duration_seconds — the primary HTTP/gRPC latency metric used in dashboards and latency SLOs.

When both classic buckets and NativeHistogramBucketFactor are set, the Prometheus client maintains both representations simultaneously. Scrapers using the classic text format continue to receive classic histograms unchanged. Scrapers that request OpenMetrics or protobuf format receive native histograms. This is fully backward-compatible — no existing queries or SLOs are affected by this change alone.

Note for future follow-up: if/when the Prometheus scrape config is updated to ingest native histograms (protobuf format), the _bucket-based expressions in the mixin dashboards and histogramRules in rules.libsonnet will need updating. That should be a separate PR tied to the scrape config change.

Which issue(s) this PR fixes:
N/A

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Add NativeHistogramBucketFactor=1.1 to all promauto-registered histograms
that were missing it, and set MetricsNativeHistogramFactor on the dskit
server config to enable native histograms for tempo_request_duration_seconds.

Histograms now emit both classic (fixed-bucket) and native (exponential)
formats simultaneously. Classic format continues to work unchanged for
existing scrapers; native format is available to scrapers that request it
via the OpenMetrics or protobuf Accept header.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 9, 2026 15:17
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Enables Prometheus native (exponential) histogram emission across Tempo’s promauto-registered histogram metrics, and turns on native histograms for the dskit server request-duration metric.

Changes:

  • Adds NativeHistogramBucketFactor, NativeHistogramMaxBucketNumber, and NativeHistogramMinResetDuration to multiple histogram registrations across modules/packages.
  • Sets Server.MetricsNativeHistogramFactor = 1.1 in initServer() to enable native histograms for the server request duration metric.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
pkg/drain/metrics.go Adds native histogram options to the drain tokens_per_line histogram.
pkg/dataquality/warnings.go Adds native histogram options to span “distance in past/future” histograms.
modules/livestore/instance.go Adds native histogram options to completion size + backpressure duration histograms.
modules/ingester/flush.go Adds native histogram options to ingester flush size histogram.
modules/frontend/v1/frontend.go Adds native histogram options to frontend v1 batch histograms.
modules/frontend/frontend.go Adds native histogram options to jobs-per-query histogram.
modules/distributor/distributor.go Adds native histogram options to Kafka request/latency histograms.
cmd/tempo/app/modules.go Enables native histograms for dskit server request duration metric via server config.

Comment thread pkg/drain/metrics.go
… config

Prometheus client only applies DefBuckets as a default when
NativeHistogramBucketFactor is not set. For histograms that had no
explicit Buckets defined, adding NativeHistogramBucketFactor alone would
suppress the classic _bucket series. Make the intent explicit by setting
Buckets: prometheus.DefBuckets on the four affected histograms.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@zalegrala zalegrala marked this pull request as ready for review April 9, 2026 15:36
@zalegrala zalegrala requested a review from mdisibio as a code owner April 9, 2026 15:36
Copilot AI review requested due to automatic review settings April 9, 2026 15:36
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

Comment thread cmd/tempo/app/modules.go
Comment on lines 93 to 97
func (t *App) initServer() (services.Service, error) {
t.cfg.Server.MetricsNamespace = metricsNamespace
t.cfg.Server.ExcludeRequestInLog = true
t.cfg.Server.MetricsNativeHistogramFactor = 1.1

Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description/checklist says CHANGELOG.md was updated, but this PR’s diff doesn’t include a changelog entry. Since enabling native histograms for core metrics is a user-visible operational change (additional native histogram emission for OpenMetrics/protobuf scrapes), could you add a CHANGELOG.md entry in the project’s required format (with PR number + link)?

Copilot uses AI. Check for mistakes.
Comment thread modules/livestore/instance.go Outdated
Namespace: "tempo_live_store",
Name: "completion_size_bytes",
Help: "Size in bytes of blocks completed.",
Buckets: prometheus.ExponentialBuckets(1024*1024, 2, 10), // from 1MB up to 1GB
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this true, ie from 1MB to 1GB? Or is this more like 500MB?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IE the first bucket is 1 MiB, so the last one would be 512MiB? Other than that looks good.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. This is pre-existing, but I've pushed a commit to update the comment.

Copy link
Copy Markdown
Contributor

@mattdurham mattdurham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One comment change.

Comment thread modules/livestore/instance.go Outdated
Namespace: "tempo_live_store",
Name: "completion_size_bytes",
Help: "Size in bytes of blocks completed.",
Buckets: prometheus.ExponentialBuckets(1024*1024, 2, 10), // from 1MB up to 1GB
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IE the first bucket is 1 MiB, so the last one would be 512MiB? Other than that looks good.

zalegrala and others added 2 commits April 9, 2026 16:20
ExponentialBuckets(1MB, 2, 10) produces 10 buckets ending at 512MB.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 9, 2026 16:22
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

Comment thread cmd/tempo/app/modules.go
func (t *App) initServer() (services.Service, error) {
t.cfg.Server.MetricsNamespace = metricsNamespace
t.cfg.Server.ExcludeRequestInLog = true
t.cfg.Server.MetricsNativeHistogramFactor = 1.1
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting t.cfg.Server.MetricsNativeHistogramFactor = 1.1 enables native histograms for server request/throughput metrics by default, which adds per-request CPU/memory overhead and (because the upstream field is yaml:"-") can’t be tuned/disabled via Tempo config.

Could we consider plumbing a Tempo config/flag for this (even if defaulting to 1.1), so operators can turn it off or adjust the factor if they hit performance or cardinality/memory issues?

Suggested change
t.cfg.Server.MetricsNativeHistogramFactor = 1.1
// Default to native histograms for server metrics, but do not overwrite
// an explicit value that may have been provided via flags or code.
if t.cfg.Server.MetricsNativeHistogramFactor <= 0 {
t.cfg.Server.MetricsNativeHistogramFactor = 1.1
}

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe this is necessary.

@zalegrala zalegrala merged commit 9a09fdf into grafana:main Apr 9, 2026
48 of 49 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants