To activate the `local-blocks` processor for all users, add it to the list of processors:
```yaml
# Global overrides configuration.
overrides:
  metrics_generator_processors: ["local-blocks"]
```

To configure the processor per tenant, use the `metrics_generator_processors` override.

Example of setting this for a single tenant in the per-tenant overrides:

```yaml
overrides:
  "tenantID":
    metrics_generator_processors:
      - local-blocks
```
```yaml
query_frontend:
  metrics:
    concurrent_jobs: 8
    target_bytes_per_job: 1.25e+09 # ~1.25GB
```

## Sampling and performance optimization

TraceQL metrics queries support sampling hints to improve performance on large datasets.

### Sampling configuration considerations

When using sampling in your TraceQL metrics queries, consider:

- **Timeout settings:** Sampled queries run faster but may still benefit from adequate timeouts
- **Concurrent jobs:** Sampling reduces per-job processing time, allowing higher concurrency
- **Job sizing:** With sampling, smaller job sizes may be more efficient

Example configuration optimized for sampling:

```yaml
query_frontend:
  metrics:
    concurrent_jobs: 1500 # Higher concurrency with sampling
    target_bytes_per_job: 1.5e+08 # Smaller jobs with sampling
```

### Sampling best practices

- Use `sample=true` for dashboard queries requiring fast refresh
- Apply fixed sampling rates for consistent approximation levels
- Avoid sampling for alerts or precise measurements
- Test sampling accuracy against your specific data patterns
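The last point can be sanity-checked with arithmetic before touching production: for a count estimated from a uniform sample of rate `p` over `n` matching spans, the relative standard error is roughly `sqrt((1 - p) / (p * n))`. A minimal Python sketch of this back-of-envelope check (an approximation on our part, not a Tempo feature):

```python
import math

def relative_std_error(matching_spans: int, sample_rate: float) -> float:
    """Approximate relative error of a count estimated from a uniform
    sample: scaling the sampled count by 1/p has standard error
    sqrt((1 - p) / (p * n)) relative to the true count."""
    n, p = matching_spans, sample_rate
    return math.sqrt((1 - p) / (p * n))

# A 1% sample over one million matching spans stays well under 2% error.
err = relative_std_error(1_000_000, 0.01)
```

The formula suggests why low rates are safe for broad queries but risky for rare events, matching the guidance above.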
This example means the attribute `resource.cluster` had too many values.
```
{ __meta_error="__too_many_values__", resource.cluster=<nil> }
```

## Query hints and sampling

TraceQL metrics queries support query hints, expressed with the `with()` syntax, to optimize performance and control sampling behavior. Sampling hints improve performance by processing only a subset of the data.

Sampling is particularly effective for:

- Aggregation queries over large datasets
- Dashboard queries requiring fast refresh
- Exploratory data analysis

{{< admonition type="note" >}}
Sampling hints only work with TraceQL metrics queries, that is, queries that use functions such as `rate()` or `count_over_time()`.
{{< /admonition >}}

### Adaptive sampling: `with(sample=true)`

Adaptive sampling automatically determines the optimal sampling strategy based on query selectivity and data volume.

```traceql
{ resource.service.name="frontend" } | rate() with(sample=true)
```

- **Use case:** Heavy queries with large result sets
- **Performance:** 2-4x improvement on queries like `{ } | rate()`
- **Accuracy:** Maintains high accuracy by adapting sampling rate
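The accuracy claim rests on a standard statistical trick: count only a fraction `p` of spans and scale the count by `1/p`, which yields an unbiased estimate. A Python sketch of that estimator (our illustration, not Tempo's internals):

```python
import random

def estimate_rate(spans, window_seconds, sample_rate, seed=0):
    """Count only a `sample_rate` fraction of spans, then scale the
    count by 1/p; the scaled count gives an unbiased rate estimate."""
    rng = random.Random(seed)
    kept = sum(1 for _ in spans if rng.random() < sample_rate)
    return kept / sample_rate / window_seconds

spans = list(range(60_000))             # 60k spans in a 60 s window
exact = len(spans) / 60                 # true rate: 1000 spans/s
approx = estimate_rate(spans, 60, 0.1)  # inspects only ~10% of the spans
```

For broad matchers like `{ } | rate()` the sampled estimate lands within a few percent of the exact rate while doing a fraction of the work, which is where the speedup comes from.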

### Fixed span sampling: `with(span_sample=0.xx)`

Fixed span sampling processes a fixed percentage of spans for span-level aggregations.

```traceql
{ status=error } | count_over_time() with(span_sample=0.1)
```

### Fixed trace sampling: `with(trace_sample=0.xx)`

Fixed trace sampling processes a fixed percentage of traces for trace-level aggregations.

```traceql
{ } | count() by (resource.service.name) with(trace_sample=0.05)
```
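The difference from span sampling is that the keep/drop decision is made once per trace, so surviving traces stay complete. A Python sketch of this behavior (an illustration, not Tempo's implementation):

```python
import random
from collections import defaultdict

def trace_sample(spans, rate, seed=0):
    """Decide keep/drop once per trace ID; every span of a kept
    trace is retained, preserving intra-trace relationships."""
    rng = random.Random(seed)
    decisions, kept = {}, []
    for span in spans:
        tid = span["trace_id"]
        if tid not in decisions:
            decisions[tid] = rng.random() < rate
        if decisions[tid]:
            kept.append(span)
    return kept

# 1000 synthetic traces of 5 spans each, sampled at 5%.
spans = [{"trace_id": t, "span_id": s} for t in range(1000) for s in range(5)]
kept = trace_sample(spans, 0.05)
sizes = defaultdict(int)
for span in kept:
    sizes[span["trace_id"]] += 1
# Every surviving trace still contains all 5 of its spans.
```

Because whole traces survive intact, trace-level aggregations such as `count()` per service remain meaningful on the sampled subset.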

### When to use sampling

- **Heavy aggregation queries** with large datasets
- **Exploratory analysis** where approximate results are acceptable
- **Dashboard queries** that need faster refresh times
- **Avoid sampling** for precise metrics or rare event detection
---
title: TraceQL metrics sampling guide
menuTitle: Sampling guide
description: Optimize TraceQL metrics query performance using sampling hints
weight: 500
keywords:
- TraceQL metrics
- sampling
- performance optimization
- query optimization
---

# TraceQL metrics sampling guide

{{< docs/shared source="tempo" lookup="traceql-metrics-admonition.md" version="<TEMPO_VERSION>" >}}

TraceQL metrics sampling is a performance optimization feature that enables faster query execution by processing a subset of trace data while maintaining acceptable accuracy. Sampling delivers 2-4x performance improvements for heavy aggregation queries.

## Overview

TraceQL metrics sampling addresses the challenge of balancing query performance with data accuracy when working with large-scale trace datasets. Sampling intelligently selects a representative subset of data for processing, making it particularly valuable for:

- Real-time dashboards requiring fast refresh rates
- Exploratory data analysis where approximate results accelerate insights
- Resource-constrained environments with limited compute capacity
- Large-scale deployments processing terabytes of trace data daily

## Prerequisites

TraceQL metrics sampling requires:

- Tempo 2.8+ with TraceQL metrics enabled
- `local-blocks` processor configured in metrics-generator
- Grafana 10.4+ or Grafana Cloud for UI integration

## Choose a sampling method

### Adaptive sampling: `with(sample=true)`

Adaptive sampling automatically determines the optimal sampling strategy based on query characteristics. It switches between span-level and trace-level sampling as needed and adjusts sampling rates dynamically.

```traceql
{ resource.service.name="checkout-service" } | rate() with(sample=true)
{ status=error } | count_over_time() by (resource.service.name) with(sample=true)
```

**Best for:** Heavy aggregation queries, dashboard queries, and multi-service analysis with unpredictable data volumes.

**Limitations:** May over-sample rare events; results can vary across blocks as new data arrives.

### Fixed span sampling: `with(span_sample=0.xx)`

Fixed span sampling selects a specified percentage of spans using consistent hashing of span IDs, providing predictable performance improvements and deterministic results.

```traceql
{ status=error } | rate() by (resource.service.name) with(span_sample=0.1)
```
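The deterministic, hash-based selection described above can be sketched in Python (an illustration on our part; Tempo's actual hash function is not shown here): map each span ID to a number in [0, 1) and keep the span if the number falls below the rate, so repeated queries select the identical subset.

```python
import hashlib

def keep_span(span_id: bytes, rate: float) -> bool:
    """Consistent sampling: hash the span ID to a value in [0, 1);
    the same ID always yields the same keep/drop decision."""
    h = int.from_bytes(hashlib.sha256(span_id).digest()[:8], "big")
    return h / 2**64 < rate

ids = [f"span-{i}".encode() for i in range(10_000)]
kept = [i for i in ids if keep_span(i, 0.1)]
# Re-running the selection yields the same ~10% subset every time.
```

This determinism is why fixed sampling gives consistent approximations across repeated dashboard refreshes.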

**Best for:** Consistent approximation, large-scale monitoring, and cost optimization scenarios.

**Limitations:** May miss important events during low-volume periods and is not optimal for naturally selective queries.

### Fixed trace sampling: `with(trace_sample=0.xx)`

Fixed trace sampling selects complete traces for analysis, preserving trace context and relationships between spans within the same request flow.

```traceql
{ } | count() by (resource.service.name) with(trace_sample=0.1)
```

**Best for:** Trace-level aggregations, service dependency mapping, and error correlation analysis.

**Limitations:** May provide poor accuracy for span-level metrics and can introduce bias if trace volumes vary significantly across services.

## Implement sampling

### Get started

1. **Verify prerequisites:** Check the Tempo version and ensure the `local-blocks` processor is enabled
2. **Start with adaptive sampling:** Apply `with(sample=true)` to non-critical queries first
3. **Measure performance:** Compare execution times before and after sampling
4. **Validate accuracy:** Test sampled results against exact results for critical queries
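Step 4 can be automated offline. The sketch below is a hypothetical validation harness (not a Tempo tool): it compares a scaled sampled count against the exact count for each candidate rate and keeps the rates that stay within tolerance.

```python
import random

def validate_rates(values, rates, tolerance=0.05, seed=0):
    """Return the candidate sampling rates whose scaled estimate of
    len(values) lands within `tolerance` of the exact count."""
    exact = len(values)
    ok = []
    for p in rates:
        rng = random.Random(seed)
        estimate = sum(1 for _ in values if rng.random() < p) / p
        if abs(estimate - exact) / exact <= tolerance:
            ok.append(p)
    return ok

# Which candidate rates keep the error under 5% for this dataset?
rates_ok = validate_rates(range(100_000), [0.001, 0.01, 0.1])
```

Run the same comparison against a representative slice of your own trace data before committing a rate to dashboards.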

### Grafana integration

Use sampling in dashboard panels:

```json
{
  "expr": "{ resource.service.name=\"frontend\" } | rate() with(sample=true)"
}
```

Avoid sampling for critical alerts that trigger operational responses; adaptive sampling is acceptable for warning alerts and trend monitoring.

### Configuration optimization

Increase query concurrency, since sampling reduces per-job processing time:

```yaml
query_frontend:
  metrics:
    concurrent_jobs: 1500
    target_bytes_per_job: 1.5e+08
```

## Best practices

### Query design

- **Use broad queries:** Sampling works best with queries that match many spans
- **Align sampling with aggregation scope:** Use span sampling for span-level aggregations, trace sampling for trace-level aggregations
- **Consider temporal patterns:** Adjust sampling rates based on data age and query frequency

### Select sampling rates by use case

- **Real-time monitoring (0-1h):** Adaptive sampling or 10%+ fixed rates
- **Recent analysis (1h-1d):** 5-10% sampling
- **Historical trends (1d+):** 1-5% sampling
- **Long-term analysis (30d+):** 0.1-1% sampling
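The ladder above can be captured in a small helper for query-building code; the thresholds are the ones suggested in this guide, not values read from any Tempo API:

```python
def suggested_sample_rate(data_age_hours: float) -> str:
    """Map the age of the queried data to the sampling guidance above."""
    if data_age_hours <= 1:
        return "adaptive or >=10% fixed"
    if data_age_hours <= 24:
        return "5-10% fixed"
    if data_age_hours <= 24 * 30:
        return "1-5% fixed"
    return "0.1-1% fixed"

rate_hint = suggested_sample_rate(72)  # a 3-day-old window
```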

### Decision framework

1. **Critical measurement needed?** → No sampling
2. **Dashboard or trend analysis?** → Adaptive sampling
3. **Historical analysis or capacity planning?** → Fixed sampling (1-5%)
4. **Cost optimization or exploration?** → Low fixed sampling (0.1-1%)

### Migration approach

1. Test all sampling configurations in development first
2. Migrate dashboard queries before alerting queries
3. Document sampling rationale and accuracy requirements
4. Configure monitoring for sampling effectiveness
5. Plan rollback procedures for accuracy issues

Following these practices lets you integrate TraceQL metrics sampling into your observability workflows, gaining significant performance improvements while preserving enough accuracy for effective monitoring and analysis.