OpenTelemetry eBPF Instrumentation: Unbounded BPF internal metrics replay can exhaust CPU

Summary

OBI replays BPF probe hits into histogram observations by looping once per recorded probe run. On busy systems, the run-count delta between collections can become very large, so the metrics exporter spends excessive CPU time in a tight loop at every collection interval.

Details

The vulnerable loop is in pkg/export/prom/prom_bpf.go. During each metrics tick, OBI iterates over probeMetrics and, for each metric, runs a "for range metric.count" loop that invokes BpfProbeLatency(...) once per recorded hit.

The count comes from calculateStats() in the same file, where deltaCount := bp.runCount - bp.prevRunCount is calculated and returned without any cap before the per-hit replay loop.
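The pattern described above can be modeled in a few lines of Go. This is a minimal sketch, not the code from prom_bpf.go: the probeStats type, the deltaSinceLastTick and replay helpers, and the sample numbers are hypothetical names chosen for illustration. Only the subtraction and the once-per-hit loop follow the description in this advisory.

```go
package main

import "fmt"

// probeStats is a hypothetical stand-in for the per-probe bookkeeping
// described in the advisory; the field names follow the advisory text.
type probeStats struct {
	runCount     uint64 // cumulative probe hits
	prevRunCount uint64 // value recorded at the previous metrics tick
}

// deltaSinceLastTick models the uncapped subtraction: whatever accumulated
// since the last tick is returned unchanged.
func (p *probeStats) deltaSinceLastTick() uint64 {
	delta := p.runCount - p.prevRunCount
	p.prevRunCount = p.runCount
	return delta
}

// replay calls observe once per recorded hit and returns how many calls
// were made. The work is linear in count, not in the number of series.
func replay(count uint64, observe func()) uint64 {
	var calls uint64
	for i := uint64(0); i < count; i++ {
		observe()
		calls++
	}
	return calls
}

func main() {
	// Assumed numbers: two million probe hits accumulated since the last tick.
	p := &probeStats{runCount: 2_000_000}
	delta := p.deltaSinceLastTick()
	calls := replay(delta, func() { /* one histogram observation per hit */ })
	fmt.Printf("delta=%d observe calls=%d\n", delta, calls)
}
```

In this model a single probe with a large delta triggers as many observe calls as there were probe hits, which is the scaling behavior the advisory reports.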

If probe activity spikes between scrape intervals, deltaCount can be very large. The exporter then spends CPU time proportional to the number of probe hits rather than the number of metric series.
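One way to decouple the per-tick cost from the hit count is to bound the number of replayed observations. The sketch below is an assumption for illustration only and is not taken from the OBI patch: maxReplayPerTick and boundedReplayCount are hypothetical names, and the cap value is arbitrary.

```go
package main

import "fmt"

// maxReplayPerTick is a hypothetical upper bound on histogram observations
// replayed for one probe in one metrics tick. The value is arbitrary.
const maxReplayPerTick uint64 = 10_000

// boundedReplayCount clamps the run-count delta so the replay loop does a
// bounded amount of work no matter how busy the probe was.
func boundedReplayCount(delta uint64) uint64 {
	if delta > maxReplayPerTick {
		return maxReplayPerTick
	}
	return delta
}

func main() {
	for _, delta := range []uint64{250, 10_000, 5_000_000} {
		fmt.Printf("delta=%d replayed=%d\n", delta, boundedReplayCount(delta))
	}
}
```

A clamp like this trades accuracy for bounded CPU: once the cap is reached, the histogram's sample count understates the true number of probe hits, so a real fix would need to decide how to account for the remainder.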

PoC

Local testing with a small reproducer confirmed the replay-loop behavior and showed CPU scaling with the recorded hit count rather than the number of metric series.

Use a vulnerable build and enable internal metrics export:

git checkout v0.0.0-rc.1+build
make build
export OTEL_EBPF_INTERNAL_METRICS_PROMETHEUS_PORT=9090
sudo ./bin/obi

Create a high-rate workload that repeatedly exercises traced probes. For example, generate HTTP traffic against an instrumented service:

python3 -m http.server 18081

Then drive it:

seq 1 500000 | xargs -P 128 -I{} curl -s http://127.0.0.1:18081 >/dev/null

At the same time, scrape metrics repeatedly:

while true; do curl -s http://127.0.0.1:9090/metrics >/dev/null; done

On a vulnerable build, OBI CPU consumption rises sharply during the metrics loop because histogram updates are replayed once per counted probe execution. The effect is visible in top or pidstat and is most pronounced under sustained high request volume.

Impact

This is an availability issue in the internal metrics path. Any deployment that enables BPF internal metrics and traces busy workloads is affected. Attackers can indirectly consume CPU in the privileged agent by driving enough activity through instrumented services.

References

GHSA-89c6-vpcj-7vj4

MrAlias published to open-telemetry/opentelemetry-ebpf-instrumentation May 12, 2026

Published to the GitHub Advisory Database May 18, 2026

Reviewed May 18, 2026

Last updated May 18, 2026

Weaknesses

Uncontrolled Resource Consumption

Excessive Iteration
