generator: back off when instance creation fails to avoid resource exhaustion by carles-grafana · Pull Request #6142 · grafana/tempo

carles-grafana · 2026-01-07T15:33:01Z

What this PR does:

When processor validation fails, instance creation fails as well, and the generator retries creating the instance indefinitely,
potentially causing OOM errors.

This change caches failed instances and backs off before retrying, which prevents the issue.
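
A minimal sketch of the caching and backoff idea, not the code from this PR. The field names `instancesMtx`, `instances`, `failedInstances` and the constant `failureBackoff` come from the diff under review (with the 1 minute value settled on in the discussion); `getOrCreateInstance`, `errInBackoff`, the `create` factory, the injectable `now` clock and `countCreateCalls` are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// failureBackoff is the duration to wait before retrying failed instance creation.
const failureBackoff = 1 * time.Minute

// errInBackoff is a hypothetical sentinel returned while a tenant is backing off.
var errInBackoff = errors.New("instance creation failed recently, backing off")

type instance struct{ tenant string }

// generator is a stripped-down stand-in for the real generator type.
type generator struct {
	instancesMtx    sync.RWMutex
	instances       map[string]*instance
	failedInstances map[string]time.Time // tenant -> when creation last failed

	create func(tenant string) (*instance, error) // hypothetical instance factory
	now    func() time.Time                       // injectable clock, assumed for testing
}

func (g *generator) getOrCreateInstance(tenant string) (*instance, error) {
	// Fast path: the instance already exists.
	g.instancesMtx.RLock()
	inst, ok := g.instances[tenant]
	g.instancesMtx.RUnlock()
	if ok {
		return inst, nil
	}

	g.instancesMtx.Lock()
	defer g.instancesMtx.Unlock()

	// Re-check under the write lock in case another goroutine created it.
	if inst, ok := g.instances[tenant]; ok {
		return inst, nil
	}
	// Skip creation entirely while the last failure is still recent.
	if failedAt, ok := g.failedInstances[tenant]; ok && g.now().Sub(failedAt) < failureBackoff {
		return nil, errInBackoff
	}

	inst, err := g.create(tenant)
	if err != nil {
		g.failedInstances[tenant] = g.now()
		return nil, err
	}
	delete(g.failedInstances, tenant)
	g.instances[tenant] = inst
	return inst, nil
}

// countCreateCalls simulates requests arriving stepSeconds apart for a tenant
// whose creation always fails, and returns how often the factory was invoked.
func countCreateCalls(requests, stepSeconds int) int {
	calls := 0
	clock := time.Unix(0, 0)
	g := &generator{
		instances:       map[string]*instance{},
		failedInstances: map[string]time.Time{},
		create: func(string) (*instance, error) {
			calls++
			return nil, errors.New("processor validation failed")
		},
		now: func() time.Time { return clock },
	}
	for i := 0; i < requests; i++ {
		_, _ = g.getOrCreateInstance("tenant-a")
		clock = clock.Add(time.Duration(stepSeconds) * time.Second)
	}
	return calls
}

func main() {
	fmt.Println(countCreateCalls(1000, 0)) // 1: a burst triggers a single creation attempt
	fmt.Println(countCreateCalls(121, 1))  // 3: attempts at 0s, 60s and 120s
}
```

With this shape, a burst of messages for a misconfigured tenant costs one creation attempt per backoff window instead of one per message.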

Which issue(s) this PR fixes:
Fixes #

Checklist

Tests updated
Documentation added
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

mdisibio · 2026-01-07T16:32:46Z

 	NoGenerateMetricsContextKey = "no-generate-metrics"
+
+	// failureBackoff is the duration to wait before retrying failed tenant instance creation.
+	failureBackoff = 10 * time.Minute


I think this means that after fixing the config issue, the operator must wait up to 10 minutes to know it's working. That seems like a long time. What do you think about something shorter, like 1 or 5 minutes? When we found this, the failure rate was tens or hundreds of times per second (once for every message received from the queue), so I think even a 1 minute backoff is a huge improvement and guarantees stability.

Or is there a way to respond when the configuration is reloaded, so that we could clear failedInstances and make the fix take effect sooner?

When the processor is created successfully, the config is reloaded every 10 seconds: https://github.com/carles-grafana/tempo/blob/fix-generator-oom/modules/generator/instance.go#L144

So 1 minute for failed instances sounds good. Changed.
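
For comparison, the clear-on-reload alternative raised in this thread could look roughly like the sketch below. This is not part of the PR as far as this conversation shows: `failureCache`, `markFailed`, `inBackoff`, `reset` and `blockedAfter` are hypothetical names, and wiring `reset` to a config reload hook is an assumption.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// failureBackoff matches the 1 minute value agreed on in this thread.
const failureBackoff = 1 * time.Minute

// failureCache is a hypothetical helper, not a type from the Tempo codebase.
type failureCache struct {
	mtx             sync.Mutex
	failedInstances map[string]time.Time // tenant -> when creation last failed
}

func newFailureCache() *failureCache {
	return &failureCache{failedInstances: make(map[string]time.Time)}
}

// markFailed records that instance creation for tenant failed at the given time.
func (c *failureCache) markFailed(tenant string, at time.Time) {
	c.mtx.Lock()
	defer c.mtx.Unlock()
	c.failedInstances[tenant] = at
}

// inBackoff reports whether tenant failed less than failureBackoff ago.
func (c *failureCache) inBackoff(tenant string, now time.Time) bool {
	c.mtx.Lock()
	defer c.mtx.Unlock()
	failedAt, ok := c.failedInstances[tenant]
	return ok && now.Sub(failedAt) < failureBackoff
}

// reset forgets every recorded failure. The assumption here is that it would
// be called from a config reload hook so a fixed config is retried at once.
func (c *failureCache) reset() {
	c.mtx.Lock()
	defer c.mtx.Unlock()
	c.failedInstances = make(map[string]time.Time)
}

// blockedAfter records a failure, optionally simulates a reload, and reports
// whether a request arriving elapsedSeconds later would still be rejected.
func blockedAfter(elapsedSeconds int, reloaded bool) bool {
	start := time.Unix(0, 0)
	c := newFailureCache()
	c.markFailed("tenant-a", start)
	if reloaded {
		c.reset()
	}
	return c.inBackoff("tenant-a", start.Add(time.Duration(elapsedSeconds)*time.Second))
}

func main() {
	fmt.Println(blockedAfter(10, false)) // true: still inside the backoff window
	fmt.Println(blockedAfter(10, true))  // false: the reload cleared the failure
	fmt.Println(blockedAfter(60, false)) // false: the backoff window has elapsed
}
```

The trade-off is extra coupling between config reloading and instance creation, whereas a short fixed backoff bounds the operator's wait at one minute with no additional wiring.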

mdisibio · 2026-01-07T16:33:32Z

-	instances    map[string]*instance
+	instancesMtx  sync.RWMutex
+	instances     map[string]*instance
+	failedTenants map[string]time.Time // tenant -> when creation last failed


Although I like the word tenant the best, everything else in the generator is called instance. Rename to failedInstances?

Agreed, changed.

carles-grafana force-pushed the fix-generator-oom branch 2 times, most recently from 01f17aa to a4a5eef Compare January 7, 2026 15:39

carles-grafana changed the title ~~wip~~ generator: back off when instance fails to avoid resource exhaustion Jan 7, 2026

carles-grafana changed the title ~~generator: back off when instance fails to avoid resource exhaustion~~ generator: back off when instance creation fails to avoid resource exhaustion Jan 7, 2026

carles-grafana force-pushed the fix-generator-oom branch from 4fcf045 to d197127 Compare January 7, 2026 16:10

carles-grafana marked this pull request as ready for review January 7, 2026 16:13

carles-grafana requested review from electron0zero, ie-pham, javiermolinar, joe-elliott, mapno, mattdurham, mdisibio, oleg-kozlyuk-grafana, ruslan-mikhailov, stoewer, yvrhdn and zalegrala as code owners January 7, 2026 16:13

mdisibio reviewed Jan 7, 2026

carles-grafana force-pushed the fix-generator-oom branch from d197127 to c3199a2 Compare January 8, 2026 08:41

carles-grafana force-pushed the fix-generator-oom branch from c3199a2 to cad7f49 Compare January 8, 2026 08:42

mdisibio approved these changes Jan 8, 2026

carles-grafana merged commit b343b0f into grafana:main Jan 8, 2026
22 checks passed

carles-grafana merged 1 commit into grafana:main from carles-grafana:fix-generator-oom