Add KEDA autoscaling to microservices jsonnet #6970

Merged
mapno merged 4 commits into grafana:main from mapno:keda-autoscaling
Apr 22, 2026

@mapno commented Apr 15, 2026

What this PR does:

Adds KEDA-based horizontal pod autoscaling support to the OSS microservices jsonnet library. These configurations have been running successfully in production.

  • Distributor: CPU-based autoscaling
  • Metrics-generator: CPU-based autoscaling
  • Backend-worker: Prometheus-based autoscaling on outstanding blocks
  • Block-builder: KEDA kubernetes-workload scaler matching live-store zone-a pod count

All scalers are disabled by default and are configured per component via _config.<component>.keda, matching the existing VPA and PDB patterns.
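A minimal, self-contained sketch of how the opt-in might look, assuming the per-component keys are `enabled`, `min_replicas`, and `max_replicas` (these names appear in the snippets under review). The stand-in `lib` object, the `distributor_scaled_object` field name, the CPU target, and the replica values are hypothetical and only illustrate the shape; the library's real defaults are not reproduced here.

```jsonnet
// Stand-in for the library. Only the shape matters; every default value
// below is illustrative rather than taken from the PR.
local lib = {
  _config:: {
    distributor: {
      keda: { enabled: false, min_replicas: 2, max_replicas: 10 },
    },
  },

  // Render a ScaledObject only when the component opts in.
  distributor_scaled_object:
    local cfg = $._config.distributor.keda;
    if !cfg.enabled then {} else {
      apiVersion: 'keda.sh/v1alpha1',
      kind: 'ScaledObject',
      metadata: { name: 'distributor' },
      spec: {
        scaleTargetRef: { kind: 'Deployment', name: 'distributor' },
        minReplicaCount: cfg.min_replicas,
        maxReplicaCount: cfg.max_replicas,
        triggers: [
          // Hypothetical CPU utilization target.
          { type: 'cpu', metricType: 'Utilization', metadata: { value: '80' } },
        ],
      },
    },
};

// A consuming environment flips the switch for a single component.
lib {
  _config+:: {
    distributor+: { keda+: { enabled: true } },
  },
}
```

Keeping the switch under each component's own subtree means an environment can enable one scaler without touching the others.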

Which issue(s) this PR fixes:

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Copilot AI review requested due to automatic review settings April 15, 2026 13:18
@mapno force-pushed the keda-autoscaling branch from 8184838 to 6af06d2 on April 15, 2026 13:19
Copilot AI left a comment

Pull request overview

Adds KEDA-based horizontal pod autoscaling support to the Tempo OSS microservices Jsonnet library, integrating KEDA ScaledObjects and wiring component-specific scaling defaults and tests.

Changes:

  • Introduces autoscaling.libsonnet to generate KEDA ScaledObjects for distributor, metrics-generator, backend-worker, and block-builder.
  • Wires autoscaling into the microservices library (imports, configmap adjustments for block-builder partitions) and updates Jsonnet test fixtures.
  • Vendors keda-libsonnet and updates jb lockfiles plus changelog entry.

Reviewed changes

Copilot reviewed 12 out of 21 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| operations/jsonnet/microservices/test/test_autoscaling.jsonnet | Adds targeted Jsonnet assertions for autoscaling outputs. |
| operations/jsonnet/microservices/test/jsonnetfile.lock.json | Locks new Jsonnet test dependencies (KEDA + VPA libs). |
| operations/jsonnet/microservices/test/jsonnetfile.json | Adds KEDA + VPA Jsonnet dependencies for tests. |
| operations/jsonnet/microservices/test/environments/default/main.jsonnet | Enables distributor autoscaling in the default test environment. |
| operations/jsonnet/microservices/tempo.libsonnet | Includes the new autoscaling module in the composed library. |
| operations/jsonnet/microservices/jsonnetfile.lock.json | Locks KEDA dependency for the microservices Jsonnet library. |
| operations/jsonnet/microservices/jsonnetfile.json | Adds keda-libsonnet as a dependency for microservices Jsonnet. |
| operations/jsonnet/microservices/configmap.libsonnet | Adjusts block-builder Tempo config to set partitions_per_instance when autoscaling is enabled. |
| operations/jsonnet/microservices/autoscaling.libsonnet | Implements configurable KEDA ScaledObjects and related defaults/overrides. |
| operations/jsonnet/microservices/Makefile | Runs the new autoscaling Jsonnet test. |
| operations/jsonnet-compiled/vendor/github.com/jsonnet-libs/keda-libsonnet/2.15/main.libsonnet | Vendored KEDA lib entrypoint for compiled Jsonnet distribution. |
| operations/jsonnet-compiled/vendor/github.com/jsonnet-libs/keda-libsonnet/2.15/gen.libsonnet | Vendored KEDA generated package wrapper. |
| operations/jsonnet-compiled/vendor/github.com/jsonnet-libs/keda-libsonnet/2.15/_gen/keda/v1alpha1/triggerAuthentication.libsonnet | Vendored generated KEDA TriggerAuthentication bindings. |
| operations/jsonnet-compiled/vendor/github.com/jsonnet-libs/keda-libsonnet/2.15/_gen/keda/v1alpha1/scaledObject.libsonnet | Vendored generated KEDA ScaledObject bindings. |
| operations/jsonnet-compiled/vendor/github.com/jsonnet-libs/keda-libsonnet/2.15/_gen/keda/v1alpha1/main.libsonnet | Vendored generated KEDA v1alpha1 module. |
| operations/jsonnet-compiled/vendor/github.com/jsonnet-libs/keda-libsonnet/2.15/_gen/keda/v1alpha1/clusterTriggerAuthentication.libsonnet | Vendored generated KEDA ClusterTriggerAuthentication bindings. |
| operations/jsonnet-compiled/vendor/github.com/jsonnet-libs/keda-libsonnet/2.15/_gen/keda/main.libsonnet | Vendored generated KEDA root module. |
| operations/jsonnet-compiled/jsonnetfile.lock.json | Updates compiled Jsonnet lockfile to include KEDA dependency. |
| CHANGELOG.md | Adds user-facing changelog entry for KEDA autoscaling support. |

Comment thread operations/jsonnet/microservices/autoscaling.libsonnet
Comment on lines +155 to +157
tempo_distributor_deployment+:
if $._config.autoscaling.distributor.enabled then $.removeReplicasFromSpec else {},

Copilot AI Apr 15, 2026

When autoscaling is enabled you remove spec.replicas from the Deployment/StatefulSet. Kubernetes defaults replicas to 1 when the field is omitted, which means components with min_replicas > 1 (e.g. distributor=2, backend_worker=3) can come up initially under-provisioned until KEDA reconciles. Is that transient state acceptable for these targets, or should the template set an initial replicas value (or offer a config toggle) to avoid starting below min_replicas?

@mapno (author) replied:

This is expected and matches standard KEDA usage — the ScaledObject owns the replica count and reconciles quickly. The transient state is short-lived and acceptable for these components.
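To make the exchange above concrete, here is a hedged sketch of how a mixin like `removeReplicasFromSpec` could hide `spec.replicas` from the rendered manifest. The PR's actual definition is not shown in this thread, so the mixin body, the sample Deployment, and the `kedaEnabled` flag are assumptions for illustration.

```jsonnet
// Assumed shape of the mixin: overriding the field with `::` hides it
// from the manifested output without deleting sibling fields.
local removeReplicasFromSpec = { spec+: { replicas:: null } };

// Hypothetical Deployment, reduced to the fields relevant here.
local deployment = {
  apiVersion: 'apps/v1',
  kind: 'Deployment',
  metadata: { name: 'distributor' },
  spec: { replicas: 3, template: {} },
};

// Hypothetical flag standing in for the component's `enabled` setting.
local kedaEnabled = true;

{
  // With the flag on, spec.replicas is absent from the output, so the
  // rendered manifest no longer asserts a replica count of its own.
  distributor_deployment:
    deployment + (if kedaEnabled then removeReplicasFromSpec else {}),
}
```

With the field omitted, the replica count is left to the HPA that KEDA manages, which is the ownership model described in the reply.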

Comment thread operations/jsonnet/microservices/test/test_autoscaling.jsonnet Outdated
@mapno force-pushed the keda-autoscaling branch from 6af06d2 to 2642532 on April 15, 2026 13:26
Copilot AI review requested due to automatic review settings April 15, 2026 13:35
@mapno force-pushed the keda-autoscaling branch from 2642532 to dfc1c85 on April 15, 2026 13:35
@mapno force-pushed the keda-autoscaling branch from dfc1c85 to 067e7e5 on April 15, 2026 13:39
Copilot AI left a comment

Pull request overview

Copilot reviewed 10 out of 19 changed files in this pull request and generated 3 comments.

Comment on lines +104 to +107
+ scaledObject.spec.withMinReplicaCount(config.min_replicas)
+ scaledObject.spec.withMaxReplicaCount(config.max_replicas)
+ scaledObject.spec.withPollingInterval(60)
+ (
Copilot AI Apr 15, 2026

scaledObjectForController hard-codes pollingInterval to 60s, but several configs (e.g. distributor) describe much faster scaling periods (15s). With a 60s polling interval KEDA will only refresh trigger values once per minute, which can significantly delay scale-up/scale-down and makes the “every 15s” behavior unrealistic. Could we either (a) make polling interval configurable (per scaler or globally), or (b) align the comments/defaults so the intended responsiveness matches what the rendered ScaledObject will actually do?

@mapno (author) replied:

The pollingInterval controls how often KEDA queries the trigger source, while periodSeconds in the HPA scaling policies controls how the HPA acts on the metrics it already has. They're independent — 60s polling with 15s HPA periods is intentional and working well for us.
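The two settings discussed above sit in different parts of a ScaledObject. The following is a sketch written as a plain object rather than with the keda-libsonnet helpers; the 60s polling interval and the 15s period come from the thread, while the names, replica bounds, CPU target, stabilization window, and policy value are hypothetical.

```jsonnet
{
  apiVersion: 'keda.sh/v1alpha1',
  kind: 'ScaledObject',
  metadata: { name: 'distributor' },
  spec: {
    scaleTargetRef: { kind: 'Deployment', name: 'distributor' },
    minReplicaCount: 2,
    maxReplicaCount: 10,  // hypothetical upper bound

    // KEDA side: seconds between checks of the trigger source.
    pollingInterval: 60,

    triggers: [
      // Hypothetical CPU utilization target.
      { type: 'cpu', metricType: 'Utilization', metadata: { value: '80' } },
    ],

    advanced: {
      horizontalPodAutoscalerConfig: {
        behavior: {
          scaleDown: {
            stabilizationWindowSeconds: 300,  // hypothetical
            policies: [
              // HPA side: remove at most 1 pod within any 15s window.
              { type: 'Pods', value: 1, periodSeconds: 15 },
            ],
          },
        },
      },
    },
  },
}
```

Here `pollingInterval` belongs to the ScaledObject spec itself, whereas `periodSeconds` is passed through to the generated HPA's scaling behavior, which is why the two can be tuned separately.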

Comment thread operations/jsonnet/microservices/autoscaling.libsonnet
Comment thread operations/jsonnet/microservices/autoscaling.libsonnet Outdated
Copilot AI left a comment

Pull request overview

Copilot reviewed 10 out of 19 changed files in this pull request and generated 1 comment.

Comment thread operations/jsonnet/microservices/autoscaling.libsonnet
local scaleDownBehavior = scaledObject.spec.advanced.horizontalPodAutoscalerConfig.behavior.scaleDown,

_config+:: {
autoscaling: {
A contributor commented:

Do we want to nest under autoscaling? It might be preferable for a flat structure which is something akin to _config.distributor.keda.enabled instead of nesting under autoscaling.

@mapno (author) replied:

Good call. Refactored to _config.<component>.keda to match the VPA/PDB pattern. Also moved prometheus_address into backend_worker.keda since it's the only consumer.
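As an illustration of the refactored layout, the sketch below keeps `prometheus_address` inside `backend_worker.keda` and feeds it into a KEDA Prometheus trigger. Only `prometheus_address`, `enabled`, `min_replicas`, and `max_replicas` are key names taken from the PR; the `query` and `threshold` keys, the address, the metric name, and all values are placeholders, not the library's actual settings.

```jsonnet
// Hypothetical config subtree following the _config.<component>.keda layout.
local config = {
  backend_worker: {
    keda: {
      enabled: true,
      min_replicas: 3,
      max_replicas: 10,  // placeholder
      // The only consumer of the address is this component's scaler.
      prometheus_address: 'http://prometheus.example.svc:9090',  // placeholder
      query: 'sum(hypothetical_outstanding_blocks)',  // placeholder query
      threshold: '100',  // placeholder
    },
  },
};

local keda = config.backend_worker.keda;

// Trigger entry as it would appear under spec.triggers of a ScaledObject.
{
  type: 'prometheus',
  metadata: {
    serverAddress: keda.prometheus_address,
    query: keda.query,
    threshold: keda.threshold,
  },
}
```

Scoping the address to the one component that queries Prometheus avoids a top-level setting that the CPU-based scalers would never read.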

mapno added 3 commits April 20, 2026 16:50
Add KEDA ScaledObject definitions for distributor (CPU), metrics-generator
(CPU), backend-worker (Prometheus outstanding blocks), and block-builder
(kubernetes-workload scaler). These configurations have been running
successfully in production. All disabled by default and fully configurable.
Matches VPA/PDB convention where each component owns its config subtree.
Also moves prometheus_address into backend_worker.keda where it's used.
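For the block-builder case mentioned in the commit message, a KEDA `kubernetes-workload` trigger scales the target according to the number of pods matching a selector. This is a rough sketch only: the label selector is a hypothetical placeholder, and a `value` of `'1'` is an assumption chosen to express one block-builder replica per matching live-store pod.

```jsonnet
// Trigger entry as it would appear under spec.triggers of a ScaledObject.
{
  type: 'kubernetes-workload',
  metadata: {
    // Hypothetical label selector for the live-store zone-a pods.
    podSelector: 'name=live-store-zone-a',
    // Target ratio of matching pods to replicas of the scaled workload.
    value: '1',
  },
}
```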
Copilot AI review requested due to automatic review settings April 20, 2026 14:50
Copilot AI left a comment

Pull request overview

Copilot reviewed 10 out of 19 changed files in this pull request and generated 3 comments.

Comment thread operations/jsonnet/microservices/autoscaling.libsonnet
Comment thread operations/jsonnet/microservices/configmap.libsonnet
Comment thread operations/jsonnet/microservices/autoscaling.libsonnet
@javiermolinar left a comment

LGTM

@mapno merged commit 2b291ad into grafana:main on Apr 22, 2026
44 of 45 checks passed
@mapno deleted the keda-autoscaling branch on April 22, 2026 08:55

4 participants