Add KEDA autoscaling to microservices jsonnet #6970
Pull request overview
Adds KEDA-based horizontal pod autoscaling support to the Tempo OSS microservices Jsonnet library, integrating KEDA ScaledObjects and wiring component-specific scaling defaults and tests.
Changes:
- Introduces `autoscaling.libsonnet` to generate KEDA ScaledObjects for distributor, metrics-generator, backend-worker, and block-builder.
- Wires autoscaling into the microservices library (imports, configmap adjustments for block-builder partitions) and updates Jsonnet test fixtures.
- Vendors `keda-libsonnet` and updates jb lockfiles plus changelog entry.
Reviewed changes
Copilot reviewed 12 out of 21 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| operations/jsonnet/microservices/test/test_autoscaling.jsonnet | Adds targeted Jsonnet assertions for autoscaling outputs. |
| operations/jsonnet/microservices/test/jsonnetfile.lock.json | Locks new Jsonnet test dependencies (KEDA + VPA libs). |
| operations/jsonnet/microservices/test/jsonnetfile.json | Adds KEDA + VPA Jsonnet dependencies for tests. |
| operations/jsonnet/microservices/test/environments/default/main.jsonnet | Enables distributor autoscaling in the default test environment. |
| operations/jsonnet/microservices/tempo.libsonnet | Includes the new autoscaling module in the composed library. |
| operations/jsonnet/microservices/jsonnetfile.lock.json | Locks KEDA dependency for the microservices Jsonnet library. |
| operations/jsonnet/microservices/jsonnetfile.json | Adds keda-libsonnet as a dependency for microservices Jsonnet. |
| operations/jsonnet/microservices/configmap.libsonnet | Adjusts block-builder Tempo config to set partitions_per_instance when autoscaling is enabled. |
| operations/jsonnet/microservices/autoscaling.libsonnet | Implements configurable KEDA ScaledObjects and related defaults/overrides. |
| operations/jsonnet/microservices/Makefile | Runs the new autoscaling Jsonnet test. |
| operations/jsonnet-compiled/vendor/github.com/jsonnet-libs/keda-libsonnet/2.15/main.libsonnet | Vendored KEDA lib entrypoint for compiled Jsonnet distribution. |
| operations/jsonnet-compiled/vendor/github.com/jsonnet-libs/keda-libsonnet/2.15/gen.libsonnet | Vendored KEDA generated package wrapper. |
| operations/jsonnet-compiled/vendor/github.com/jsonnet-libs/keda-libsonnet/2.15/_gen/keda/v1alpha1/triggerAuthentication.libsonnet | Vendored generated KEDA TriggerAuthentication bindings. |
| operations/jsonnet-compiled/vendor/github.com/jsonnet-libs/keda-libsonnet/2.15/_gen/keda/v1alpha1/scaledObject.libsonnet | Vendored generated KEDA ScaledObject bindings. |
| operations/jsonnet-compiled/vendor/github.com/jsonnet-libs/keda-libsonnet/2.15/_gen/keda/v1alpha1/main.libsonnet | Vendored generated KEDA v1alpha1 module. |
| operations/jsonnet-compiled/vendor/github.com/jsonnet-libs/keda-libsonnet/2.15/_gen/keda/v1alpha1/clusterTriggerAuthentication.libsonnet | Vendored generated KEDA ClusterTriggerAuthentication bindings. |
| operations/jsonnet-compiled/vendor/github.com/jsonnet-libs/keda-libsonnet/2.15/_gen/keda/main.libsonnet | Vendored generated KEDA root module. |
| operations/jsonnet-compiled/jsonnetfile.lock.json | Updates compiled Jsonnet lockfile to include KEDA dependency. |
| CHANGELOG.md | Adds user-facing changelog entry for KEDA autoscaling support. |

    tempo_distributor_deployment+:
      if $._config.autoscaling.distributor.enabled then $.removeReplicasFromSpec else {},

When autoscaling is enabled you remove spec.replicas from the Deployment/StatefulSet. Kubernetes defaults replicas to 1 when the field is omitted, which means components with min_replicas > 1 (e.g. distributor=2, backend_worker=3) can come up initially under-provisioned until KEDA reconciles. Is that transient state acceptable for these targets, or should the template set an initial replicas value (or offer a config toggle) to avoid starting below min_replicas?
This is expected and matches standard KEDA usage — the ScaledObject owns the replica count and reconciles quickly. The transient state is short-lived and acceptable for these components.
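
To make the pattern in this thread concrete, here is a minimal Jsonnet sketch of dropping `spec.replicas` when a scaler is enabled. The body of `removeReplicasFromSpec`, the sample Deployment, and the `kedaEnabled` flag are assumptions for illustration; the helper in the PR may be defined differently.

```jsonnet
// Hypothetical stand-in for $.removeReplicasFromSpec; the body is an assumption,
// not copied from the PR. A hidden field (::) is omitted from the rendered manifest.
local removeReplicasFromSpec = {
  spec+: { replicas:: null },
};

// Illustrative Deployment; the name and replica count are made up for this sketch.
local deployment = {
  apiVersion: 'apps/v1',
  kind: 'Deployment',
  metadata: { name: 'distributor' },
  spec: { replicas: 3 },
};

// Hypothetical flag standing in for the component's "enabled" setting.
local kedaEnabled = true;

// With the flag set, the rendered object has no spec.replicas, so Kubernetes
// applies its default of 1 until the autoscaler reconciles.
deployment + (if kedaEnabled then removeReplicasFromSpec else {})
```

Flipping `kedaEnabled` to `false` leaves the Deployment untouched, which is the behavior the conditional in the diff above relies on.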

    + scaledObject.spec.withMinReplicaCount(config.min_replicas)
    + scaledObject.spec.withMaxReplicaCount(config.max_replicas)
    + scaledObject.spec.withPollingInterval(60)
    + (

scaledObjectForController hard-codes pollingInterval to 60s, but several configs (e.g. distributor) describe much faster scaling periods (15s). With a 60s polling interval KEDA will only refresh trigger values once per minute, which can significantly delay scale-up/scale-down and makes the “every 15s” behavior unrealistic. Could we either (a) make polling interval configurable (per scaler or globally), or (b) align the comments/defaults so the intended responsiveness matches what the rendered ScaledObject will actually do?
The pollingInterval controls how often KEDA queries the trigger source, while periodSeconds in the HPA scaling policies controls how the HPA acts on the metrics it already has. They're independent — 60s polling with 15s HPA periods is intentional and working well for us.
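
The two settings discussed here live in different parts of the ScaledObject, which a plain-object sketch may make easier to see. Only the 60s polling interval and the 15s period come from this thread; the names, replica bounds, policy type and value, and the CPU target are assumptions, and the PR builds the object through `keda-libsonnet` rather than by hand.

```jsonnet
// Hand-written sketch of a rendered ScaledObject. Values other than
// pollingInterval (60) and periodSeconds (15) are illustrative assumptions.
{
  apiVersion: 'keda.sh/v1alpha1',
  kind: 'ScaledObject',
  metadata: { name: 'distributor' },
  spec: {
    scaleTargetRef: { name: 'distributor' },
    minReplicaCount: 2,
    maxReplicaCount: 10,
    // KEDA-level setting: how often KEDA checks the trigger source.
    pollingInterval: 60,
    advanced: {
      horizontalPodAutoscalerConfig: {
        behavior: {
          scaleUp: {
            // HPA-level setting: the window over which this policy limits change.
            policies: [{ type: 'Pods', value: 2, periodSeconds: 15 }],
          },
        },
      },
    },
    triggers: [
      { type: 'cpu', metricType: 'Utilization', metadata: { value: '80' } },
    ],
  },
}
```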

    local scaleDownBehavior = scaledObject.spec.advanced.horizontalPodAutoscalerConfig.behavior.scaleDown,

    _config+:: {
      autoscaling: {

Do we want to nest under `autoscaling`? A flatter structure might be preferable, something like `_config.distributor.keda.enabled`, rather than nesting everything under `autoscaling`.
Good call. Refactored to _config.<component>.keda to match the VPA/PDB pattern. Also moved prometheus_address into backend_worker.keda since it's the only consumer.
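
A rough sketch of what the per-component layout could look like from a user's side follows. The field names `enabled`, `min_replicas`, `max_replicas`, and `prometheus_address` appear in this PR, and the minimums of 2 and 3 are the ones mentioned earlier in the review; the `defaults` object, the `max_replicas` values, and the Prometheus URL are placeholders assumed for illustration.

```jsonnet
// Stand-in for the library defaults; the real defaults live in the library
// and may differ. All scalers start disabled.
local defaults = {
  _config+:: {
    distributor+: {
      keda+: { enabled: false, min_replicas: 2, max_replicas: 10 },
    },
    backend_worker+: {
      keda+: { enabled: false, min_replicas: 3, max_replicas: 10, prometheus_address: '' },
    },
  },
};

// User overlay: each component owns its own keda subtree.
local overrides = {
  _config+:: {
    distributor+: { keda+: { enabled: true } },
    backend_worker+: {
      keda+: {
        enabled: true,
        // Placeholder address, assumed for this sketch.
        prometheus_address: 'http://prometheus.example:9090',
      },
    },
  },
};

// _config is a hidden field, so select it explicitly to inspect the merge.
(defaults + overrides)._config
```

Because every level uses `+:`, the overlay only changes the keys it names and the remaining defaults carry through.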
Add KEDA ScaledObject definitions for distributor (CPU), metrics-generator (CPU), backend-worker (Prometheus outstanding blocks), and block-builder (kubernetes-workload scaler). These configurations have been running successfully in production. All disabled by default and fully configurable.
Matches VPA/PDB convention where each component owns its config subtree. Also moves prometheus_address into backend_worker.keda where it's used.
What this PR does:
Adds KEDA-based horizontal pod autoscaling support to the OSS microservices jsonnet library. These configurations have been running successfully in production.
All scalers are disabled by default and are configured per component via `_config.<component>.keda` (matching the existing VPA and PDB patterns).
Which issue(s) this PR fixes:
Checklist
- `CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`