Bug description
There appears to be a regression in GPU Operator v26.3.1 around kataManager Helm defaults.
In v25.10.0, setting only:
```yaml
kataManager:
  enabled: true
```
was enough for the chart to render a usable ClusterPolicy.spec.kataManager.
In v26.3.1, enabling kataManager with the same style of values produces an incomplete ClusterPolicy.spec.kataManager block that lacks image metadata (repository, image, version), and the operator later panics during reconcile in the Kata runtime class path.
Environment
- GPU Operator: v26.3.1
- Helm chart: gpu-operator-v26.3.1
- Kubernetes: v1.35.3
- Sandbox workloads:
  - sandboxWorkloads.enabled=true
  - sandboxWorkloads.mode=kata
Reproduction
Use Helm values like:
```yaml
cdi:
  enabled: true
driver:
  enabled: true
  kernelModuleType: open
toolkit:
  enabled: true
devicePlugin:
  enabled: false
ccManager:
  enabled: true
sandboxWorkloads:
  enabled: true
  defaultWorkload: "container"
  mode: "kata"
kataSandboxDevicePlugin:
  enabled: true
kataManager:
  enabled: true
```
Install / upgrade the chart and inspect the rendered/applied ClusterPolicy.
Actual behavior
The resulting ClusterPolicy.spec.kataManager is incomplete, e.g.:
```yaml
kataManager:
  enabled: true
  imagePullPolicy: IfNotPresent
```
with no repository, image, or version.
After that, the operator hits a reconcile panic with a nil pointer dereference in the Kata runtime class path (for example, in transformKataRuntimeClasses and RuntimeClasses).
The ClusterPolicy status reports:
```yaml
status:
  conditions:
  - lastTransitionTime: "2026-04-27T08:30:56Z"
    message: ""
    reason: Error
    status: "False"
    type: Ready
  - lastTransitionTime: "2026-04-27T08:30:56Z"
    message: 'Failed to reconcile state-kata-manager: empty image path provided through
      both ClusterPolicy CR and ENV KATA_MANAGER_IMAGE'
    reason: ReconcileFailed
    status: "True"
    type: Error
  namespace: gpu-operator
  state: notReady
```
Expected behavior
If kataManager.enabled=true, the chart should either:
- render a complete kataManager spec with valid defaults for repository, image, and version, or
- fail validation clearly during Helm rendering or reconcile, rather than producing an incomplete ClusterPolicy and later panicking in the operator.
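For the Helm-rendering variant of the second option, the chart could use Helm's built-in required function, which aborts helm template / helm install with the given message when the value is nil or an empty string. A rough sketch, assuming the ClusterPolicy template reads these fields from .Values.kataManager.* (the real template layout, key names, and indentation may differ):

```yaml
# Hypothetical excerpt of the ClusterPolicy template (nested under spec:).
# If kataManager is enabled, rendering fails fast instead of emitting a CR
# with no image metadata.
kataManager:
  enabled: {{ .Values.kataManager.enabled }}
  {{- if .Values.kataManager.enabled }}
  repository: {{ required "kataManager.repository must be set when kataManager.enabled=true" .Values.kataManager.repository }}
  image: {{ required "kataManager.image must be set when kataManager.enabled=true" .Values.kataManager.image }}
  version: {{ required "kataManager.version must be set when kataManager.enabled=true" .Values.kataManager.version | quote }}
  {{- end }}
  imagePullPolicy: {{ .Values.kataManager.imagePullPolicy }}
```

With the reproduction values above, this would stop at render time with the repository message rather than producing a ClusterPolicy that the operator cannot reconcile.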
Why this looks like a regression
From comparing chart defaults:
- v25.10.0 had kataManager image metadata defaults
- v26.3.1 no longer appears to have default kataManager.repository/image/version
The ClusterPolicy template still conditionally renders these fields, so once the defaults disappeared, the rendered CR became incomplete.
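To illustrate the mechanism, here is a hypothetical sketch (not copied from the chart) of that kind of conditional rendering. With values that set only kataManager.enabled=true, and an imagePullPolicy default of IfNotPresent, a template shaped like this emits exactly the incomplete block shown under Actual behavior:

```yaml
# Hypothetical sketch of conditional rendering in the ClusterPolicy template.
# Each image field is emitted only when the corresponding value is non-empty,
# so removing the chart defaults silently drops all three fields.
kataManager:
  enabled: {{ .Values.kataManager.enabled }}
  {{- if .Values.kataManager.repository }}
  repository: {{ .Values.kataManager.repository }}
  {{- end }}
  {{- if .Values.kataManager.image }}
  image: {{ .Values.kataManager.image }}
  {{- end }}
  {{- if .Values.kataManager.version }}
  version: {{ .Values.kataManager.version | quote }}
  {{- end }}
  imagePullPolicy: {{ .Values.kataManager.imagePullPolicy }}
```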
Impact
This breaks the Kata sandbox workload flow when using the chart defaults / minimal values and can leave the operator in a notReady state due to a controller panic.
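Until the defaults are restored, a possible interim workaround (untested) is to pass the image fields explicitly so the CR no longer carries an empty image path. The values below are placeholders, not real image coordinates; the actual kataManager repository, image, and version can be copied from the v25.10.0 chart's values (for example with helm show values and --version v25.10.0 against the chart):

```yaml
# Interim workaround sketch (assumption, not verified): set kataManager image
# metadata explicitly. Replace the placeholders with the values from the
# v25.10.0 chart's values.yaml.
sandboxWorkloads:
  enabled: true
  defaultWorkload: "container"
  mode: "kata"
kataManager:
  enabled: true
  repository: "REPLACE_WITH_V25.10.0_KATAMANAGER_REPOSITORY"
  image: "REPLACE_WITH_V25.10.0_KATAMANAGER_IMAGE"
  version: "REPLACE_WITH_V25.10.0_KATAMANAGER_VERSION"
```

This targets the "empty image path" reconcile error; whether it also avoids the nil pointer dereference in the runtime class path has not been confirmed.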