[Bug]: regression in GPU Operator v26.3.1 around kataManager Helm defaults. #2398

@Ayush-Rathor

Bug description

There appears to be a regression in GPU Operator v26.3.1 around kataManager Helm defaults.

In v25.10.0, setting only:

kataManager:
  enabled: true

was enough for the chart to render a usable ClusterPolicy.spec.kataManager.

In v26.3.1, enabling kataManager with the same style of values produces an incomplete ClusterPolicy.spec.kataManager block that lacks image metadata (repository, image, version), and the operator later panics during reconcile in the Kata runtime class path.

Environment

  • GPU Operator: v26.3.1
  • Helm chart: gpu-operator-v26.3.1
  • Kubernetes: v1.35.3
  • Sandbox workloads:
    • sandboxWorkloads.enabled=true
    • sandboxWorkloads.mode=kata

Reproduction

Use Helm values like:

cdi:
  enabled: true

driver:
  enabled: true
  kernelModuleType: open

toolkit:
  enabled: true

devicePlugin:
  enabled: false

ccManager:
  enabled: true

sandboxWorkloads:
  enabled: true
  defaultWorkload: "container"
  mode: "kata"

kataSandboxDevicePlugin:
  enabled: true

kataManager:
  enabled: true

Install / upgrade the chart and inspect the rendered/applied ClusterPolicy.

Actual behavior

The resulting ClusterPolicy.spec.kataManager is incomplete, e.g.:

kataManager:
  enabled: true
  imagePullPolicy: IfNotPresent

with no repository, image, or version.

After that, the operator panics with a nil pointer dereference during reconcile in the Kata runtime class path, for example in:

  • transformKataRuntimeClasses
  • RuntimeClasses

The ClusterPolicy status reports:

status:
  conditions:
  - lastTransitionTime: "2026-04-27T08:30:56Z"
    message: ""
    reason: Error
    status: "False"
    type: Ready
  - lastTransitionTime: "2026-04-27T08:30:56Z"
    message: 'Failed to reconcile state-kata-manager: empty image path provided through
      both ClusterPolicy CR and ENV KATA_MANAGER_IMAGE'
    reason: ReconcileFailed
    status: "True"
    type: Error
  namespace: gpu-operator
  state: notReady

Expected behavior

If kataManager.enabled=true, the chart should either:

  1. render a complete kataManager spec with valid defaults for:
    • repository
    • image
    • version

or

  2. fail validation clearly during Helm rendering / reconcile, rather than producing an incomplete ClusterPolicy and later panicking in the operator.

Why this looks like a regression

From comparing chart defaults:

  • v25.10.0 had kataManager image metadata defaults
  • v26.3.1 appears to no longer have defaults for kataManager.repository, kataManager.image, or kataManager.version

The ClusterPolicy template still conditionally renders these fields, so once the defaults disappeared, the rendered CR became incomplete.
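
If that reading is correct, setting the three fields explicitly in the Helm values should let the template render them again. This is an untested sketch of such an override. The values are placeholders and are not taken from any chart release, so they need to be replaced with the registry path, image name, and tag that match the installed GPU Operator version:

```yaml
kataManager:
  enabled: true
  # Placeholders only (assumptions, not chart defaults): replace with the
  # registry path, image name, and tag for your GPU Operator release.
  repository: "<registry>/<path>"
  image: "<kata-manager-image-name>"
  version: "<image-tag>"
```

Whether this also avoids the reconcile panic has not been verified here. It only addresses the missing image metadata in the rendered ClusterPolicy.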

Impact

This breaks the Kata sandbox workload flow when using the chart defaults or minimal values, and can leave the operator in a notReady state due to a controller panic.

Labels

bug, needs-triage