Commit 3dff806

[docs] Add vGPU setup guide for GPU sharing between VMs (#467)
## What this PR does

Adds a practical guide for running VMs with NVIDIA vGPU on Cozystack to the existing GPU passthrough page.

The guide covers the **SR-IOV vGPU path** used by current data-centre GPUs (L4, L40, L40S, B100) on the vGPU 20.x driver branch. The mediated-devices path used by older GPUs (Pascal–Ampere) is explicitly out of scope and the reader is pointed at upstream NVIDIA docs.

Steps documented:

- Build the proprietary vGPU Manager container image from `github.com/NVIDIA/gpu-driver-container` (the older `gitlab.com/nvidia/container-images/driver` is archived).
- Deploy GPU Operator with the `vgpu` variant via Package CR (depends on cozystack/cozystack#2323).
- Assign vGPU profiles to SR-IOV VFs (`current_vgpu_type` sysfs), with a periodic profile-loader DaemonSet skeleton and an explicit experimental warning.
- Configure DLS licensing via ClientConfigToken (the legacy NLS / `ServerAddress=` / `ServerPort=7070` flow no longer applies).
- Patch the KubeVirt CR with `permittedHostDevices.pciHostDevices` (after kubevirt/kubevirt#16890; first stable KubeVirt release with the patch is targeted at v1.9.0).
- Sample `VirtualMachine` (raw `kubevirt.io/v1`) using a CDI `DataVolume` so the rootfs has room for in-VM driver install, with a `cloudInitNoCloud` disk that drops the licensing token, `gridd.conf`, an SSH key, and the build dependencies.
- vGPU profile reference table for L40S with the Q/A/B suffix taxonomy.
- Warning about the 2.4 GiB containerDisk root overflow during in-VM driver install (we observed `SIGBUS` from a write into an mmap of a file the kernel could no longer extend; the new example sidesteps it via `DataVolume`).
- Talos is explicitly noted as not recommended for vGPU; passthrough on Talos is unaffected.

### Release note

```release-note
NONE
```
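
For reviewers unfamiliar with the profile-assignment step, the sketch below shows roughly what one pass of a periodic profile loader might do for a single VF. It is not the DaemonSet from the guide. It assumes each VF exposes a `nvidia/` sysfs directory containing `creatable_vgpu_types` (rows of the form `ID : name`) next to `current_vgpu_type`; the function names and the profile IDs used in the usage note are hypothetical.

```python
from pathlib import Path


def creatable_profile_ids(vf_nvidia_dir: Path) -> set[int]:
    """Collect the numeric IDs listed in creatable_vgpu_types.

    Assumes rows look like "<id> : <profile name>"; the header row and any
    other row whose first field is not a number are skipped.
    """
    ids: set[int] = set()
    text = (vf_nvidia_dir / "creatable_vgpu_types").read_text()
    for line in text.splitlines():
        head = line.split(":", 1)[0].strip()
        if head.isdigit():
            ids.add(int(head))
    return ids


def assign_profile(vf_nvidia_dir: Path, profile_id: int) -> bool:
    """Write profile_id to current_vgpu_type unless it is already set.

    Returns True if a write happened and False if the VF already carried
    the requested profile, so repeated runs are harmless.
    Raises ValueError if the VF does not list the profile as creatable.
    """
    current_file = vf_nvidia_dir / "current_vgpu_type"
    if current_file.read_text().strip() == str(profile_id):
        return False
    if profile_id not in creatable_profile_ids(vf_nvidia_dir):
        raise ValueError(f"profile {profile_id} is not creatable on {vf_nvidia_dir}")
    current_file.write_text(f"{profile_id}\n")
    return True
```

A caller would pass something like `Path("/sys/bus/pci/devices/<VF address>/nvidia")` and a profile ID taken from that VF's own `creatable_vgpu_types` listing. Checking the current value first keeps the loop idempotent, which matters when it runs on a timer.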
2 parents 3e58234 + 9a8a53b commit 3dff806

1 file changed

Lines changed: 453 additions & 23 deletions

File tree

  • content/en/docs/v1.2/virtualization
