Description
I tried k3s + nvidia container + nix-snapshotter and found that it didn't work well.
I tested k3s + nvidia container and k3s + nix-snapshotter individually, and each combination worked well.
However, when I put them together, there were some problems.
Here is the nix script I tried.
I wrote it by following the k3s configuration guide in the NixOS docs and the nix-snapshotter docs.
I can provide the entire working NixOS configuration if needed, so please ask.
The problem is that when I run a container with this k3s configuration,
it fails to find the nvidia runtime. It worked well before I added the nix-snapshotter configuration.
Warning FailedCreatePodSandBox 2m27s (x1378 over 5h) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = unable to get OCI runtime for sandbox "34209d7f367586d856ce61dfc35010997619cfcae8280d4eb319b389f790a64f": no runtime for "nvidia" is configured
Here are some of my speculations.
When nix-snapshotter is enabled, the extra flags passed to k3s are:
--container-runtime-endpoint unix:///run/containerd/containerd.sock --image-service-endpoint unix:///run/nix-snapshotter/nix-snapshotter.sock --node-name=hserver6 --tls-san=k8s.internal --node-label=nvidia.com/gpu.present=true
When nix-snapshotter is NOT enabled, they are:
--node-name=hserver6 --tls-san=k8s.internal --node-label=nvidia.com/gpu.present=true
Does --container-runtime-endpoint unix:///run/containerd/containerd.sock cause this problem?
I removed that flag but it still failed.
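To make the comparison concrete, here is a minimal Python sketch that diffs the two flag strings quoted above. The helper `parse_flags` and the constant names are hypothetical and written only for this illustration; they are not part of k3s or nix-snapshotter.

```python
import shlex


def parse_flags(flag_string):
    """Parse a flag string into a {name: value} dict.

    Handles both `--name=value` and `--name value`.
    Simplification: a repeated flag overwrites its earlier value.
    """
    tokens = shlex.split(flag_string)
    flags = {}
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token.startswith("--") and "=" in token:
            # Split only on the first "=", so values may contain "=".
            name, value = token[2:].split("=", 1)
        elif (
            token.startswith("--")
            and i + 1 < len(tokens)
            and not tokens[i + 1].startswith("--")
        ):
            name, value = token[2:], tokens[i + 1]
            i += 1  # the next token was consumed as the value
        else:
            name, value = token.lstrip("-"), None
        flags[name] = value
        i += 1
    return flags


# Flag strings copied from the issue text above.
WITH_SNAPSHOTTER = (
    "--container-runtime-endpoint unix:///run/containerd/containerd.sock "
    "--image-service-endpoint unix:///run/nix-snapshotter/nix-snapshotter.sock "
    "--node-name=hserver6 --tls-san=k8s.internal "
    "--node-label=nvidia.com/gpu.present=true"
)
WITHOUT_SNAPSHOTTER = (
    "--node-name=hserver6 --tls-san=k8s.internal "
    "--node-label=nvidia.com/gpu.present=true"
)

with_flags = parse_flags(WITH_SNAPSHOTTER)
without_flags = parse_flags(WITHOUT_SNAPSHOTTER)

# Flags that appear only when nix-snapshotter is enabled.
added = {k: v for k, v in with_flags.items() if k not in without_flags}

for name, value in added.items():
    print(f"--{name} {value}")
```

The only differences are the two endpoint flags, so if the flags are involved at all, those two are the candidates.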
I checked that /var/lib/rancher/k3s/agent/etc/containerd/config.toml is properly configured and includes the following part, but it still failed.
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
privileged_without_host_devices = false
runtime_engine = ""
runtime_root = ""
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
BinaryName = "/run/current-system/sw/bin/nvidia-container-runtime.cdi"
Also, nix-snapshotter introduces k3s.moreFlags as a replacement for k3s.extraFlags. Is that relevant? I don't see the need for that option, to be honest. It doesn't help resolve conflicts between multiple declarations of the same flag in any way.
Has anyone experienced this issue?