Add krunkit driver supporting GPU acceleration on macOS #20826

Draft
wants to merge 8 commits into
base: master
from

nirs
Contributor

@nirs nirs commented May 23, 2025

krunkit is a tool to launch configurable virtual machines using the libkrun platform, optimized for GPU accelerated virtual machines and AI workloads on Apple silicon.

It is mostly compatible with vfkit; the driver is a simplified copy of the vfkit driver.

To experiment with the krunkit driver, see https://github.com/medyagh/ai-playground-minikube/tree/main/macos

Benchmarking GPU compute workloads shows an improvement of two orders of magnitude with Virtio-GPU, with the GPU fully utilized:

[Screenshot (2025-06-30): GPU benchmark results]

Status

  • Tested using a locally built ISO
  • Start/delete clusters
  • GPU available in the guest
  • Run GPU benchmarks

Based on #20995 for testing.

Fixes #20803

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 23, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: nirs
Once this PR has been reviewed and has the lgtm label, please assign spowelljr for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label May 23, 2025
@k8s-ci-robot k8s-ci-robot requested a review from spowelljr May 23, 2025 19:11
@k8s-ci-robot
Contributor

Hi @nirs. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label May 23, 2025
@minikube-bot
Collaborator

Can one of the admins verify this patch?

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 24, 2025
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 24, 2025
@nirs nirs force-pushed the krunkit branch 3 times, most recently from 7cda3a5 to cbe0012 Compare May 25, 2025 01:25
@nirs
Contributor Author

nirs commented May 25, 2025

@afbjorklund can you review this?

I think /dev/dri is missing because we are using a kernel that is too old. We are using 5.10.207, while libkrun seems to require 5.16 or later.

libkrun on macOS uses Venus, which has these requirements:

The Venus driver requires supports for

VIRTGPU_PARAM_3D_FEATURES
VIRTGPU_PARAM_CAPSET_QUERY_FIX
VIRTGPU_PARAM_RESOURCE_BLOB
VIRTGPU_PARAM_HOST_VISIBLE
VIRTGPU_PARAM_CROSS_DEVICE
VIRTGPU_PARAM_CONTEXT_INIT

from the virtio-gpu kernel driver, unless vtest is used. That usually means the guest kernel should be at least 5.16

So it seems that we need to move to a newer kernel. Do you know why we are stuck with 5.10?

BR2_LINUX_KERNEL_CUSTOM_VERSION_VALUE="5.10.207"
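To make the 5.16 requirement concrete, here is a minimal Go sketch of how a guest kernel release string (as printed by `uname -r`) could be compared against a minimum version. `kernelAtLeast` is a hypothetical helper written for this illustration, not part of minikube or libkrun, and it compares only the major and minor components.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// kernelAtLeast reports whether a kernel release such as "5.10.207" or
// "6.6.95-minikube" is at least major.minor. Only the first two numeric
// components are compared; anything after the minor version is ignored.
// Hypothetical helper for illustration only.
func kernelAtLeast(release string, major, minor int) (bool, error) {
	parts := strings.SplitN(release, ".", 3)
	if len(parts) < 2 {
		return false, fmt.Errorf("unexpected kernel release %q", release)
	}
	gotMajor, err := strconv.Atoi(parts[0])
	if err != nil {
		return false, fmt.Errorf("invalid major version in %q: %w", release, err)
	}
	// The minor component may carry a suffix (e.g. "16-rc1"), so keep
	// only its leading digits.
	minorDigits := parts[1]
	for i, r := range minorDigits {
		if r < '0' || r > '9' {
			minorDigits = minorDigits[:i]
			break
		}
	}
	gotMinor, err := strconv.Atoi(minorDigits)
	if err != nil {
		return false, fmt.Errorf("invalid minor version in %q: %w", release, err)
	}
	if gotMajor != major {
		return gotMajor > major, nil
	}
	return gotMinor >= minor, nil
}

func main() {
	// Venus usually needs guest kernel 5.16 or later for the virtio-gpu params.
	for _, release := range []string{"5.10.207", "6.6.95"} {
		ok, err := kernelAtLeast(release, 5, 16)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		fmt.Printf("%s >= 5.16: %v\n", release, ok)
	}
}
```

Run as-is, it reports that 5.10.207 does not meet the 5.16 minimum while 6.6.95 does.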

@afbjorklund
Collaborator

afbjorklund commented May 25, 2025

So it seems that we need to move to a newer kernel. Do you know why we are stuck with 5.10?

I don't think it is stuck with anything, just that it was using the LTS versions...

It seems that a new kernel version was not included in the minikube OS upgrade.

Buildroot supports many kernel versions: https://github.com/buildroot/buildroot/blob/2025.02.x/linux/linux.hash

  • 5.4
  • 5.10
  • 5.15
  • 6.1
  • 6.6
  • 6.12

See also https://www.kernel.org/ ("longterm")

@nirs
Contributor Author

nirs commented May 25, 2025

So should we try to update the kernel to the latest longterm version (6.12.30)?

@afbjorklund
Collaborator

afbjorklund commented May 25, 2025

There was some talk about bumping to a newer kernel (6.6?).

But that was last year, and for the 2024.02.x OS. So maybe.

It seems that it is only needed for this AI feature, though?

@nirs
Contributor Author

nirs commented May 25, 2025

It seems that it is only needed for this AI feature, though?

Yes, this is for AI on Apple silicon. I can try to build an ISO to play with, and when we have a working ISO we can discuss how to proceed with the upgrade.

@afbjorklund
Collaborator

afbjorklund commented May 25, 2025

So you could still use vfkit for all other (non-AI) usage of Kubernetes.

Or maybe even make it a single driver, if the syntax is close enough...

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels May 25, 2025
}

// Make a boot2docker VM disk image.
func (d *Driver) generateDiskImage(size int) error {
Contributor Author

Duplicate of the vfkit version, which may itself duplicate the qemu version?

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 7, 2025
@nirs
Contributor Author

nirs commented Jun 7, 2025

ok-to-build-iso

@minikube-bot
Copy link
Collaborator

Hi @nirs, we have updated your PR with the reference to newly built ISO. Pull the changes locally if you want to test with them or update your PR further.

@spowelljr spowelljr removed their request for review June 26, 2025 23:51
@nirs nirs force-pushed the krunkit branch 2 times, most recently from 8e1f14e to 48d867c Compare June 29, 2025 18:42
@nirs nirs changed the title WIP: krunkit driver Add krunkit driver supporting GPU acceleration on macOS Jun 29, 2025
@nirs nirs force-pushed the krunkit branch 2 times, most recently from 304e9ac to faddf3a Compare June 30, 2025 19:19
@minikube-pr-bot

kvm2 driver with docker runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 20826) |
+----------------+----------+---------------------+
| minikube start | 51.5s    | 49.8s               |
| enable ingress | 15.0s    | 14.8s               |
+----------------+----------+---------------------+

Times for minikube (PR 20826) start: 48.6s 49.5s 50.6s 48.4s 51.8s
Times for minikube start: 50.8s 52.8s 50.6s 52.1s 51.4s

Times for minikube ingress: 14.5s 15.5s 15.0s 15.5s 14.5s
Times for minikube (PR 20826) ingress: 14.9s 14.5s 15.0s 14.4s 14.9s

docker driver with docker runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 20826) |
+----------------+----------+---------------------+
| minikube start | 23.2s    | 23.0s               |
| enable ingress | 12.1s    | 12.7s               |
+----------------+----------+---------------------+

Times for minikube start: 22.2s 25.5s 25.2s 22.2s 21.0s
Times for minikube (PR 20826) start: 23.4s 22.4s 22.9s 23.2s 23.1s

Times for minikube ingress: 12.3s 12.2s 12.7s 11.2s 12.2s
Times for minikube (PR 20826) ingress: 13.2s 12.2s 10.7s 13.2s 13.8s

docker driver with containerd runtime

+-------------------+----------+---------------------+
|      COMMAND      | MINIKUBE | MINIKUBE (PR 20826) |
+-------------------+----------+---------------------+
| minikube start    | 22.4s    | 23.1s               |
| ⚠️  enable ingress | 23.2s    | 29.3s ⚠️             |
+-------------------+----------+---------------------+

Times for minikube start: 25.0s 20.9s 21.9s 22.6s 21.8s
Times for minikube (PR 20826) start: 24.5s 21.6s 23.3s 24.4s 21.8s

Times for minikube ingress: 22.7s 24.2s 22.7s 23.3s 23.2s
Times for minikube (PR 20826) ingress: 23.3s 22.7s 38.7s 39.2s 22.7s

@nirs
Contributor Author

nirs commented Jun 30, 2025

Comparing start time: krunkit, vfkit, qemu

krunkit starts a cluster with --no-kubernetes 1.24 times faster than vfkit and 1.91 times faster than qemu. vfkit is 1.04 times faster than krunkit when starting a full cluster.

vm only

% hyperfine -r 10 -C "krunkit/out/minikube delete" \
     "krunkit/out/minikube start --driver krunkit --no-kubernetes" \
     "krunkit/out/minikube start --driver vfkit --network vmnet-shared --no-kubernetes" \
     "krunkit/out/minikube start --driver qemu --no-kubernetes"
Benchmark 1: krunkit/out/minikube start --driver krunkit --no-kubernetes
  Time (mean ± σ):      8.355 s ±  1.112 s    [User: 0.283 s, System: 0.236 s]
  Range (min … max):    7.067 s … 10.269 s    10 runs
 
Benchmark 2: krunkit/out/minikube start --driver vfkit --network vmnet-shared --no-kubernetes
  Time (mean ± σ):     10.349 s ±  0.536 s    [User: 0.373 s, System: 0.243 s]
  Range (min … max):    9.511 s … 11.129 s    10 runs
 
Benchmark 3: krunkit/out/minikube start --driver qemu --no-kubernetes
  Time (mean ± σ):     15.977 s ±  0.433 s    [User: 0.507 s, System: 0.250 s]
  Range (min … max):   15.570 s … 16.787 s    10 runs
 
Summary
  krunkit/out/minikube start --driver krunkit --no-kubernetes ran
    1.24 ± 0.18 times faster than krunkit/out/minikube start --driver vfkit --network vmnet-shared --no-kubernetes
    1.91 ± 0.26 times faster than krunkit/out/minikube start --driver qemu --no-kubernetes

kubernetes cluster

% hyperfine -r 5 -C "krunkit/out/minikube delete" \
    "krunkit/out/minikube start --driver krunkit --container-runtime containerd" \
    "krunkit/out/minikube start --driver vfkit --container-runtime containerd --network vmnet-shared" \
    "krunkit/out/minikube start --driver qemu --container-runtime containerd"
Benchmark 1: krunkit/out/minikube start --driver krunkit --container-runtime containerd
  Time (mean ± σ):     20.958 s ±  1.098 s    [User: 0.978 s, System: 0.909 s]
  Range (min … max):   19.347 s … 21.887 s    5 runs
 
Benchmark 2: krunkit/out/minikube start --driver vfkit --container-runtime containerd --network vmnet-shared
  Time (mean ± σ):     20.108 s ±  0.774 s    [User: 1.057 s, System: 1.298 s]
  Range (min … max):   19.036 s … 21.078 s    5 runs
 
Benchmark 3: krunkit/out/minikube start --driver qemu --container-runtime containerd
  Time (mean ± σ):     27.047 s ±  1.633 s    [User: 0.932 s, System: 0.901 s]
  Range (min … max):   25.457 s … 28.802 s    5 runs
 
Summary
  krunkit/out/minikube start --driver vfkit --container-runtime containerd --network vmnet-shared ran
    1.04 ± 0.07 times faster than krunkit/out/minikube start --driver krunkit --container-runtime containerd
    1.35 ± 0.10 times faster than krunkit/out/minikube start --driver qemu --container-runtime containerd

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 2, 2025
nirs and others added 8 commits July 2, 2025 11:01
Generated by running `make iso-menuconfig-x86_64` and updating kernel
version to longterm kernel 6.6.95 and kernel headers to 6.6.x, and then
running `make linux-menuconfig-x86_64` to update the linux config.

Additionally, update the hyperv-daemons package to use kernel 6.x.

Generated by running `make iso-menuconfig-aarch64` and updating kernel
version to longterm kernel 6.6.95 and kernel headers to 6.6.x, and then
running `make linux-menuconfig-aarch64` to update the linux config.

The krunkit driver exposes the host GPU via VirtIO GPU, enabling AI
workloads in the guest.

krunkit is a tool to launch configurable virtual machines using the
libkrun platform, optimized for GPU accelerated virtual machines and AI
workloads on Apple silicon.

It is mostly compatible with vfkit; the driver is a simplified copy of
the vfkit driver. Unlike vfkit, krunkit is available only on Apple
silicon.

Changes compared to the vfkit driver:
- krunkit requires a unix socket for networking, so we must use
  vmnet-helper.
- krunkit does not support HardStop, so we kill it using SIGKILL.
- We must enable vmnet offloading, required for krunkit.
- The code was simplified since vmnet-helper is always used.
- The code was cleaned up to use .ResolveStorePath().

We require krunkit 0.2.2, which supports --restful-uri=unix://.

Previously it was used only for vfkit, so we suggested falling back to
the `nat` network. This advice is not relevant to krunkit or to qemu
(which can also use vmnet-helper).

Change the error to recommend installing vmnet-helper. We need to think
about how we can recommend other networks for vfkit and qemu. Another
solution is to create an error for every driver+network combination, but
this seems hard to manage.

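One way to keep per-driver recommendations manageable is a small lookup table with a generic fallback, so only drivers that have a better alternative need an entry. This is a hypothetical Go sketch, not minikube's error-reporting code; only the vfkit `nat` fallback comes from the commit message, and `networkAdvice` and `vmnetHelperAdvice` are illustrative names.

```go
package main

import "fmt"

// networkAdvice maps a driver name to the hint shown when vmnet-helper is
// missing. Illustrative only; drivers without an entry get the generic hint.
var networkAdvice = map[string]string{
	"vfkit": "install vmnet-helper, or use --network nat",
}

// vmnetHelperAdvice returns the driver-specific hint, falling back to the
// generic recommendation to install vmnet-helper.
func vmnetHelperAdvice(driver string) string {
	if advice, ok := networkAdvice[driver]; ok {
		return advice
	}
	return "install vmnet-helper"
}

func main() {
	for _, driver := range []string{"krunkit", "vfkit"} {
		fmt.Printf("%s: %s\n", driver, vmnetHelperAdvice(driver))
	}
}
```

Adding a driver+network combination later means adding one map entry rather than a new error type.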
This is the same way that we test vfkit. This test does not run in the
CI.

Issues:
- Need to install and configure vmnet-helper (requires root).
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 2, 2025
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA.
do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress.
ok-to-test Indicates a non-member PR verified by an org member that is safe to test.
size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add krunkit driver
6 participants