authors:
- user: pcuenca
- user: pagezyhf
- user: merve
- user: reach-vb
date: 2025-03-28
---

# 🏎️ Enhance Your Models in 5 Minutes with the Hugging Face Kernel Hub

**Boost your model performance with pre-optimized kernels, easily loaded from the Hub.**

Today, we'll explore an exciting development from Hugging Face: the **Kernel Hub**! As ML practitioners, we know that maximizing performance often involves diving deep into optimized code, custom CUDA kernels, or complex build systems. The Kernel Hub simplifies this process dramatically!

Below is a short example of how to use a kernel in your code:

```python
import torch

from kernels import get_kernel

# Download optimized kernels from the Hugging Face Hub
activation = get_kernel("kernels-community/activation")

# Create a random input tensor on the GPU
x = torch.randn((10, 10), dtype=torch.float16, device="cuda")

# Run the kernel: gelu_fast writes its result into a preallocated output
y = torch.empty_like(x)
activation.gelu_fast(y, x)

print(y)
```
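
To run the example you need a CUDA-capable GPU and the `kernels` package (`pip install kernels`). The first call to `get_kernel` downloads a pre-compiled build matched to your environment; subsequent calls load it from the local cache.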

In the next sections we'll cover the following topics:

1. **What is the Kernel Hub?** - Understanding the core concept.
2. **How to use the Kernel Hub** - A quick code example.
3. **Adding a Kernel to a Simple Model** - A practical integration using RMSNorm.
4. **Reviewing Performance Impact** - Benchmarking the RMSNorm difference.
5. **Real-world use cases** - Examples of how the `kernels` library is being used in other projects.
We'll introduce these concepts quickly – the core idea can be grasped in about 5 minutes (though experimenting and benchmarking might take a bit longer!).

Instead of manually managing complex dependencies, wrestling with compilation flags, or building libraries like Triton or CUTLASS from source, you can use the `kernels` library to instantly fetch and run pre-compiled, optimized kernels.

For example, to enable **FlashAttention** you need just one line – no builds, no flags. The sketch below assumes the community-published `kernels-community/flash-attn` repository:
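
```python
from kernels import get_kernel

# One line fetches a pre-built FlashAttention kernel from the Hub.
# The repository name here is an illustration; any compatible kernel
# repo on the Hub is loaded the same way.
flash_attention = get_kernel("kernels-community/flash-attn")
```
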
`kernels` detects your exact Python, PyTorch, and CUDA versions, then downloads the matching pre‑compiled binary—typically in seconds (or a minute or two on a slow connection).

By contrast, compiling FlashAttention yourself requires:

* Cloning the repository and installing every dependency.
* Configuring build flags and environment variables.
* Reserving **~96 GB of RAM** and plenty of CPU cores.
* Waiting **10 minutes to several hours**, depending on your hardware.

(See the project’s own [installation guide](https://github.com/Dao-AILab/flash-attention#installation) for details.)

Kernel Hub erases all that friction: one function call, instant acceleration.

### Benefits of the Kernel Hub:

* **Instant Access to Optimized Kernels**: Load and run kernels optimized for various hardware, starting with NVIDIA and AMD GPUs, without local compilation hassles.

Actual results will depend on your hardware and the specific kernel implementation.

| 32768 | 37.079 | 19.9461 | 1.86x |
| 65536 | 73.588 | 39.593 | 1.86x |
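
Numbers like these can be collected with a simple CUDA-event timing loop. Here is a minimal sketch of such a harness (our own illustration, not the exact benchmark script used above):

```python
import torch

def bench(fn, *args, warmup=10, iters=100):
    """Return the average GPU time per call in milliseconds."""
    for _ in range(warmup):  # exclude one-time setup and warm-up costs
        fn(*args)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn(*args)
    end.record()
    torch.cuda.synchronize()  # wait for all timed kernels to finish
    return start.elapsed_time(end) / iters
```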

## 5. Real World Use Cases

The `kernels` library is still growing, but it is already being used in real-world projects, including:

- [Text Generation Inference](https://github.com/huggingface/text-generation-inference/blob/d658b5def3fe6c32b09b4ffe36f770ba2aa959b4/server/text_generation_server/layers/marlin/fp8.py#L15): The TGI project uses the `kernels` library to load optimized kernels for text generation tasks, improving performance and efficiency.
- [Transformers](https://github.com/huggingface/transformers/blob/6f9da7649f2b23b22543424140ce2421fccff8af/src/transformers/integrations/hub_kernels.py#L32): The Transformers library has integrated `kernels` to use drop-in optimized layers without requiring any changes to model code, letting users switch easily between standard and optimized implementations. The sketch after this list shows the general pattern.
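
To make "drop-in" concrete, here is a hypothetical module (our own illustration, not Transformers' actual integration) that prefers a Hub kernel when one is available and falls back to eager PyTorch otherwise, reusing the `activation` kernel from the opening example:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

from kernels import get_kernel

class FastGELU(nn.Module):
    """Drop-in GELU: Hub kernel on CUDA, eager PyTorch everywhere else."""

    def __init__(self):
        super().__init__()
        self.kernel = None
        if torch.cuda.is_available():
            try:
                self.kernel = get_kernel("kernels-community/activation")
            except Exception:
                pass  # e.g. offline: silently fall back to eager

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.kernel is not None and x.is_cuda:
            y = torch.empty_like(x)
            self.kernel.gelu_fast(y, x)  # out-of-place kernel call
            return y
        # gelu_fast computes the tanh approximation of GELU
        return F.gelu(x, approximate="tanh")
```
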
## Get Started and Next Steps!
You've seen how easy it is to fetch and use optimized kernels with the Hugging Face Kernel Hub. Ready to try it yourself?