
Commit 36a8638

fix: improve examples and add real world uses
1 parent 441e40d commit 36a8638

File tree

1 file changed (+51, -2 lines)


hello-hf-kernels.md

Lines changed: 51 additions & 2 deletions
````diff
@@ -8,21 +8,42 @@ authors:
 - user: pcuenca
 - user: pagezyhf
 - user: merve
+- user: reach-vb
 date: 2025-03-28
 ---
 
-# 🏎️ Learn the Hugging Face Kernel Hub in 5 Minutes
+# 🏎️ Enhance Your Models in 5 Minutes with the Hugging Face Kernel Hub
 
 **Boost your model performance with pre-optimized kernels, easily loaded from the Hub.**
 
 Today, we'll explore an exciting development from Hugging Face: the **Kernel Hub**! As ML practitioners, we know that maximizing performance often involves diving deep into optimized code, custom CUDA kernels, or complex build systems. The Kernel Hub simplifies this process dramatically!
 
-We'll cover the following topics:
+Below is a short example of how to use a kernel in your code.
+```python
+import torch
+
+from kernels import get_kernel
+
+# Download optimized kernels from the Hugging Face hub
+activation = get_kernel("kernels-community/activation")
+
+# Random tensor
+x = torch.randn((10, 10), dtype=torch.float16, device="cuda")
+
+# Run the kernel
+y = torch.empty_like(x)
+activation.gelu_fast(y, x)
+
+print(y)
+```
+
+In the next sections we'll cover the following topics:
 
 1. **What is the Kernel Hub?** - Understanding the core concept.
 2. **How to use the Kernel Hub** - A quick code example.
 3. **Adding a Kernel to a Simple Model** - A practical integration using RMSNorm.
 4. **Reviewing Performance Impact** - Benchmarking the RMSNorm difference.
+5. **Real world use cases** - Examples of how the kernels library is being used in other projects.
 
 We'll introduce these concepts quickly – the core idea can be grasped in about 5 minutes (though experimenting and benchmarking might take a bit longer!).
````

````diff
@@ -35,6 +56,26 @@ Examples include advanced attention mechanisms (like [FlashAttention](https://hu
 
 Instead of manually managing complex dependencies, wrestling with compilation flags, or building libraries like Triton or CUTLASS from source, you can use the `kernels` library to instantly fetch and run pre-compiled, optimized kernels.
 
+For example, to enable **FlashAttention** you need just one line—no builds, no flags:
+
+```python
+from kernels import get_kernel
+
+flash_attention = get_kernel("kernels-community/flash-attn")
+```
+
+`kernels` detects your exact Python, PyTorch, and CUDA versions, then downloads the matching pre‑compiled binary—typically in seconds (or a minute or two on a slow connection).
+
+By contrast, compiling FlashAttention yourself requires:
+
+* Cloning the repository and installing every dependency.
+* Configuring build flags and environment variables.
+* Reserving **\~96 GB of RAM** and plenty of CPU cores.
+* Waiting **10 minutes to several hours**, depending on your hardware.
+(See the project's own [installation guide](https://github.com/Dao-AILab/flash-attention#installation) for details.)
+
+Kernel Hub erases all that friction: one function call, instant acceleration.
+
 ### Benefits of the Kernel Hub:
 
 * **Instant Access to Optimized Kernels**: Load and run kernels optimized for various hardware starting with NVIDIA and AMD GPUs, without local compilation hassles.
````
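Since the object returned by `get_kernel` behaves like a regular Python module, one quick way to see what a fetched kernel actually provides is to inspect it. A minimal sketch; the exposed function names vary from kernel to kernel, so treat the printed list (not any name guessed here) as the source of truth:

```python
from kernels import get_kernel

# Fetch the pre-compiled build matching the local Python/PyTorch/CUDA stack
flash_attention = get_kernel("kernels-community/flash-attn")

# List the public entry points this particular kernel build exposes;
# the exact names depend on the kernel, so check before calling anything
print([name for name in dir(flash_attention) if not name.startswith("_")])
```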
````diff
@@ -496,6 +537,14 @@ Actual results will depend on your hardware and the specific kernel implementati
 | 32768 | 37.079 | 19.9461 | 1.86x |
 | 65536 | 73.588 | 39.593 | 1.86x |
 
+
+## 5. Real World Use Cases
+
+The `kernels` library is still growing but is already being used in various real-world projects, including:
+- [Text Generation Inference](https://github.com/huggingface/text-generation-inference/blob/d658b5def3fe6c32b09b4ffe36f770ba2aa959b4/server/text_generation_server/layers/marlin/fp8.py#L15): The TGI project uses the `kernels` library to load optimized kernels for text generation tasks, improving performance and efficiency.
+- [Transformers](https://github.com/huggingface/transformers/blob/6f9da7649f2b23b22543424140ce2421fccff8af/src/transformers/integrations/hub_kernels.py#L32): The Transformers library has integrated the `kernels` library to use drop-in optimized layers without requiring any changes to the model code. This allows users to easily switch between standard and optimized implementations.
+
+
 ## Get Started and Next Steps!
 
 You've seen how easy it is to fetch and use optimized kernels with the Hugging Face Kernel Hub. Ready to try it yourself?
````
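To reproduce speedup numbers like those in the table above, a small timing harness built on CUDA events is enough. The sketch below benchmarks the `gelu_fast` kernel from the opening example rather than the RMSNorm kernel the post benchmarks, and the tensor shape is an illustrative assumption; it requires a CUDA GPU:

```python
import torch
from kernels import get_kernel

activation = get_kernel("kernels-community/activation")

def bench(fn, iters=100):
    """Return average milliseconds per call, measured with CUDA events."""
    for _ in range(10):  # warm up so one-time setup costs don't skew timings
        fn()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()  # wait for queued kernels before reading the clock
    return start.elapsed_time(end) / iters

x = torch.randn((8192, 8192), dtype=torch.float16, device="cuda")
y = torch.empty_like(x)  # gelu_fast writes its result into a pre-allocated output

eager_ms = bench(lambda: torch.nn.functional.gelu(x))
kernel_ms = bench(lambda: activation.gelu_fast(y, x))
print(f"eager GELU: {eager_ms:.3f} ms | hub kernel: {kernel_ms:.3f} ms")
```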
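For the Transformers-style drop-in integration mentioned in the use cases above, the `kernels` library provides a `use_kernel_forward_from_hub` decorator that marks a layer so its `forward` can be swapped for a Hub kernel. A rough sketch, under the assumption that `"RMSNorm"` is a registered layer name in the kernel mapping (the linked `hub_kernels.py` shows the mappings Transformers actually uses):

```python
import torch
import torch.nn as nn
from kernels import use_kernel_forward_from_hub

# Assumption: "RMSNorm" is a layer name known to the kernel mapping; when a
# matching Hub kernel is available, it can replace this forward at runtime.
@use_kernel_forward_from_hub("RMSNorm")
class RMSNorm(nn.Module):
    def __init__(self, hidden_size: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.variance_epsilon = eps

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Pure-PyTorch fallback used when no kernel is substituted
        variance = hidden_states.pow(2).mean(-1, keepdim=True)
        hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
        return self.weight * hidden_states
```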
