Add fp8 qkv_proj_with_rope kernel for CPU in sgl-kernel and add UT #6493
Conversation
just some minor issues to address.
def _rotate_gptj(x: torch.Tensor) -> torch.Tensor:
    x1 = x[..., ::2]
    x2 = x[..., 1::2]
    x = torch.stack((-x2, x1), dim=-1)
    return x.flatten(-2)


def rotary_emb(q_pe, k_pe, pos, cos_sin_cache):
    orig_dtype = q_pe.dtype
    q_pe = q_pe.float()
    k_pe = k_pe.float()
    cos_sin_cache = cos_sin_cache.float()

    query_rot = q_pe[..., :rotary_dim]
    key_rot = k_pe[..., :rotary_dim]
    cos_sin = cos_sin_cache[pos]
    cos, sin = cos_sin.chunk(2, dim=-1)
    cos = cos.repeat_interleave(2, dim=-1).unsqueeze(-2)
    sin = sin.repeat_interleave(2, dim=-1).unsqueeze(-2)
    query_rot = query_rot * cos + _rotate_gptj(query_rot) * sin
    key_rot = key_rot * cos + _rotate_gptj(key_rot) * sin
    return query_rot.to(orig_dtype), key_rot.to(orig_dtype)
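As a quick sanity check on this reference path, GPT-J-style rotary embedding is a pure rotation and therefore norm-preserving. The self-contained snippet below exercises rotary_emb directly; the shapes, the rotary_dim value, and the [cos | sin] cache construction are illustrative assumptions, not values taken from the test file.

import torch

# Arbitrary example sizes; the real test uses the model's own dimensions.
rotary_dim = 64
q_pe = torch.randn(4, 1, rotary_dim)
k_pe = torch.randn(4, 1, rotary_dim)
pos = torch.arange(4)

# Build a [cos | sin] cache in the layout rotary_emb expects.
inv_freq = 1.0 / (10000.0 ** (torch.arange(0, rotary_dim, 2).float() / rotary_dim))
freqs = torch.outer(torch.arange(16).float(), inv_freq)
cos_sin_cache = torch.cat([freqs.cos(), freqs.sin()], dim=-1)

q_ref, k_ref = rotary_emb(q_pe, k_pe, pos, cos_sin_cache)
# A pure rotation leaves per-head vector norms unchanged.
torch.testing.assert_close(q_ref.norm(dim=-1), q_pe.norm(dim=-1), rtol=1e-5, atol=1e-5)
torch.testing.assert_close(k_ref.norm(dim=-1), k_pe.norm(dim=-1), rtol=1e-5, atol=1e-5)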
Done.
kva_packed = torch.ops.sgl_kernel.convert_weight_packed(kv_a_proj_weight)
wkc_packed = torch.ops.sgl_kernel.convert_weight_packed(w_kc)

q_out, k_out, v_out = torch.ops.sgl_kernel.qkv_proj_with_rope(
You can import everything you need at the beginning:
from torch.ops.sgl_kernel import xxx, yyy, zzz
Custom registered ops cannot be accessed through this kind of import statement. I manually wrapped convert_weight_packed and qkv_proj_with_rope as local variables at the beginning of the file to simplify their usage in the unit tests.
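For readers of the test, a minimal sketch of the aliasing approach described here (assuming the sgl_kernel package is imported first so its custom ops get registered with torch; the op names follow the snippets above):

import torch
import sgl_kernel  # noqa: F401  -- importing the package registers the custom CPU ops

# torch.ops namespaces are resolved dynamically, so a plain
# `from torch.ops.sgl_kernel import ...` does not work; bind the op
# handles to names at the top of the file instead.
convert_weight_packed = torch.ops.sgl_kernel.convert_weight_packed
qkv_proj_with_rope = torch.ops.sgl_kernel.qkv_proj_with_rope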
Motivation
This PR is a follow-up to #2807 and #5150, adding an fp8 qkv_proj_with_rope kernel for CPU. The bf16 and int8 fused_experts kernels were already added in #5150.
This PR also adds UTs for the bf16, int8, and fp8 qkv_proj_with_rope kernels for CPU.
This PR also addresses issues with the definition and usage of decode_attention_cpu and extend_attention_cpu on the main branch.
Modifications
The main change is the C++ kernel for fp8 qkv_proj_with_rope on CPU: sgl-kernel/csrc/cpu/qkv_proj.cpp
The UTs for the qkv_proj_with_rope ops on CPU: test/srt/cpu/test_qkv_proj_with_rope.py
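For reference, a minimal way to run the new test file locally might look like the following (assuming a CPU build of sgl-kernel is installed; the repository's own test runner or CI invocation may differ):

import pytest

# Run only the new qkv_proj_with_rope CPU tests.
pytest.main(["test/srt/cpu/test_qkv_proj_with_rope.py", "-v"])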
Checklist