linear support deepgemm #4199

sleepcoo · 2025-03-08T07:36:02Z

Motivation

Use the DeepGEMM kernel for linear-layer GEMMs. Depends on #4165.

Test case

# launch server
python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-V3 --tp 8 --trust-remote-code

MMLU Average accuracy: 0.878

Performance test

| Metric | origin | deepgemm linear |
| --- | --- | --- |
| Output token throughput (tok/s) | 1244.04 | 1360.05 |
| Total token throughput (tok/s) | 1818.26 | 1987.82 |
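
For reference, the relative gain implied by these numbers can be computed directly. This is a minimal sketch: the figures are copied from the table above, and the function and variable names are illustrative, not part of the PR.

```python
# Throughput figures (tok/s) copied from the performance table: (origin, deepgemm linear).
results = {
    "Output token throughput": (1244.04, 1360.05),
    "Total token throughput": (1818.26, 1987.82),
}


def speedup_percent(baseline: float, candidate: float) -> float:
    """Relative improvement of candidate over baseline, in percent."""
    return (candidate / baseline - 1.0) * 100.0


for metric, (origin, deepgemm) in results.items():
    print(f"{metric}: {speedup_percent(origin, deepgemm):.1f}% higher")
```

Both metrics work out to roughly a 9.3% improvement over the original path.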

zhyncs · 2025-03-10T07:41:33Z

Waiting for #4255.

zhyncs · 2025-03-10T07:43:59Z

python/sglang/srt/layers/quantization/fp8_kernel.py

+    if _is_cuda:
+        deep_gemm.gemm_fp8_fp8_bf16_nt((A, As), (B, Bs), C)
+    else:
+        kernel = (


This is a duplicate. Please delete line 726 of the code.
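
The branch under review follows a common dispatch pattern: call the fast kernel when running on CUDA, otherwise fall back to the existing path. The sketch below illustrates only that control flow in plain Python. `matmul_nt`, `linear_gemm`, and the `fast_kernel` parameter are hypothetical stand-ins, not sglang or DeepGEMM APIs; the "NT" layout (non-transposed A, transposed B) is taken from the `gemm_fp8_fp8_bf16_nt` name, and FP8 scaling is omitted entirely.

```python
from typing import Callable, List, Optional

Matrix = List[List[float]]


def matmul_nt(a: Matrix, b: Matrix) -> Matrix:
    """Reference C = A @ B^T for A of shape (m, k) and B of shape (n, k)."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b] for row_a in a]


def linear_gemm(
    a: Matrix,
    b: Matrix,
    is_cuda: bool,
    fast_kernel: Optional[Callable[[Matrix, Matrix], Matrix]] = None,
    fallback_kernel: Callable[[Matrix, Matrix], Matrix] = matmul_nt,
) -> Matrix:
    """Use the fast kernel on CUDA when one is supplied, else the fallback."""
    if is_cuda and fast_kernel is not None:
        return fast_kernel(a, b)
    return fallback_kernel(a, b)


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
print(linear_gemm(a, b, is_cuda=False))
```

Keeping exactly one call per branch is the point of the review comment above: a leftover second call would run the GEMM twice.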

python/sglang/srt/layers/quantization/fp8_kernel.py

merrymercy · 2025-03-10T10:40:36Z

I tried to release a new version of sgl-kernel to unblock this, but I hit some errors (1a5023e). I will need @zhyncs's help.

zhyncs · 2025-03-10T18:33:02Z

v0.0.4.post1 is ready

zhyncs

Unit tests fail on H200.
torch.compile is not compatible: python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-R1 --tp 8 --trust-remote-code --enable-torch-compile --torch-compile-max-bs 1

zhyncs · 2025-03-10T21:43:36Z

ref #3232

sleepcoo · 2025-03-11T04:45:07Z

> Unit tests fail on H200.
>
> torch.compile is not compatible: python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-R1 --tp 8 --trust-remote-code --enable-torch-compile --torch-compile-max-bs 1

Waiting on deepseek-ai/DeepGEMM#55.

zhyncs · 2025-03-11T07:36:41Z

This PR still has several open issues, such as handling architectures older than Hopper and OOM in the unit tests, but these are minor and can be resolved in follow-up PRs.
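
One way to handle the older-architecture case is to gate the DeepGEMM path on both the GPU generation and an opt-in environment variable, in the spirit of the "Add environment variable to control enabling deepgemm" commit in this PR. The sketch below is an assumption, not the merged code: `should_use_deepgemm` and the flag name `EXAMPLE_ENABLE_DEEPGEMM` are hypothetical. In a real setup the capability tuple could come from `torch.cuda.get_device_capability()`.

```python
import os
from typing import Mapping, Tuple

# Hopper GPUs (for example H100 and H200) report CUDA compute capability 9.0.
HOPPER_MAJOR = 9


def should_use_deepgemm(
    is_cuda: bool,
    capability: Tuple[int, int],
    environ: Mapping[str, str] = os.environ,
    flag: str = "EXAMPLE_ENABLE_DEEPGEMM",  # hypothetical flag name
) -> bool:
    """Return True only on CUDA, on Hopper or newer, and when the flag is "1"."""
    if not is_cuda:
        return False
    if capability[0] < HOPPER_MAJOR:
        # Pre-Hopper GPUs keep using the existing kernel.
        return False
    return environ.get(flag, "0") == "1"


print(should_use_deepgemm(True, (9, 0), {"EXAMPLE_ENABLE_DEEPGEMM": "1"}))  # Hopper, opted in
print(should_use_deepgemm(True, (8, 0), {"EXAMPLE_ENABLE_DEEPGEMM": "1"}))  # Ampere, falls back
```

Defaulting the flag to off keeps the existing kernel as the safe path until the remaining issues are resolved.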

Co-authored-by: yinfan98 <[email protected]>

sleepcoo and others added 3 commits March 8, 2025 15:34

linear support deepgemm

b3a13be

Merge branch 'main' into deepgemm-linear

f786731

Using the deepgemm kernel with CUDA.

d6e374d

zhyncs added the high priority label Mar 9, 2025

zhyncs marked this pull request as ready for review March 10, 2025 07:43

Merge branch 'main' into deepgemm-linear

01dcbdc

zhyncs requested review from merrymercy, Ying1123, zhyncs, ispobock and HaiShaw as code owners March 10, 2025 07:43

zhyncs reviewed Mar 10, 2025

fix useless line

8034344

merrymercy reviewed Mar 10, 2025

python/sglang/srt/layers/quantization/fp8_kernel.py (outdated, resolved)

fix unit test

c35f45f

sleepcoo force-pushed the deepgemm-linear branch from 402c4e1 to c35f45f Compare March 10, 2025 10:36

Merge branch 'main' into deepgemm-linear

cb33a05

sleepcoo mentioned this pull request Mar 10, 2025

DeepGemm integrate to sgl-kernel #4165

Merged

6 tasks

Merge branch 'main' into deepgemm-linear

a9239ae

zhyncs requested changes Mar 10, 2025

sleepcoo and others added 2 commits March 11, 2025 09:23

Merge branch 'main' into deepgemm-linear

de455ad

fix deepgemm linear output type

1953d05

sleepcoo force-pushed the deepgemm-linear branch from e7b3619 to 1953d05 Compare March 11, 2025 04:19

sleepcoo closed this Mar 11, 2025

sleepcoo reopened this Mar 11, 2025

sleepcoo and others added 3 commits March 11, 2025 04:46

fix bug

acbfba9

Add environment variable to control enabling deepgemm.

6a38533

Merge branch 'main' into deepgemm-linear

ac8aa6f

zhyncs approved these changes Mar 11, 2025

zhyncs merged commit dce303e into sgl-project:main Mar 11, 2025
3 of 19 checks passed

merrymercy mentioned this pull request Mar 13, 2025

Development Roadmap (2025 H1) #4042

Open

67 tasks

hebiao064 pushed a commit to hebiao064/sglang that referenced this pull request Mar 13, 2025

linear support deepgemm (sgl-project#4199)

f45b90a

Co-authored-by: yinfan98 <[email protected]>

merrymercy mentioned this pull request Feb 24, 2025

[Feature] DeepSeek V3 optimization #2591

Closed

20 tasks

laixinn mentioned this pull request Mar 24, 2025

add deepgemm into sgl-kernel #3957

Closed

6 tasks

zhyncs mentioned this pull request Mar 25, 2025

[Feature] speedup DeepGEMM JIT compilation #4773

Closed

2 tasks
