linear support deepgemm #4199

Merged
merged 13 commits into from
Mar 11, 2025
sleepcoo
Collaborator

@sleepcoo sleepcoo commented Mar 8, 2025

Motivation

Use DeepGEMM for the linear-layer GEMM. Depends on #4165.

Test case

# launch server
python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-V3 --tp 8 --trust-remote-code 

MMLU Average accuracy: 0.878

Performance test

| Metric | origin | deepgemm linear |
|---|---|---|
| Output token throughput (tok/s) | 1244.04 | 1360.05 |
| Total token throughput (tok/s) | 1818.26 | 1987.82 |
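
To put the two rows in relative terms, the short sketch below computes the percentage change of the deepgemm linear run against the origin run. The helper name `relative_gain` is made up for illustration; the numbers are copied from the table above.

```python
def relative_gain(baseline: float, candidate: float) -> float:
    """Return the percentage change of candidate relative to baseline."""
    return (candidate / baseline - 1.0) * 100.0


# Throughput figures (tok/s) copied from the performance test above.
output_gain = relative_gain(1244.04, 1360.05)
total_gain = relative_gain(1818.26, 1987.82)

print(f"Output token throughput: {output_gain:+.1f}%")  # +9.3%
print(f"Total token throughput:  {total_gain:+.1f}%")   # +9.3%
```

Both metrics improve by roughly 9.3% in this single measurement. The PR does not state the benchmark settings, so the figure should be read as indicative only.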

@zhyncs
Member

zhyncs commented Mar 10, 2025

wait for #4255

@zhyncs zhyncs marked this pull request as ready for review March 10, 2025 07:43
if _is_cuda:
    deep_gemm.gemm_fp8_fp8_bf16_nt((A, As), (B, Bs), C)
else:
    kernel = (
Member

This is a duplicate. Please delete line 726 of the code.
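
For readers unfamiliar with the call in the snippet under review, here is a minimal pure-Python reference of what a block-scaled "nt" GEMM computes. It is a sketch under assumptions, not DeepGEMM's implementation: it assumes `A` is `m x k` with one scale per row per K-block (`As`), `B` is `n x k` (already transposed, hence "nt") with one scale per square tile (`Bs`), and the output is `A_dequant @ B_dequant^T`. The function name `blockwise_scaled_gemm_nt` and the tiny block size are hypothetical; the real kernel runs on FP8 tensors on the GPU.

```python
def blockwise_scaled_gemm_nt(a, a_scales, b, b_scales, block=2):
    """Reference for C = dequant(A) @ dequant(B)^T with block-wise scales.

    a:        m x k quantized values
    a_scales: m x (k // block), one scale per row per K-block (assumption)
    b:        n x k quantized values (the transposed right-hand side)
    b_scales: (n // block) x (k // block), one scale per tile (assumption)
    """
    m, k, n = len(a), len(a[0]), len(b)
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                # Dequantize each operand with the scale of the block it falls in.
                a_val = a[i][p] * a_scales[i][p // block]
                b_val = b[j][p] * b_scales[j // block][p // block]
                acc += a_val * b_val
            c[i][j] = acc
    return c


# Toy inputs: m=1, n=2, k=4, block=2.
A = [[1, 2, 3, 4]]
As = [[0.5, 2.0]]
B = [[1, 0, 1, 0], [0, 1, 0, 1]]
Bs = [[1.0, 0.5]]
C = blockwise_scaled_gemm_nt(A, As, B, Bs)
print(C)  # [[3.5, 5.0]]
```

A reference like this is mainly useful for checking a fused kernel's output on small shapes.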

@merrymercy
Contributor

I tried to release a new version of sgl-kernel to unblock this, but I hit some errors (1a5023e). Will need @zhyncs's help.

@sleepcoo sleepcoo mentioned this pull request Mar 10, 2025
6 tasks
@zhyncs
Member

zhyncs commented Mar 10, 2025

v0.0.4.post1 is ready

Member

@zhyncs zhyncs left a comment

  • The unit test (ut) does not pass on H200
  • torch compile is not compatible: python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-R1 --tp 8 --trust-remote-code --enable-torch-compile --torch-compile-max-bs 1

@zhyncs
Member

zhyncs commented Mar 10, 2025

ref #3232

@sleepcoo sleepcoo closed this Mar 11, 2025
@sleepcoo sleepcoo reopened this Mar 11, 2025
@sleepcoo
Collaborator Author

  • The unit test (ut) does not pass on H200
  • torch compile is not compatible: python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-R1 --tp 8 --trust-remote-code --enable-torch-compile --torch-compile-max-bs 1

Waiting on deepseek-ai/DeepGEMM#55.

@zhyncs
Member

zhyncs commented Mar 11, 2025

There are still many issues to fix in this PR, such as the condition for architectures older than Hopper and the unit-test OOM, but these are all minor and can be resolved in follow-ups.
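
One way the architecture condition mentioned above could be expressed is sketched below. This is an illustration under assumptions, not the check that was eventually merged: the helper name `can_use_deep_gemm` is hypothetical, and it assumes the DeepGEMM path should be taken only on CUDA devices whose compute capability is at least 9.0 (Hopper). Whether architectures newer than Hopper should also be allowed is left open here.

```python
from typing import Optional, Tuple

# NVIDIA Hopper GPUs (for example H100/H200) report compute capability 9.0.
HOPPER_CAPABILITY = (9, 0)


def can_use_deep_gemm(is_cuda: bool, capability: Optional[Tuple[int, int]]) -> bool:
    """Hypothetical gate: take the DeepGEMM path only on Hopper or newer.

    `capability` is a (major, minor) tuple. In a real process it could come
    from torch.cuda.get_device_capability(); it is passed in explicitly here
    so the function can be exercised without a GPU.
    """
    if not is_cuda or capability is None:
        return False
    return capability >= HOPPER_CAPABILITY


print(can_use_deep_gemm(True, (9, 0)))   # True  (Hopper)
print(can_use_deep_gemm(True, (8, 0)))   # False (older than Hopper)
print(can_use_deep_gemm(False, (9, 0)))  # False (not CUDA)
```

Older architectures would then fall through to the existing kernel in the `else` branch.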

@zhyncs zhyncs merged commit dce303e into sgl-project:main Mar 11, 2025
3 of 19 checks passed
@merrymercy merrymercy mentioned this pull request Mar 13, 2025
67 tasks
hebiao064 pushed a commit to hebiao064/sglang that referenced this pull request Mar 13, 2025
@merrymercy merrymercy mentioned this pull request Feb 24, 2025
20 tasks
@laixinn laixinn mentioned this pull request Mar 24, 2025
6 tasks