linear support deepgemm #4199
Wait for #4255.
if _is_cuda:
    deep_gemm.gemm_fp8_fp8_bf16_nt((A, As), (B, Bs), C)
else:
    kernel = (
This is a duplicate. Please delete line 726 of the code.
402c4e1 to c35f45f
v0.0.4.post1 is ready.
- The UT can't pass on H200.
- Not compatible with torch compile:
python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-R1 --tp 8 --trust-remote-code --enable-torch-compile --torch-compile-max-bs 1
ref #3232
e7b3619 to 1953d05
Wait for deepseek-ai/DeepGEMM#55.
There are still many issues to fix in this PR, such as the condition for architectures older than Hopper and the OOM in the UT, but these are all minor and can be resolved in follow-ups.
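For the follow-up on architectures older than Hopper, a guard along these lines might work. This is only a sketch, not the code in this PR: `should_use_deep_gemm` and its `enabled` flag are hypothetical names, and it assumes the DeepGEMM path should only be taken on CUDA devices with compute capability 9.0 (Hopper) or newer.

```python
from typing import Tuple

# Compute capability of Hopper GPUs (H100/H200) is 9.0.
HOPPER_CAPABILITY: Tuple[int, int] = (9, 0)


def should_use_deep_gemm(
    is_cuda: bool,
    capability: Tuple[int, int],
    enabled: bool = True,
) -> bool:
    """Hypothetical guard: take the DeepGEMM path only on Hopper or newer.

    `capability` is a (major, minor) tuple, e.g. the value returned by
    torch.cuda.get_device_capability(). Tuples compare lexicographically,
    so (8, 9) < (9, 0) <= (9, 0).
    """
    return bool(enabled and is_cuda and capability >= HOPPER_CAPABILITY)
```

With a check like this, the `if _is_cuda:` branch above would fall back to the existing kernel on pre-Hopper GPUs instead of calling into `deep_gemm`.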
Co-authored-by: yinfan98 <[email protected]>
Motivation
Use DeepGEMM for the linear GEMM. Depends on #4165.
Test case
MMLU Average accuracy: 0.878
Performance test