Skip to content

Tiny refactor computation of shared expert fusion weights #5261

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

fzyzcjy
Copy link
Collaborator

@fzyzcjy fzyzcjy commented Apr 10, 2025

Motivation

This PR depends on #5188

It can be useful in #5143 (review) and #5101

python -m sglang.launch_server    --model-path deepseek-ai/DeepSeek-V3-0324 --trust-remote-code    --tp 16 --disable-radix-cache    --host 0.0.0.0 --port 20000 --dist-init-addr 10.10.37.16:37000 --nnodes 2 --node-rank 0

(cd /host_home/primary_synced/sglang && python3 benchmark/gsm8k/bench_sglang.py --port 20000 --parallel 1400 --num-questions 1400)

gsm8k: 93.8

Modifications

Checklist

@fzyzcjy
Copy link
Collaborator Author

fzyzcjy commented Apr 21, 2025

Ping me when this PR is ready for merge / needs modification, then I will resolve conflicts

@lambert0312
Copy link
Contributor

Ping me when this PR is ready for merge / needs modification, then I will resolve conflicts

@fzyzcjy Could you please remove the changes in deepseek_v2.py in this PR? I have resolved the conflicts and updated this part of the changes in #5143, and also attached the changes of nextn, thank you!
This PR is to extract the logic of repeated shared expert fusion weights calculations.

@fzyzcjy
Copy link
Collaborator Author

fzyzcjy commented Apr 22, 2025

Could you please remove the changes in deepseek_v2.py in this PR?

Hmm, to double check: do you mean making deepseek_v2.py have no diff, and only add a new function (that is not called yet) in the weight_utils?

@lambert0312
Copy link
Contributor

Hmm, to double check: do you mean making deepseek_v2.py have no diff, and only add a new function (that is not called yet) in the weight_utils?

yes

@fzyzcjy
Copy link
Collaborator Author

fzyzcjy commented Apr 22, 2025

OK I will do that when this PR is going to be merged

@lambert0312
Copy link
Contributor

I think this PR can be merged. Please complete the modification as soon as possible. Thank you!

@fzyzcjy
Copy link
Collaborator Author

fzyzcjy commented Apr 27, 2025

Yes I think so, I will have time tomorrow for it hopefully

@lambert0312
Copy link
Contributor

Yes I think so, I will have time tomorrow for it hopefully

A simpler MTP loading method has been refactored, and this PR does not seem to be needed @fzyzcjy

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants