Tiny refactor DeepSeek V3/R1 NextN shared experts fusion #5143
Will fused shared experts still improve performance with NextN?
Yes, I'm still experimenting to measure the current effect.
Can you add a test case?
OK, I will add it.
Maybe my PR can be merged first to make the commit history a bit clearer.
Yes, I'm waiting for it to be merged @fzyzcjy
Any update on this PR?
No, it can be merged. @xihuai18
Motivation
Ref #4918
Ref #5707
Ref #5793
Modifications
compute_shared_experts_fusion_weights
and put it in deepseek_v2.py first.
Acc in A800
Benchmark in A800
Checklist