Tiny refactor computation of shared expert fusion weights #5261
base: main
Ping me when this PR is ready for merge / needs modification, then I will resolve conflicts
@fzyzcjy Could you please remove the changes in |
Hmm, to double check: do you mean making deepseek_v2.py have no diff, and only add a new function (that is not called yet) in the weight_utils?
yes
OK I will do that when this PR is going to be merged
I think this PR can be merged. Please complete the modification as soon as possible. Thank you!
Yes I think so, I will have time tomorrow for it hopefully
A simpler MTP loading method has been refactored, and this PR does not seem to be needed @fzyzcjy |
Motivation
This PR depends on #5188
It can be useful in #5143 (review) and #5101
gsm8k: 93.8
Modifications
Checklist