
[RL] Remove the w13 weight_scale and input_scale for UnquantizedEPMoEMethod #6308


Merged
merged 8 commits into sgl-project:main on May 22, 2025

Conversation

@zhuzilin zhuzilin (Collaborator) commented May 15, 2025


Motivation

When doing RL training, we may release all the parameters with /release_memory_occupation to free the memory occupied by the inference engine, which also releases all the input_scales and weight_scales.
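As a rough illustration of the workflow, a training loop can toggle the engine's memory occupation around each update. This is a minimal sketch: the base URL is hypothetical, and the /resume_memory_occupation counterpart is an assumption, since only /release_memory_occupation is named in this PR.

```python
import requests

# Hypothetical server address; adjust to your deployment.
BASE_URL = "http://localhost:30000"

def training_step_with_released_engine(run_one_update):
    # Free the inference engine's GPU memory (weights, KV cache, and the
    # quantization scales this PR is about) before the training update.
    requests.post(f"{BASE_URL}/release_memory_occupation")

    run_one_update()  # gradient step / optimizer update on the training side

    # Re-allocate buffers and reload weights afterwards; every tensor that was
    # freed, including any *_input_scale / *_weight_scale, must be reloadable here.
    # NOTE: the endpoint name below is an assumption; this PR only mentions
    # /release_memory_occupation.
    requests.post(f"{BASE_URL}/resume_memory_occupation")
```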

Modifications

The original w13_weight_scale in UnquantizedEPMoEMethod does not support reloading (its shape should be (num_experts_per_partition, 2)). Since UnquantizedEPMoEMethod does not actually need w13_weight_scale and w13_input_scale, removing them is a better solution than allocating twice the original memory.
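A minimal sketch of the idea (not the actual sglang diff; the class and argument names are simplified): instead of allocating dummy scale tensors that the unquantized path never reads, the method can simply leave them as None.

```python
import torch

class UnquantizedEPMoEMethodSketch:
    """Illustrative only; the real class is sglang's UnquantizedEPMoEMethod."""

    def create_weights(self, layer, num_experts_per_partition: int,
                       hidden_size: int, intermediate_size: int,
                       params_dtype: torch.dtype):
        # w13 / w2 weights are created as usual (omitted here).

        # Before: dummy scales were allocated even though the unquantized path
        # never uses them, e.g. a (num_experts_per_partition, 2) w13_weight_scale,
        # which also cannot be reloaded cleanly after a memory release.
        #
        # After: the unused w13 scales are simply not created, so there is
        # nothing to restore after /release_memory_occupation.
        layer.w13_weight_scale = None
        layer.w13_input_scale = None
```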

Note that we do need to keep and reload w2_input_scale: if we set it to None, it gets initialized to torch.ones during EpMoE.forward_normal. So I changed the condition in _load_fp8_scale to allow loading w2_input_scale even when the current buffer holds an arbitrary value (i.e., from a random value back to 1).
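A sketch of the relaxed loading condition follows; the function name and the exact check are illustrative, not the real _load_fp8_scale code.

```python
import torch

def load_input_scale_sketch(param: torch.nn.Parameter,
                            loaded_scale: torch.Tensor,
                            expert_id: int,
                            strict: bool = False) -> None:
    """Accept the checkpoint scale even if the current buffer holds garbage."""
    current = param.data[expert_id]
    if strict and not torch.allclose(current, loaded_scale):
        # Old behavior (sketch): after /release_memory_occupation the re-created
        # buffer holds arbitrary values, so a "must already match" guard would
        # reject a perfectly valid reload of w2_input_scale.
        raise ValueError("input_scale mismatch")
    # New behavior (sketch): always take the checkpoint value, so w2_input_scale
    # can be restored to 1.0 regardless of what the buffer currently contains.
    param.data[expert_id] = loaded_scale
```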

Thank you for your time reviewing this PR :)

Checklist

@zhaochenyang20 zhaochenyang20 requested a review from BBuf as a code owner May 20, 2025 02:17
@fzyzcjy fzyzcjy (Collaborator) left a comment


LGTM. Indeed, making it None has one extra benefit: when doing EPLB shuffling, we no longer need to send these params between ranks.

@zhyncs zhyncs merged commit e9feb48 into sgl-project:main May 22, 2025
0 of 21 checks passed
Layssy pushed a commit to Layssy/sglang-iaas that referenced this pull request Jun 9, 2025