[fix] logical_to_all_physical_map index 256 is out of bounds in EP parallel. #6767

Merged: 14 commits, Jun 7, 2025
python/sglang/srt/models/deepseek_v2.py: 18 changes (15 additions, 3 deletions)
@@ -1714,21 +1714,33 @@ def determine_num_fused_shared_experts(
             or self.config.n_routed_experts != 256
         ):
             self.num_fused_shared_experts = 0
-            global_server_args_dict["disable_shared_experts_fusion"] = 1
+            global_server_args_dict["disable_shared_experts_fusion"] = True
             log_info_on_rank0(
                 logger,
                 "Only Deepseek V3/R1 on NV-platform can use shared experts fusion optimization. Shared experts fusion optimization is disabled.",
             )
+        elif (global_server_args_dict["enable_deepep_moe"] or global_server_args_dict["enable_ep_moe"]):
+            self.num_fused_shared_experts = 0
+            global_server_args_dict["disable_shared_experts_fusion"] = True
+            log_info_on_rank0(
+                logger,
+                "Deepseek V3/R1 can not use shared experts fusion optimization when in deepep_moe or ep_moe mode. Shared experts fusion optimization is disabled.",
+            )
         elif self.num_fused_shared_experts == 0:
             if (
                 _is_cuda
                 and torch.cuda.get_device_capability("cuda") >= (9, 0)
                 and self.config.architectures[0] == architecture
                 and self.config.n_routed_experts == 256
-                and (not global_server_args_dict["enable_deepep_moe"])
+                and (
+                    not (
+                        global_server_args_dict["enable_deepep_moe"]
+                        or global_server_args_dict["enable_ep_moe"]
+                    )
+                )
             ):
                 self.num_fused_shared_experts = self.config.n_shared_experts
-                global_server_args_dict["disable_shared_experts_fusion"] = 0
+                global_server_args_dict["disable_shared_experts_fusion"] = False
                 log_info_on_rank0(
                     logger,
                     "Deepseek V3/R1 with fp8 can use shared experts fusion optimization when SM version >=90. Shared experts fusion optimization is enabled.",
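For context on the guard added above: DeepSeek V3/R1 routes over 256 experts, so under EP parallelism a logical-to-physical expert map indexed by logical expert id only covers ids 0..255, while shared-experts fusion addresses the fused shared expert as one additional logical expert, id 256, which is where the "index 256 is out of bounds" error in the title comes from. Below is a minimal sketch of that size mismatch, assuming the map is a tensor with one row per routed expert; the shape, the replica dimension, and every name other than logical_to_all_physical_map are illustrative, not sglang's actual layout.

```python
# Illustrative sketch only, not the sglang implementation.
import torch

N_ROUTED_EXPERTS = 256     # DeepSeek V3/R1 routed experts: logical ids 0..255
NUM_PHYSICAL_REPLICAS = 1  # hypothetical: physical copies tracked per logical expert

# One row per routed expert, so valid logical ids are 0..255.
logical_to_all_physical_map = torch.arange(N_ROUTED_EXPERTS).reshape(
    N_ROUTED_EXPERTS, NUM_PHYSICAL_REPLICAS
)

# With shared-experts fusion, the fused shared expert is addressed as one more
# logical expert right after the routed ones, i.e. id 256.
fused_shared_expert_id = N_ROUTED_EXPERTS

try:
    logical_to_all_physical_map[fused_shared_expert_id]
except IndexError as err:
    print(err)  # index 256 is out of bounds for dimension 0 with size 256
```

Disabling the fusion whenever enable_ep_moe or enable_deepep_moe is set, as the diff does, keeps every logical expert id inside the 0..255 range the map was built for.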