
Tiny refactor DeepSeek V3/R1 NextN shared experts fusion #5143


Conversation

lambert0312
Contributor

@lambert0312 lambert0312 commented Apr 8, 2025

Motivation

Ref #4918
Ref #5707
Ref #5793

Modifications

  • Extract the shared helper method compute_shared_experts_fusion_weights and place it in deepseek_v2.py for now.
  • Add the necessary unit tests.
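The extracted helper is not shown in this thread, so here is a minimal sketch of what a shared-experts-fusion weight helper could look like. All names and shapes are assumptions for illustration, not the actual sglang code: the idea is that the shared expert's projection weights are replicated and appended after the routed experts' stacked weights, so the fused MoE kernel can treat the shared expert as extra routed experts.

```python
# Hypothetical sketch of a shared-experts-fusion weight helper; the function
# name matches the PR, but the signature and shapes are assumptions.
import torch

def compute_shared_experts_fusion_weights(
    routed_weight: torch.Tensor,    # [num_routed_experts, out_dim, in_dim]
    shared_weight: torch.Tensor,    # [out_dim, in_dim], one shared expert
    num_fused_shared_experts: int,  # replicas to append (e.g. TP world size)
) -> torch.Tensor:
    """Append replicated shared-expert weights after the routed experts."""
    # expand() creates replica views without copying; torch.cat materializes
    # them into one contiguous stacked weight tensor for the fused kernel.
    replicas = shared_weight.unsqueeze(0).expand(
        num_fused_shared_experts, *shared_weight.shape
    )
    return torch.cat([routed_weight, replicas], dim=0)
```

Under this sketch, a checkpoint with 256 routed experts and 1 shared expert fused 2 ways would yield a stacked weight of 258 "experts", with the last two entries identical to the shared expert.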

Accuracy on A800

python3 benchmark/gsm8k/bench_sglang.py --num-questions 200 --parallel 128 --num-shots 8 

Accuracy: 0.960
Invalid: 0.000
Latency: 14.804 s
Output throughput: 1451.247 token/s

Benchmark on A800

# qps 16
python3 -m sglang.bench_serving --backend sglang --num-prompts 200 --dataset-name random --max-concurrency 16 --random-input 256 --random-output 256 --seed 42

============ Serving Benchmark Result ============
Backend:                                 sglang
Traffic request rate:                    inf
Max request concurrency:                 16
Successful requests:                     200
Benchmark duration (s):                  57.65
Total input tokens:                      26096
Total generated tokens:                  26874
Total generated tokens (retokenized):    26763
Request throughput (req/s):              3.47
Input token throughput (tok/s):          452.70
Output token throughput (tok/s):         466.20
Total token throughput (tok/s):          918.90
Concurrency:                             15.77
Accept length:                           2.60
----------------End-to-End Latency----------------
Mean E2E Latency (ms):                   4546.43
Median E2E Latency (ms):                 4602.09
---------------Time to First Token----------------
Mean TTFT (ms):                          207.83
Median TTFT (ms):                        174.89
P99 TTFT (ms):                           476.63
---------------Inter-Token Latency----------------
Mean ITL (ms):                           32.54
Median ITL (ms):                         19.18
P95 ITL (ms):                            90.16
P99 ITL (ms):                            168.08
Max ITL (ms):                            389.73
==================================================
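The headline throughput figures above follow directly from the raw counts and the benchmark duration; the small residual differences come from the tool using a more precise duration than the two decimals printed. A quick cross-check:

```python
# Cross-check the reported throughput numbers from the raw counts above.
duration_s = 57.65        # Benchmark duration (s)
num_requests = 200        # Successful requests
input_tokens = 26096      # Total input tokens
output_tokens = 26874     # Total generated tokens

req_throughput = num_requests / duration_s            # ~3.47 req/s
in_tok_throughput = input_tokens / duration_s         # ~452.7 tok/s
out_tok_throughput = output_tokens / duration_s       # ~466.2 tok/s
total_throughput = (input_tokens + output_tokens) / duration_s  # ~918.9 tok/s
```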

Checklist

@xihuai18
Contributor

xihuai18 commented Apr 8, 2025

Will fused shared experts still improve performance with NextN?

@lambert0312
Contributor Author

Will fused shared experts still improve performance with NextN?

Yes; I'm still running experiments to measure the effect.

@merrymercy
Contributor

Can you add a test case?

@lambert0312
Contributor Author

lambert0312 commented Apr 21, 2025

Can you add a test case?

OK, I will add it.

@fzyzcjy
Collaborator

fzyzcjy commented Apr 21, 2025

Maybe my PR can be merged first, to make the commit history a bit clearer.

@lambert0312
Contributor Author

Maybe my PR can be merged first, to make the commit history a bit clearer.

Yes, I'm waiting for it to be merged @fzyzcjy

@lambert0312 lambert0312 force-pushed the support_nextn_shared_experts_fusion branch from 668c67c to 5769b91 on April 21, 2025 at 09:40
@xihuai18
Contributor

xihuai18 commented May 7, 2025

any update in this PR?

@lambert0312
Contributor Author

any update in this PR?

No further changes; it can be merged in. @xihuai18

@zhyncs
Member

zhyncs commented Jun 9, 2025

@BBuf @fzyzcjy

@lambert0312 lambert0312 force-pushed the support_nextn_shared_experts_fusion branch from 64e6df1 to 1ee3b6a on June 9, 2025 at 11:49
@lambert0312 lambert0312 force-pushed the support_nextn_shared_experts_fusion branch from 682653d to 6351425 on June 9, 2025 at 11:58

6 participants