Support async in DeepEP #4610

fzyzcjy · 2025-03-20T06:35:32Z

Motivation

#4068 requires DeepEP to run asynchronously. This PR enables that in a minimal way.

(This is a separate PR because #4068 may take more than a day to finish, and extracting this part first should reduce merge conflicts.)

Related: #4232 (Initial DeepEP support)

When viewing the diff, please disregard the changes that come from #4608.

To demonstrate that the PR works, I temporarily set async_finish=True. In practice, async will probably be enabled only for two-batch overlap. The flag will be handled in #4068. (I can also set it to false in this PR if needed.)

Modifications

Checklist

Format your code according to the Code Formatting with Pre-Commit.
Add unit tests as outlined in the Running Unit Tests.
Update documentation / docstrings / example tutorials as needed, according to Writing Documentation.
Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to Benchmark and Profiling and Accuracy Results.
For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.
Please feel free to join our Slack channel at https://slack.sglang.ai to discuss your PR.


ch-wan · 2025-03-23T00:44:13Z

For future reference, I ran a simple benchmark for this PR on one 8xH200 machine. Here are the commands:

python3 -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V3 --trust-remote-code   --tp 8 --dp 8 --host 0.0.0.0 --port 30000   --enable-dp-attention --enable-deepep-moe   --disable-cuda-graph
python3 -m sglang.bench_serving --backend sglang --dataset-name random --num-prompt 512 --random-input 1000 --random-output 1000 --random-range-ratio 1 --host 127.0.0.1 --port 30000 --max-concurrency 128

Results:

Version	Concurrency	Input	Output	Num Requests	Input Throughput (tok/s)	Output Throughput (tok/s)	Total Throughput (tok/s)
Original	127.98	1000	1000	512	555.48	555.48	1110.97
Current	127.96	1000	1000	512	612.16	612.16	1224.33
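
To make the size of the difference explicit, the relative change can be computed from the table as below. This is only a small sketch: the numbers are copied from the two rows above, and the helper name relative_gain_percent is made up for illustration and is not part of SGLang. It reflects a single run on one machine, so treat the result as indicative rather than a rigorous measurement.

```python
# Throughput figures (tok/s) copied from the results table above.
original = {"input": 555.48, "output": 555.48, "total": 1110.97}
current = {"input": 612.16, "output": 612.16, "total": 1224.33}


def relative_gain_percent(before: float, after: float) -> float:
    """Return the percentage change from `before` to `after`."""
    return (after - before) / before * 100.0


# Hypothetical helper applied to each throughput column.
gains = {
    name: relative_gain_percent(original[name], current[name])
    for name in original
}

for name, gain in gains.items():
    print(f"{name} throughput: {gain:+.1f}%")
```

Input and output throughput move together here because the benchmark uses equal input and output lengths (1000 tokens each).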

python/sglang/srt/layers/moe/ep_moe/token_dispatcher.py

fzyzcjy added 25 commits March 20, 2025 09:54

more

3c4fe1c

more

05653aa

more

367a774

more

4be5dce

more

0e9e06c

more

86f7b04

fmt

7fc833e

more

c0752f4

hack

482babd

hack

119058b

more

91a58a2

more

70fb205

more

bcba532

more

22c2ac9

more

7923f75

more

525bd48

more

d5844c4

more

f69cb78

rm

0d33e40

cleanup

9670908

more

49166fc

fmt

888d7c8

more

31fa181

more

566f4d5

fmt

c64ccd0

fzyzcjy changed the title ~~Feat/deepseek async~~ Support async in DeepEP Mar 20, 2025

fzyzcjy and others added 2 commits March 20, 2025 14:37

Merge branch 'main' into feat/deepseek_remove_shape

93e9660

Merge branch 'feat/deepseek_remove_shape' into feat/deepseek_async

e700258

fzyzcjy marked this pull request as ready for review March 20, 2025 06:37

fzyzcjy requested a review from merrymercy as a code owner March 20, 2025 06:37

fzyzcjy requested review from Ying1123, zhyncs, hnyls2002, ispobock, ByronHsu and HaiShaw as code owners March 20, 2025 06:37

lint

337d4ef

fzyzcjy mentioned this pull request Mar 20, 2025

Refactor DeepSeek model by extracting basic functions #4611

Closed

6 tasks

rm debug

cb980a8

zhyncs added the high priority label Mar 20, 2025

fzyzcjy added 2 commits March 20, 2025 21:18

Merge branch 'main' into feat/deepseek_async

b9c1f96

Update token_dispatcher.py

a00dd8d

zhyncs assigned ch-wan Mar 22, 2025

Merge commit '3c09548d1fcb861359c3b8678805245820292f83' into feat/dee…

5bfa4de

…pseek_async

ch-wan reviewed Mar 23, 2025

python/sglang/srt/layers/moe/ep_moe/token_dispatcher.py (outdated, resolved)

python/sglang/srt/layers/moe/ep_moe/token_dispatcher.py (outdated, resolved)

python/sglang/srt/layers/moe/ep_moe/token_dispatcher.py (outdated, resolved)

fzyzcjy mentioned this pull request Mar 23, 2025

Multiple tiny code cleanups #4608

Merged

6 tasks

fzyzcjy and others added 2 commits March 23, 2025 09:33

exclude diff in 4608

aa718f4

minor

7a1f86e

ch-wan approved these changes Mar 23, 2025


Merge branch 'main' into feat/deepseek_async

6c9421d

zhyncs merged commit ca75741 into sgl-project:main Mar 23, 2025
0 of 18 checks passed

ch-wan mentioned this pull request Mar 24, 2025

[Roadmap] EP Enhancement #4734

Open

18 tasks
