
Support fine-grained control of requests that are run together #4699

Open

fzyzcjy wants to merge 32 commits into main

Conversation

fzyzcjy (Collaborator) commented Mar 23, 2025

Motivation

Currently, when submitting requests (e.g. via engine.generate or an HTTP call), we have no control over which requests will be run together in a single batch and which will not, partly because of the inherent nondeterminism of IPC. However, in some scenarios it would be useful to have more control. For example:

  • Benchmarking and profiling (e.g. we want to know the behavior when there are exactly "1024 tokens x 8 requests per GPU"; this is the primary reason for this PR)
  • Testing (e.g. for two-batch overlap, we may want to test that it is disabled when one card has 2 requests while another card has 1)

Thus, this PR adds this feature. Since it is intended only for benchmarking and testing, the code is not efficient (e.g. it makes torch.distributed calls that could be reduced to some extent) and may have rough edges.
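To make the intended usage concrete, here is a minimal sketch against sglang's existing offline engine API. The engine launch and generate calls below are the library's real API; the `batch_group` knob in the comment is purely an assumption used to illustrate the feature's intent, and the actual mechanism added by this PR should be taken from the diff.

```python
# Minimal sketch, assuming sglang's offline engine API; the batching-control
# knob itself is hypothetical and only illustrates the feature's intent.
import sglang as sgl

# Launch an offline engine (model path is a placeholder).
llm = sgl.Engine(model_path="meta-llama/Llama-3.1-8B-Instruct")

# Benchmarking scenario from the motivation: exactly 8 requests, each
# generating 1024 tokens, submitted so that they land in a single batch.
prompts = ["benchmark prompt"] * 8
sampling_params = {"max_new_tokens": 1024, "temperature": 0.0}

outputs = llm.generate(
    prompts,
    sampling_params,
    # batch_group="bench-8x1024",  # hypothetical knob: force these 8
    #                              # requests into the same scheduling batch
)
for out in outputs:
    print(out["text"][:80])

llm.shutdown()
```

In such a setup, pinning the eight requests to one batch would give reproducible "1024 tokens x 8 requests per GPU" measurements instead of depending on IPC timing to group them.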

Modifications

Checklist

fzyzcjy requested a review from merrymercy as a code owner on March 23, 2025, 12:17
fzyzcjy (Collaborator, Author) commented Apr 1, 2025

Ping me when this PR is about to be merged. Currently I am only resolving conflicts in #4068, and I will port the conflict-resolution code back here when pinged.

fzyzcjy mentioned this pull request on Apr 11, 2025