[MLA] Simplification to batch P/D reordering #16673

njhill · 2025-04-15T16:22:59Z

I noticed that we're unnecessarily re-creating the sampling metadata twice when reordering the batch requests into prefill and decode groups for MLA.

This moves the reorder op from the start of the _prepare_inputs() method to the end of the _update_states() method (which is called right before).
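The effect of moving the reorder can be illustrated with a toy model. This is a minimal sketch, not the vLLM implementation: `ToyBatch`, `reorder_decodes_first`, `run_step_before` and `run_step_after` are hypothetical names, and both the "one scheduled token means decode" rule and the decodes-first ordering are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ToyBatch:
    """Hypothetical stand-in for a persistent input batch (not vLLM's class)."""
    req_ids: list[str]
    metadata_builds: int = 0

    def swap(self, i: int, j: int) -> None:
        self.req_ids[i], self.req_ids[j] = self.req_ids[j], self.req_ids[i]

    def refresh_sampling_metadata(self) -> None:
        # Count rebuilds so the cost of each call order is visible.
        self.metadata_builds += 1


def reorder_decodes_first(batch: ToyBatch, num_scheduled: dict[str, int]) -> bool:
    """Swap requests so decodes precede prefills; return True if anything moved.

    Assumption: a request with exactly one scheduled token is a decode.
    """
    def is_decode(req_id: str) -> bool:
        return num_scheduled[req_id] == 1

    changed = False
    lo, hi = 0, len(batch.req_ids) - 1
    while lo < hi:
        if is_decode(batch.req_ids[lo]):
            lo += 1
        elif not is_decode(batch.req_ids[hi]):
            hi -= 1
        else:
            batch.swap(lo, hi)
            changed = True
            lo += 1
            hi -= 1
    return changed


def run_step_before(batch: ToyBatch, num_scheduled: dict[str, int],
                    batch_changed: bool) -> None:
    """Old order: refresh after the state update, then again after reordering."""
    if batch_changed:
        batch.refresh_sampling_metadata()
    if reorder_decodes_first(batch, num_scheduled):
        batch.refresh_sampling_metadata()


def run_step_after(batch: ToyBatch, num_scheduled: dict[str, int],
                   batch_changed: bool) -> None:
    """New order: reorder at the end of the state update, refresh at most once."""
    batch_reordered = reorder_decodes_first(batch, num_scheduled)
    if batch_changed or batch_reordered:
        batch.refresh_sampling_metadata()
```

With a batch that both changed and needs reordering, the old order rebuilds the metadata twice while the new order rebuilds it once; the resulting request order is the same in both cases.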

github-actions · 2025-04-15T16:23:08Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

vllm/v1/worker/gpu_model_runner.py

Signed-off-by: Nick Hill <[email protected]>

LucasWilkinson

LGTM, do you mind just checking correctness?

VLLM_USE_V1=1 lm_eval --model vllm --model_args pretrained=deepseek-ai/DeepSeek-V2-Lite-Chat,tensor_parallel_size=2,dtype=auto,gpu_memory_utilization=0.9,trust_remote_code=True,max_model_len=16384,max_num_batched_tokens=1024 --task gsm8k --num_fewshot 5

njhill · 2025-04-16T16:11:42Z

@LucasWilkinson eval results for this PR:

vllm (pretrained=deepseek-ai/DeepSeek-V2-Lite-Chat,tensor_parallel_size=2,dtype=auto,gpu_memory_utilization=0.9,trust_remote_code=True,max_model_len=16384,max_num_batched_tokens=1024), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.6672|±  |0.0130|
|     |       |strict-match    |     5|exact_match|↑  |0.6581|±  |0.0131|

Results from current main:

Chat,tensor_parallel_size=2,dtype=auto,gpu_memory_utilization=0.9,trust_remote_code=True,max_model_len=16384,max_num_batched_tokens=1024), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.6603|±  |0.0130|
|     |       |strict-match    |     5|exact_match|↑  |0.6535|±  |0.0131|
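As a rough sanity check on the two tables, the score differences can be compared with the reported standard errors. The sketch below assumes the two runs can be treated as independent, so their standard errors combine in quadrature; that is a simplification, and `z_score` is a helper introduced here for illustration.

```python
import math


def z_score(a: float, se_a: float, b: float, se_b: float) -> float:
    """Difference between two scores in units of their combined standard error."""
    return (a - b) / math.sqrt(se_a**2 + se_b**2)


# (exact_match, stderr) pairs copied from the gsm8k tables above.
pr = {"flexible-extract": (0.6672, 0.0130), "strict-match": (0.6581, 0.0131)}
main = {"flexible-extract": (0.6603, 0.0130), "strict-match": (0.6535, 0.0131)}

for name in pr:
    diff = pr[name][0] - main[name][0]
    z = z_score(*pr[name], *main[name])
    print(f"{name}: diff={diff:+.4f}, z={z:.2f}")
```

Both differences come out below one combined standard error, which is consistent with the change not affecting accuracy.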

LucasWilkinson

LGTM, thanks for the accuracy checks!

njhill requested a review from LucasWilkinson April 15, 2025 16:22

njhill requested review from WoosukKwon, robertgshaw2-redhat, ywang96, comaniac and alexm-redhat as code owners April 15, 2025 16:23

mergify bot added the v1 label Apr 15, 2025

njhill commented Apr 15, 2025

vllm/v1/worker/gpu_model_runner.py (outdated, resolved)

LucasWilkinson reviewed Apr 15, 2025

vllm/v1/worker/gpu_model_runner.py (outdated, resolved)

Always call reorder

db9f9a6

Signed-off-by: Nick Hill <[email protected]>

LucasWilkinson reviewed Apr 15, 2025

mgoin mentioned this pull request Apr 16, 2025

[V1] V1 FlashInfer Attention #16684

Merged

njhill added the ready label Apr 16, 2025

LucasWilkinson approved these changes Apr 17, 2025

LucasWilkinson merged commit 0377b83 into vllm-project:main Apr 17, 2025
58 checks passed

njhill deleted the combine-reorder branch April 17, 2025 20:14

yangw-dev pushed a commit to yangw-dev/vllm that referenced this pull request Apr 21, 2025

[MLA] Simplification to batch P/D reordering (vllm-project#16673)

76b3420

Signed-off-by: Nick Hill <[email protected]> Signed-off-by: Yang Wang <[email protected]>

jikunshang pushed a commit to jikunshang/vllm that referenced this pull request Apr 29, 2025

[MLA] Simplification to batch P/D reordering (vllm-project#16673)

4950f71

Signed-off-by: Nick Hill <[email protected]>

lk-chen pushed a commit to lk-chen/vllm that referenced this pull request Apr 29, 2025

[MLA] Simplification to batch P/D reordering (vllm-project#16673)

fbb2b84

Signed-off-by: Nick Hill <[email protected]>

adobrzyn pushed a commit to HabanaAI/vllm-fork that referenced this pull request Apr 30, 2025

[MLA] Simplification to batch P/D reordering (vllm-project#16673)

552963f

Signed-off-by: Nick Hill <[email protected]> Signed-off-by: Agata Dobrzyniewicz <[email protected]>

RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025

[MLA] Simplification to batch P/D reordering (vllm-project#16673)

c27f2a4

Signed-off-by: Nick Hill <[email protected]> Signed-off-by: Mu Huai <[email protected]>

ckhordiasma mentioned this pull request May 14, 2025

nm vllm ent 0.8.5 sync red-hat-data-services/vllm#139

Merged
