[BugFix] Fix incremental detokenization perf issue #16963

njhill · 2025-04-22T04:55:28Z

`max` was meant to be `min`. The mistake could cause O(n^2) blowup in pathological cases.

Signed-off-by: Nick Hill <[email protected]>

github-actions · 2025-04-22T04:55:36Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

WoosukKwon · 2025-04-22T05:34:37Z

vllm/v1/engine/detokenizer.py

@@ -161,7 +161,7 @@ def __init__(self, tokenizer: PreTrainedTokenizerFast,
        prompt_suffix = request.prompt_token_ids
        prompt_len = len(prompt_suffix)
        if prompt_len > 4:
-            for i in range(4, max(prompt_len + 1, 32)):
+            for i in range(4, min(prompt_len + 1, 24)):


WoosukKwon: Where does 24 come from? Can we use smaller numbers like 5?

njhill: It tries to find a small suffix to start from. Otherwise the code below has to loop over the entire prompt, decoding the tokens one by one, and the prompt could be very long.

WoosukKwon: So, does it mean we can't use smaller numbers?
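
The effect of the one-word change can be seen by comparing the two loop bounds from the diff in isolation. The following is a minimal Python sketch, not the vLLM implementation: the helper names `candidate_lengths_before`, `candidate_lengths_after`, and `total_work` are hypothetical, and the cost model (each iteration `i` touches `i` tokens) is an assumption, since the loop body is not shown in this thread.

```python
def candidate_lengths_before(prompt_len: int) -> range:
    """Loop range from the removed line (hypothetical helper name)."""
    return range(4, max(prompt_len + 1, 32))


def candidate_lengths_after(prompt_len: int) -> range:
    """Loop range from the added line (hypothetical helper name)."""
    return range(4, min(prompt_len + 1, 24))


def total_work(lengths: range) -> int:
    """Assumed cost model: iteration i processes a suffix of i tokens."""
    return sum(lengths)


# Compare iteration counts and assumed total work for a few prompt lengths.
for prompt_len in (10, 100, 10_000):
    before = candidate_lengths_before(prompt_len)
    after = candidate_lengths_after(prompt_len)
    print(
        f"prompt_len={prompt_len}: "
        f"before={len(before)} iterations, work={total_work(before)}; "
        f"after={len(after)} iterations, work={total_work(after)}"
    )
```

Under this assumed model, the old bound makes the iteration count grow with the prompt length, so total work grows quadratically. It also runs past the end of short prompts (up to 31 for a 10-token prompt). The new bound stops at 23 or the prompt length, whichever is smaller.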

WoosukKwon

LGTM. Thanks for the fix! The performance looks normal after this PR:

============ Serving Benchmark Result ============
Successful requests:                     100       
Benchmark duration (s):                  45.27     
Total input tokens:                      1000000   
Total generated tokens:                  20000     
Request throughput (req/s):              2.21      
Output token throughput (tok/s):         441.79    
Total Token throughput (tok/s):          22531.15  
---------------Time to First Token----------------
Mean TTFT (ms):                          974.65    
Median TTFT (ms):                        413.66    
P99 TTFT (ms):                           2323.33   
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          66.89     
Median TPOT (ms):                        77.80     
P99 TPOT (ms):                           83.48     
---------------Inter-token Latency----------------
Mean ITL (ms):                           66.89     
Median ITL (ms):                         27.12     
P99 ITL (ms):                            246.34    
==================================================

[BugFix] Fix incremental detokenization perf issue

ce85364

max was meant to be min - could cause O(n^2) blowup in pathological cases Signed-off-by: Nick Hill <[email protected]>

njhill added the bug label (Something isn't working) Apr 22, 2025

njhill requested review from WoosukKwon, robertgshaw2-redhat, ywang96, comaniac and alexm-redhat as code owners April 22, 2025 04:55

mergify bot added the v1 label Apr 22, 2025

WoosukKwon reviewed Apr 22, 2025

WoosukKwon approved these changes Apr 22, 2025

WoosukKwon added the ready label (ONLY add when PR is ready to merge/full CI is needed) Apr 22, 2025

WoosukKwon enabled auto-merge (squash) April 22, 2025 06:34

WoosukKwon merged commit e4d6144 into vllm-project:main Apr 22, 2025
62 checks passed

njhill deleted the fix-inc-detok branch April 22, 2025 10:15

frieda-huang pushed a commit to frieda-huang/vllm that referenced this pull request Apr 23, 2025

[BugFix] Fix incremental detokenization perf issue (vllm-project#16963)

a72036a

Signed-off-by: Nick Hill <[email protected]> Signed-off-by: Frieda (Jingying) Huang <[email protected]>

jikunshang pushed a commit to jikunshang/vllm that referenced this pull request Apr 29, 2025

[BugFix] Fix incremental detokenization perf issue (vllm-project#16963)

34500f0

Signed-off-by: Nick Hill <[email protected]>

lk-chen pushed a commit to lk-chen/vllm that referenced this pull request Apr 29, 2025

[BugFix] Fix incremental detokenization perf issue (vllm-project#16963)

dbedb0b

Signed-off-by: Nick Hill <[email protected]>

adobrzyn pushed a commit to HabanaAI/vllm-fork that referenced this pull request Apr 30, 2025

[BugFix] Fix incremental detokenization perf issue (vllm-project#16963)

74dd803

Signed-off-by: Nick Hill <[email protected]> Signed-off-by: Agata Dobrzyniewicz <[email protected]>

RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025

[BugFix] Fix incremental detokenization perf issue (vllm-project#16963)

1cd0e20

Signed-off-by: Nick Hill <[email protected]> Signed-off-by: Mu Huai <[email protected]>

ckhordiasma mentioned this pull request May 14, 2025

nm vllm ent 0.8.5 sync red-hat-data-services/vllm#139

Merged

minpeter pushed a commit to minpeter/vllm that referenced this pull request Jun 24, 2025

[BugFix] Fix incremental detokenization perf issue (vllm-project#16963)

1a6d304

Signed-off-by: Nick Hill <[email protected]> Signed-off-by: minpeter <[email protected]>
