[BugFix][Frontend] Fix LLM.chat() tokenization #16081
Conversation
Chat templates generally include special tokens, and so `add_special_tokens=False` should be used when tokenizing the template outputs. In particular, this avoids the Llama double-BOS token issue. Also, LoRA adapter-specific tokenizers weren't being used by `LLM.chat()`.

Signed-off-by: Nick Hill <[email protected]>
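For context, a minimal sketch of the double-BOS behaviour using a Hugging Face tokenizer directly (the model name is only an illustrative placeholder, not taken from this PR):

```python
# Minimal sketch of the double-BOS issue, assuming a Llama-style tokenizer
# whose chat template already emits a BOS token. Model name is a placeholder.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
messages = [{"role": "user", "content": "Hello!"}]

# Render the chat template to text; the template itself inserts the BOS token.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Default tokenization prepends another BOS on top of the one in the template.
with_specials = tokenizer(prompt).input_ids
# add_special_tokens=False keeps only the BOS that the template produced.
without_specials = tokenizer(prompt, add_special_tokens=False).input_ids

print(with_specials[:2])     # starts with two BOS ids
print(without_specials[:2])  # starts with a single BOS id
```

This is why `LLM.chat()` should tokenize the rendered template with `add_special_tokens=False`, matching what the OpenAI-compatible chat completions endpoint already does.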
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger a full CI run by default; instead, only the fastcheck CI runs, covering a small and essential subset of tests to quickly catch errors. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge. 🚀
@DarkLight1337 I guess this as-is bypasses one of the length checks, but it is at least more consistent with the chat completions API behaviour.
As with the other PRs, can you add tests to avoid regressions in the future?
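As a rough illustration of the kind of regression test being asked for (the model name and assertion details are assumptions, not the test that was actually added), something along these lines would catch the double-BOS regression:

```python
# Hypothetical regression-test sketch: the prompt produced by LLM.chat()
# should contain exactly one BOS token. Model name is a placeholder.
from vllm import LLM


def test_chat_single_bos():
    llm = LLM(model="meta-llama/Llama-3.2-1B-Instruct")
    tokenizer = llm.get_tokenizer()

    outputs = llm.chat([{"role": "user", "content": "Hello!"}])
    prompt_ids = outputs[0].prompt_token_ids

    # The chat template already supplies BOS, so it should appear exactly once.
    assert prompt_ids.count(tokenizer.bos_token_id) == 1
```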
Could we get this landed since several users have run into this issue?
This pull request has merge conflicts that must be resolved before it can be merged.
…chat-bos
# Conflicts:
#	vllm/entrypoints/llm.py
Chat templates generally include special tokens, and so `add_special_tokens=False` should be used when tokenizing the template outputs. This is already the default behaviour of the OpenAI chat completions endpoint. In particular, this avoids the Llama double-BOS token issue.
Also, LoRA adapter-specific tokenizers weren't being used by `LLM.chat()`.

Fixes #16028
Fixes #16853
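For reference, a hedged usage sketch of the LoRA path this touches (model name, adapter name, and path are placeholders): with the fix, `LLM.chat()` should tokenize the rendered template with the adapter-specific tokenizer when a `LoRARequest` is supplied.

```python
# Sketch only: placeholders for the model and adapter. With this change,
# LLM.chat() should use the LoRA adapter's tokenizer for the request below.
from vllm import LLM
from vllm.lora.request import LoRARequest

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", enable_lora=True)

outputs = llm.chat(
    [{"role": "user", "content": "Hello!"}],
    lora_request=LoRARequest("my-adapter", 1, "/path/to/adapter"),
)
print(outputs[0].outputs[0].text)
```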
Also just found related issues: