
[BugFix][Frontend] Fix LLM.chat() tokenization #16081


Merged

merged 3 commits into vllm-project:main from njhill:fix-llm-chat-bos on Apr 25, 2025

Conversation

@njhill (Member) commented on Apr 5, 2025

Chat templates generally include special tokens, so add_special_tokens=False should be used when tokenizing the template outputs. This is already the default behaviour of the OpenAI chat completions endpoint.

In particular this avoids the llama double-BOS token issue.
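
For illustration only (not code from this PR), a minimal sketch of the double-BOS behaviour, using the Hugging Face tokenizer directly; the model name is just an example:

```python
from transformers import AutoTokenizer

# Illustrative model; any chat model whose template already emits BOS behaves the same.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")
messages = [{"role": "user", "content": "Hello!"}]

# The rendered chat template already begins with the BOS token.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Default tokenization (add_special_tokens=True) prepends a second BOS.
with_double_bos = tokenizer(prompt).input_ids
# add_special_tokens=False keeps only the BOS already present in the template,
# matching the OpenAI-compatible chat completions endpoint.
with_single_bos = tokenizer(prompt, add_special_tokens=False).input_ids

print(with_double_bos[:2])  # two BOS token ids
print(with_single_bos[:1])  # single BOS token id
```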

Also, LoRA adapter-specific tokenizers weren't being used by LLM.chat().
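
And a hedged sketch of the call path the LoRA part of this change touches; the adapter name and path below are hypothetical:

```python
from vllm import LLM
from vllm.lora.request import LoRARequest

llm = LLM(model="meta-llama/Llama-3.2-3B-Instruct", enable_lora=True)

messages = [{"role": "user", "content": "Hello!"}]

# With this fix, LLM.chat() uses the tokenizer associated with the LoRA adapter
# (when the adapter ships one) rather than always using the base-model tokenizer.
outputs = llm.chat(
    messages,
    lora_request=LoRARequest("my-adapter", 1, "/path/to/adapter"),  # hypothetical adapter
)
print(outputs[0].outputs[0].text)
```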

Fixes #16028

Also just found related issues:

Fixes #16853

@njhill added the bug, frontend, and needs-tests labels on Apr 5, 2025

github-actions bot commented Apr 5, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@njhill (Member, Author) commented on Apr 5, 2025

@DarkLight1337 I guess this as-is bypasses one of the length checks, but it is at least more consistent with the chat completions API behaviour.

@DarkLight1337 (Member) commented

As with the other PRs, can you add tests to avoid regressions in the future?

@mgoin (Member) left a comment


Could we get this landed since several users have run into this issue?


mergify bot commented Apr 25, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @njhill.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Apr 25, 2025
njhill added 2 commits April 25, 2025 09:08
…chat-bos

# Conflicts:
#	vllm/entrypoints/llm.py
Signed-off-by: Nick Hill <[email protected]>
@njhill added the ready label and removed the needs-tests label on Apr 25, 2025
@mergify mergify bot removed the needs-rebase label Apr 25, 2025
@DarkLight1337 DarkLight1337 enabled auto-merge (squash) April 25, 2025 22:09
@DarkLight1337 DarkLight1337 merged commit 7011645 into vllm-project:main Apr 25, 2025
60 checks passed
@njhill njhill deleted the fix-llm-chat-bos branch April 25, 2025 22:36
jikunshang pushed a commit to jikunshang/vllm that referenced this pull request Apr 29, 2025
lk-chen pushed a commit to lk-chen/vllm that referenced this pull request Apr 29, 2025
adobrzyn pushed a commit to HabanaAI/vllm-fork that referenced this pull request Apr 30, 2025
RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025
zzzyq pushed a commit to zzzyq/vllm that referenced this pull request May 24, 2025
minpeter pushed a commit to minpeter/vllm that referenced this pull request Jun 24, 2025
Labels
bug (Something isn't working), frontend, ready (ONLY add when PR is ready to merge/full CI is needed)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: Two BOS when using chat
[Bug]: Two beginning of sequence tokens for Llama-3.2-3B-Instruct
3 participants