[V1] Enable multi-input by default #15799
Overall looks reasonable to me!
This PR enables multiple multi-modal input items per prompt for V1 without having to set `limit_mm_per_prompt`.

Note: This may increase the default memory usage for multi-modal models because `max_num_mm_items_decoder_budget` no longer limits `max_num_mm_items` in `GPUModelRunner.profile_run`. You can explicitly set the limit back to one via `limit_mm_per_prompt`, or disable an unused modality entirely by setting its limit to zero. I have added a section to the Offline Inference docs accordingly.

There is no need to set limits for V1, since the encoder and decoder are profiled separately, which should avoid OOM at inference time. The only hard limit is the context length, which is already checked in `Processor._validate_model_inputs`.

Note: Users can still set `limit_mm_per_prompt` to exclude individual modalities from being profiled and used in inference.

This is loosely a follow-up to #15703, which removed the direct dependency of various models on multimodal limits.
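For reference, a minimal sketch of how `limit_mm_per_prompt` can be used in offline inference to cap the number of items per modality (the model name, image files, and prompt format below are placeholder assumptions, not part of this PR):

```python
from vllm import LLM
from PIL import Image

# Hypothetical example: model name and image paths are placeholders.
# With this change, V1 accepts multiple multi-modal items per prompt by
# default; limit_mm_per_prompt is only needed to cap memory usage
# (e.g. back to 1 item) or to disable a modality by setting its limit to 0.
llm = LLM(
    model="llava-hf/llava-1.5-7b-hf",
    limit_mm_per_prompt={"image": 2},  # allow at most two images per prompt
)

images = [Image.open("cat.jpg"), Image.open("dog.jpg")]

outputs = llm.generate({
    # Prompt format depends on the model's chat template; this follows the
    # LLaVA-1.5 style with one <image> placeholder per image item.
    "prompt": "USER: <image>\n<image>\nDescribe the two images. ASSISTANT:",
    "multi_modal_data": {"image": images},
})
print(outputs[0].outputs[0].text)
```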
Some other changes: