[Model][VLM] Add Qwen2.5-Omni model support (thinker only) #15130
Conversation
Sorry I don't have time to review in detail tonight, but from a quick glance, can you add this model to the following pages?
OK, I will add them tomorrow.
@fyabc Qwen/Qwen2.5-Omni-7B?
Sorry for the delay - going to take a look at this PR tonight!
Thank you for the contribution! I have left some comments!
Hi @ywang96 @DarkLight1337, I updated some other examples here; please check the code.
Can you resolve the failures in the basic models test?
Hi @DarkLight1337, I have fixed the test registry; the API timeout error now seems to be raised outside of this PR.
Very sorry for the long delay - let's get this in!
Official PR: vllm-project#15130

Example:
python examples/offline_inference/audio_language.py --model-type qwen2_5_omni
python examples/offline_inference/vision_language.py --modality image --model-type qwen2_5_omni
python examples/offline_inference/vision_language.py --modality video --model-type qwen2_5_omni

Signed-off-by: Chen, Wenbin <[email protected]>
This PR adds support for the Qwen2.5-Omni model (thinker only).
Requirements
This PR requires the corresponding transformers PR.
Note: you need to install transformers from source from that branch.
Example Usage
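The maintained examples are the offline-inference scripts listed in the commands above. For a quick self-contained sketch, something like the following should work; note that the prompt tokens follow the Qwen2-VL convention and are an assumption here (the model's own chat template should be preferred), and the "Qwen/Qwen2.5-Omni-7B" repo name is taken from the review thread:

```python
# Minimal offline-inference sketch for the thinker with an image input.
# Assumes the transformers branch from the linked PR is installed.
from vllm import LLM, SamplingParams
from PIL import Image

llm = LLM(model="Qwen/Qwen2.5-Omni-7B")

# Qwen2-VL-style prompt tokens (an assumption; use the model's chat template
# in practice).
prompt = (
    "<|im_start|>user\n"
    "<|vision_start|><|image_pad|><|vision_end|>"
    "Describe this image.<|im_end|>\n"
    "<|im_start|>assistant\n"
)

outputs = llm.generate(
    {
        "prompt": prompt,
        "multi_modal_data": {"image": Image.open("example.jpg")},
    },
    SamplingParams(max_tokens=128),
)
print(outputs[0].outputs[0].text)
```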
Notes
The whole Qwen2.5-Omni model consists of three parts:
- thinker: multimodal inputs -> text responses & hidden states
- talker: text responses & hidden states from the thinker -> speech codes
- code2wav (streaming codec decoder): speech codes -> speech

This PR implements only the thinker part for now; it accepts multimodal inputs (images / videos / audio) and generates text responses, similar to other common VLMs. We have also developed an end-to-end implementation (to be released soon), but due to its significant impact on the vLLM framework architecture, we will not open the related pull request for now.
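To make that dataflow concrete, here is a purely illustrative sketch; the function names and signatures are hypothetical placeholders, not vLLM or transformers APIs, and only the thinker stage exists in this PR:

```python
# Hypothetical stubs illustrating the three-stage pipeline; none of these
# names correspond to real APIs. This PR ships only the thinker stage.
def thinker(inputs: dict) -> tuple[str, list]:
    """Multimodal inputs (image/video/audio + text) -> text & hidden states."""
    return "a text response", [0.1, 0.2]  # placeholder hidden states

def talker(text: str, hidden_states: list) -> list[int]:
    """Thinker outputs -> discrete speech codes."""
    return [17, 42, 7]  # placeholder codes

def code2wav(codes: list[int]) -> bytes:
    """Streaming codec decoder: speech codes -> waveform."""
    return bytes(len(codes))  # placeholder audio

text, hidden = thinker({"image": "img.jpg", "prompt": "Describe this."})
waveform = code2wav(talker(text, hidden))
```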
FIX #15563