model: support gemma-3-it #4424
Conversation
(force-pushed 8363561 to 669e44b)
for image_index, (image, estimated_frames) in enumerate(
    zip(image_data, estimated_frames_list)
):
    if len(all_frames) >= MAX_NUM_FRAMES:
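For context, a runnable sketch of the frame-budgeting pattern this loop implements; the inputs (lists of frames) and the MAX_NUM_FRAMES value are stand-ins, not the actual processor API:

# Sketch of the frame-budgeting pattern in the excerpt above: iterate
# over inputs and stop once the global MAX_NUM_FRAMES budget is reached.
MAX_NUM_FRAMES = 30  # assumed budget; the real constant lives in the processor

def collect_frames(image_data, estimated_frames_list):
    all_frames = []
    for image, estimated_frames in zip(image_data, estimated_frames_list):
        if len(all_frames) >= MAX_NUM_FRAMES:
            break  # budget exhausted; ignore remaining inputs
        remaining = MAX_NUM_FRAMES - len(all_frames)
        all_frames.extend(image[:min(estimated_frames, remaining)])
    return all_frames

# e.g. two "videos" of 20 frames each, capped at 30 frames total
frames = collect_frames([list(range(20)), list(range(20))], [20, 20])
assert len(frames) == 30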
I think the base image processor doesn't need to change?
(force-pushed f266005 to 70ae46c)
(force-pushed 062911b to 7936216)
@mickqian @yizhang2077 Is this ready to merge? Approved?
@Swipe4057 We have an MMMU benchmark, and for each VLM model we need to verify it and compare against the transformers implementation. Besides, if transformers supports this model, we suggest adding a unit test comparing logits with HF.
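For reference, a minimal sketch of that kind of logits-comparison test; the model id, tolerances, and the sglang-side helper are assumptions, not code from this PR:

# Sketch of a logits-comparison unit test against the HF transformers
# implementation. get_sglang_logits is a hypothetical placeholder for
# however the sglang runner under test exposes logits.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "google/gemma-3-1b-it"  # assumed checkpoint
PROMPT = "The capital of France is"

def hf_logits(prompt: str) -> torch.Tensor:
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.float32)
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        return model(**inputs).logits  # shape [1, seq_len, vocab_size]

def get_sglang_logits(prompt: str) -> torch.Tensor:
    raise NotImplementedError("wire this up to the sglang runner under test")

def test_logits_close_to_hf():
    reference = hf_logits(PROMPT)
    candidate = get_sglang_logits(PROMPT)
    assert torch.allclose(reference, candidate, atol=2e-2, rtol=1e-2)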
@mickqian Please paste the MMMU benchmark result here.
I added an issue to keep track of current VLM models' performance on the MMMU benchmark. We can update benchmark results there: #4456 @mickqian @zhaochenyang20
text_parts = input_text.split(image_token)
import re

pattern = "(" + "|".join(re.escape(sep) for sep in [image_token]) + ")"
Why do we need to add a regex here?
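One plausible answer: str.split discards the separator, while re.split with a capturing group keeps it, so the processor can still tell where the image tokens sat between the text chunks. A small sketch with a stand-in token string:

import re

image_token = "<image>"  # stand-in token string
input_text = "describe <image> and compare with <image> please"

# str.split discards the separator:
print(input_text.split(image_token))
# ['describe ', ' and compare with ', ' please']

# re.split with a capturing group keeps it, preserving token positions:
pattern = "(" + "|".join(re.escape(sep) for sep in [image_token]) + ")"
print(re.split(pattern, input_text))
# ['describe ', '<image>', ' and compare with ', '<image>', ' please']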
(force-pushed e6ac032 to 7b94f77)
Co-authored-by: Yuhao Chen <[email protected]>
@zhaochenyang20 This PR can be merged.
@zhaochenyang20 This is ready. Many thanks.
@mickqian @yizhang2077 Thanks. I will tell Lianmin!
@Ying1123 Hey Ying, this can be merged. It's high priority.
@zhaochenyang20 @mickqian Hello, I've been running the Gemma3 27B IT model with the original weights on an H100 GPU, on both vLLM 0.8.0 and sglang installed from source. For the same long text-only query, the outputs of the two differ significantly. In vLLM, generation proceeds normally, but in sglang, long generations degrade into garbage output and continue indefinitely. This happens with long queries and queries containing code. Could someone else test this as well? Also, a question: is prefix caching supported in sglang for multimodal models, particularly Gemma 3?
Supported. Could you give us your reproducible scripts? We will fix this ASAP.
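A minimal reproduction sketch along those lines, assuming the server's OpenAI-compatible endpoint; the model path, port, prompt, and sampling parameters are placeholders:

# Hypothetical reproduction sketch for the long-generation degradation.
# Launch the server first, e.g.
#   python -m sglang.launch_server --model-path google/gemma-3-27b-it --port 30000
# then send one long text-only prompt through the OpenAI-compatible API.
import openai

client = openai.Client(base_url="http://127.0.0.1:30000/v1", api_key="EMPTY")

long_prompt = "Explain this code in detail:\n" + "def f(x):\n    return x * 2\n" * 200

response = client.chat.completions.create(
    model="default",
    messages=[{"role": "user", "content": long_prompt}],
    max_tokens=4096,  # long generation, to surface the degradation
    temperature=0.0,
)
print(response.choices[0].message.content)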
@zhaochenyang20 You'll see that the generation doesn't stop, and something like this will begin:
cc @mickqian
A fix is on the way.
Gemma3's generation speed is surprisingly slow compared to other 3B/4B models like Qwen2.5-3B. Is the current Gemma3 implementation correct?
great!
Motivation
Support gemma3-it.
FYI, gemma3-1b-it is an LLM; the gemma3-pt series are not chat models.
Modifications
Checklist