model: support gemma-3-it #4424


Merged
1 commit merged into sgl-project:main on Mar 17, 2025

Conversation

@mickqian (Collaborator) commented Mar 14, 2025

Motivation

Support gemma3-it.

FYI, gemma3-1b-it is a text-only LLM, and the gemma3-pt series are not chat models.

Modifications

Checklist

for image_index, (image, estimated_frames) in enumerate(
    zip(image_data, estimated_frames_list)
):
    if len(all_frames) >= MAX_NUM_FRAMES:
Collaborator

I think the base image processor doesn't need to change?
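
For context on the hunk quoted above: it is part of a frame-budgeting loop that caps the total number of sampled frames across all inputs. Below is a minimal, self-contained sketch of that pattern; MAX_NUM_FRAMES and the loop shape mirror the hunk, while decode_frames, the cap value, and the usage at the end are illustrative assumptions, not the actual sglang implementation.

MAX_NUM_FRAMES = 30  # assumed cap, for illustration only

def decode_frames(image, estimated_frames):
    # Stand-in for real video decoding: pretend each input yields
    # `estimated_frames` frames.
    return [f"{image}#frame{i}" for i in range(estimated_frames)]

def sample_frames(image_data, estimated_frames_list):
    all_frames = []
    for image_index, (image, estimated_frames) in enumerate(
        zip(image_data, estimated_frames_list)
    ):
        if len(all_frames) >= MAX_NUM_FRAMES:
            break  # overall frame budget already spent
        frames = decode_frames(image, estimated_frames)
        # Never exceed the global budget, even within a single input.
        all_frames.extend(frames[: MAX_NUM_FRAMES - len(all_frames)])
    return all_frames

print(len(sample_frames(["a.mp4", "b.mp4"], [20, 25])))  # -> 30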

@mickqian force-pushed the gemma branch 4 times, most recently from f266005 to 70ae46c (March 14, 2025 13:08)
@mickqian

This comment was marked as resolved.

@mickqian force-pushed the gemma branch 3 times, most recently from 062911b to 7936216 (March 14, 2025 15:40)
@zhaochenyang20 (Collaborator)

@mickqian @yizhang2077 is this ready to merge? approved?

@yizhang2077 (Collaborator) commented Mar 15, 2025

Typically in PRs when adding models, accuracy is reported, for example, on MMLU implemented in the PR compared to the implementation in transformers, and for multimodal models, some multimodal benchmark. Could you advise how you verified the implementation?

@Swipe4057 we have an MMMU benchmark, and for each VLM we need to verify this benchmark and compare against the transformers implementation. Besides, if transformers supports the model, we suggest adding a unit test that compares logits with HF.
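
For reference, a minimal sketch of the kind of logits comparison described above, using the text-only gemma3-1b-it variant against the Hugging Face implementation. get_sglang_logits is a hypothetical placeholder for however the SGLang side exposes next-token logits; it is not a real API, and the prompt and model id are only examples.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-3-1b-it"  # text-only variant mentioned in this PR

def get_sglang_logits(model_id, prompt):
    # Hypothetical placeholder: wire this to however the SGLang side exposes
    # next-token logits (e.g. an engine/debug hook). Not a real sglang API.
    raise NotImplementedError

tokenizer = AutoTokenizer.from_pretrained(model_id)
hf_model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = "The capital of France is"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    hf_logits = hf_model(input_ids).logits[0, -1]  # next-token logits from HF

sglang_logits = torch.as_tensor(get_sglang_logits(model_id, prompt))

# The two implementations should agree within numerical tolerance.
print(torch.max(torch.abs(hf_logits - sglang_logits)))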

@yizhang2077 (Collaborator)

@mickqian please paste the MMMU benchmark results here.

@yizhang2077 (Collaborator)

I added an issue to keep track of current VLM models' performance on the MMMU benchmark. We can update benchmark results in #4456. @mickqian @zhaochenyang20

text_parts = input_text.split(image_token)
import re

pattern = "(" + "|".join(re.escape(sep) for sep in [image_token]) + ")"
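
For context on the hunk above, the capturing group in the pattern is what makes the regex split useful: re.split with a capturing group keeps the separator (the image token) as its own element in the result, whereas str.split drops it. A small illustration, with an example token string (the real token comes from the processor configuration):

import re

image_token = "<image>"  # example token, not the actual gemma-3 token string
input_text = "describe <image> briefly"

# str.split discards the separator:
print(input_text.split(image_token))   # ['describe ', ' briefly']

# re.split with a capturing group keeps it as a separate element:
pattern = "(" + "|".join(re.escape(sep) for sep in [image_token]) + ")"
print(re.split(pattern, input_text))   # ['describe ', '<image>', ' briefly']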
Collaborator

Why do we need to add a regex here?

@mickqian force-pushed the gemma branch 3 times, most recently from e6ac032 to 7b94f77 (March 16, 2025 03:53)
Co-authored-by: Yuhao Chen <[email protected]>
@yizhang2077 (Collaborator)

@zhaochenyang20 this PR can be merged

@mickqian (Collaborator, Author)

@zhaochenyang20 This is ready. Many thanks

@zhaochenyang20 (Collaborator)

@mickqian @yizhang2077 thanks. I will tell Lianmin!

@zhaochenyang20 (Collaborator)

@Ying1123 hey Ying, this can be merged. It's high priority.

@zhaochenyang20 merged commit 9d02bb3 into sgl-project:main on Mar 17, 2025
21 checks passed
@AkazaAkane mentioned this pull request on Mar 17, 2025
@Swipe4057 (Contributor) commented Mar 19, 2025

@zhaochenyang20 @mickqian Hello, I've been running the Gemma 3 27B it model in vLLM 0.8.0 and in sglang installed from source, with the original weights, on an H100 GPU. For the same long text-only query, the outputs of the two differ significantly. In vLLM, generation proceeds normally, but in sglang, long generations degrade into garbage output and continue indefinitely. This happens with long queries and with queries containing code. Could someone else test this as well?

Also, a question: is prefix caching supported in sglang for multimodal models, particularly Gemma 3?

@zhaochenyang20 (Collaborator)

Prefix caching is supported. Could you give us a reproducible script? We will fix this ASAP.

@Swipe4057 (Contributor)

@zhaochenyang20
Here's my launch command: [screenshot]

Then ask a simple question:
How would you advise fixing high vulnerabilities in a Docker container that are found in the base Debian image?
Temperature 1 and top_p 0.95.

You'll see that the generation doesn't stop and something like this will begin: [screenshot]

In the logs, the generation continues: [screenshot]
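
Since the launch command and outputs above survive only as screenshots, here is a hedged sketch of a comparable repro; the launch command, model path, and port are assumptions, not the reporter's exact setup, and only the sampling settings (temperature 1, top_p 0.95) come from the report.

# Assumed server launch (not the reporter's exact command):
#   python -m sglang.launch_server --model-path google/gemma-3-27b-it --port 30000
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="google/gemma-3-27b-it",
    messages=[{
        "role": "user",
        "content": "How would you advise fixing high vulnerabilities in a Docker "
                   "container that are found in the base Debian image?",
    }],
    temperature=1.0,
    top_p=0.95,
)
print(resp.choices[0].message.content)  # watch for non-terminating, degraded output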

@zhaochenyang20 (Collaborator)


cc @mickqian

@mickqian (Collaborator, Author) commented Mar 20, 2025

A fix is on the way.

@rangehow commented Apr 2, 2025

Gemma3's generation speed is surprisingly slow compared to other 3B/4B models like Qwen2.5-3B. Is the current Gemma3 implementation correct?

@zhaochenyang20 (Collaborator)

great!
