Description
Checklist
- 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
- 2. Please use English, otherwise it will be closed.
Motivation
Gemma 3 has a large, 128K context window, multilingual support in over 140 languages, and is available in more sizes than previous versions. Gemma 3 models are well-suited for a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning.
Inputs and outputs
Input:
Text string, such as a question, a prompt, or a document to be summarized
Images, normalized to 896 x 896 resolution and encoded to 256 tokens each
Total input context of 128K tokens for the 4B, 12B, and 27B sizes, and 32K tokens for the 1B size
Output:
Generated text in response to the input, such as an answer to a question, analysis of image content, or a summary of a document
Total output context of 8192 tokens
Related resources
https://huggingface.co/collections/google/gemma-3-release-67c6c6f89c4f76621268bb6d
https://storage.googleapis.com/deepmind-media/gemma/Gemma3Report.pdf