
Commit 3b02be8

adarshxstarinkk authored and committed
[Docs] Update docs for Qwen3 and Qwen3MoE (sgl-project#5836)
1 parent 51bc0ec commit 3b02be8

File tree

1 file changed (+1 -1 lines changed)


docs/supported_models/generative_models.md

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ python3 -m sglang.launch_server \
 | Model Family (Variants) | Example HuggingFace Identifier | Description |
 |-------------------------------------|--------------------------------------------------|----------------------------------------------------------------------------------------|
 | **DeepSeek** (v1, v2, v3/R1) | `deepseek-ai/DeepSeek-R1` | Series of advanced reasoning-optimized models (including a 671B MoE) trained with reinforcement learning; top performance on complex reasoning, math, and code tasks. [SGLang provides Deepseek v3/R1 model-specific optimizations](https://docs.sglang.ai/references/deepseek)|
-| **Qwen** (2, 2.5 series, MoE) | `Qwen/Qwen2.5-14B-Instruct` | Alibaba’s Qwen model family (7B to 72B) with SOTA performance; Qwen2.5 series improves multilingual capability and includes base, instruct, MoE, and code-tuned variants. |
+| **Qwen** (3, 3MoE, 2.5, 2 series) | `Qwen/Qwen3-4B-Base`, `Qwen/Qwen3-MoE-15B-A2B` | Alibaba’s latest Qwen3 series for complex reasoning, language understanding, and generation tasks; Support for MoE variants along with previous generation 2.5, 2, etc. |
 | **Llama** (2, 3.x, 4 series) | `meta-llama/Llama-4-Scout-17B-16E-Instruct` | Meta’s open LLM series, spanning 7B to 400B parameters (Llama 2, 3, and new Llama 4) with well-recognized performance. [SGLang provides Llama-4 model-specific optimizations](https://docs.sglang.ai/references/llama4) |
 | **Mistral** (Mixtral, NeMo, Small3) | `mistralai/Mistral-7B-Instruct-v0.2` | Open 7B LLM by Mistral AI with strong performance; extended into MoE (“Mixtral”) and NeMo Megatron variants for larger scale. |
 | **Gemma** (v1, v2, v3) | `google/gemma-3-1b-it` | Google’s family of efficient multilingual models (1B–27B); Gemma 3 offers a 128K context window, and its larger (4B+) variants support vision input. |
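For context, the hunk header above continues the docs' `python3 -m sglang.launch_server \` example. Below is a minimal sketch of serving one of the newly documented Qwen3 models with that command, assuming the `--model-path`, `--host`, and `--port` flags of the launcher; the model identifier is taken from the updated table, and the host/port values are illustrative.

```bash
# Minimal launch sketch (assumption: the flags below are available in the
# installed SGLang version; host/port values are illustrative defaults).
python3 -m sglang.launch_server \
  --model-path Qwen/Qwen3-4B-Base \
  --host 0.0.0.0 \
  --port 30000
```

The MoE variant listed in the table would be launched the same way, swapping in its identifier as the `--model-path` value.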

0 commit comments
