Commit 4195ebe

mRSun15 and hebiao064 authored and committed
add attention backend supporting matrix in the doc (sgl-project#5211)
Co-authored-by: Stefan He <[email protected]>
1 parent 9da3d9c commit 4195ebe

2 files changed: 40 additions and 0 deletions

docs/backend/attention_backend.md

Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
# Attention Backend

## Support matrix for different attention backends

| **Backend**      | **Page Size > 1** | **Spec Decoding** | **MLA** | **Sliding Window** | **MultiModal** |
|------------------|-------------------|-------------------|---------|--------------------|----------------|
| **FlashInfer**   |                   |                   |         |                    |                |
| **FA3**          |                   |                   |         |                    |                |
| **Triton**       |                   |                   |         |                    |                |
| **Torch Native** |                   |                   |         |                    |                |


## User guide

#### Launch commands for different attention backends

- FlashInfer (default for non-Hopper machines, e.g., A100, A40)
```bash
# Llama 3.1 8B with the default parallelism settings
python3 -m sglang.launch_server --model meta-llama/Meta-Llama-3.1-8B-Instruct --attention-backend flashinfer
# DeepSeek-V3 with 8-way tensor parallelism
python3 -m sglang.launch_server --tp 8 --model deepseek-ai/DeepSeek-V3 --attention-backend flashinfer --trust-remote-code
```

- FlashAttention 3 (default for Hopper machines, e.g., H100, H200, H20)
```bash
# Llama 3.1 8B with the default parallelism settings
python3 -m sglang.launch_server --model meta-llama/Meta-Llama-3.1-8B-Instruct --attention-backend fa3
# DeepSeek-V3 with 8-way tensor parallelism
python3 -m sglang.launch_server --tp 8 --model deepseek-ai/DeepSeek-V3 --trust-remote-code --attention-backend fa3
```

- Triton
```bash
# Llama 3.1 8B with the default parallelism settings
python3 -m sglang.launch_server --model meta-llama/Meta-Llama-3.1-8B-Instruct --attention-backend triton
# DeepSeek-V3 with 8-way tensor parallelism
python3 -m sglang.launch_server --tp 8 --model deepseek-ai/DeepSeek-V3 --attention-backend triton --trust-remote-code
```

- Torch Native
```bash
# Llama 3.1 8B with the default parallelism settings
python3 -m sglang.launch_server --model meta-llama/Meta-Llama-3.1-8B-Instruct --attention-backend torch_native
```
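
The defaults listed above (FA3 on Hopper, FlashInfer elsewhere) can be mirrored in a small wrapper when the backend should be set explicitly. This is a minimal sketch, not part of SGLang: `pick_backend` and `launch_cmd` are hypothetical helper names, and matching on the GPU name string is an assumed heuristic based only on the example GPUs named in this guide. The flags and backend names are the ones used in the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of SGLang) that mirror the defaults in this guide.

# Map a GPU name to a backend: the Hopper examples from the guide
# (H100, H200, H20) get fa3, everything else gets flashinfer.
# Assumption: substring matching on the name is good enough for a sketch.
pick_backend() {
  case "$1" in
    *H100*|*H200*|*H20*) echo "fa3" ;;
    *) echo "flashinfer" ;;
  esac
}

# Print (do not run) a launch command for model $1 and backend $2.
# Backend names outside the four listed in the guide are rejected.
launch_cmd() {
  case "$2" in
    flashinfer|fa3|triton|torch_native) ;;
    *) echo "unknown attention backend: $2" >&2; return 1 ;;
  esac
  echo "python3 -m sglang.launch_server --model $1 --attention-backend $2"
}

# Example usage (needs an NVIDIA driver, so it is left commented out):
# gpu_name="$(nvidia-smi --query-gpu=name --format=csv,noheader | head -n 1)"
# launch_cmd meta-llama/Meta-Llama-3.1-8B-Instruct "$(pick_backend "$gpu_name")"
```

Printing the command instead of executing it keeps the sketch safe to try on a machine without GPUs. Extra options such as `--tp 8` or `--trust-remote-code` would still need to be appended by hand.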

docs/index.rst

Lines changed: 1 addition & 0 deletions
@@ -32,6 +32,7 @@ The core features include:
    backend/sampling_params.md
    backend/hyperparameter_tuning.md
    backend/structured_outputs_for_reasoning_models.ipynb
+   backend/attention_backend.md

 .. toctree::
    :maxdepth: 1
