### Description
### Checklist
1. I have searched related issues but cannot get the expected help.
2. The bug has not been fixed in the latest version.
3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
5. Please use English, otherwise it will be closed.
### Describe the bug
I deployed DeepSeek-V3-0324 on a single H20-3e 141G node with the sglang Docker image v0.4.5-cu124. Running `sglang.bench_serving`, I found that input token throughput decreased dramatically compared with Docker image v0.4.3:
| Version | num_prompts | Input token throughput |
| --- | --- | --- |
| sglang v0.4.3 | 16 | 652 TPS |
| sglang v0.4.5 | 16 | 198 TPS |
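To put the two measurements above in perspective, the relative regression can be computed directly from the reported numbers; this is a minimal sketch (the TPS values are the ones measured here, and `throughput_drop` is just an illustrative helper, not part of sglang):

```python
def throughput_drop(old_tps: float, new_tps: float) -> float:
    """Return the relative throughput drop as a percentage."""
    return (old_tps - new_tps) / old_tps * 100

# Input token throughput reported in this issue: 652 TPS (v0.4.3) vs 198 TPS (v0.4.5).
drop = throughput_drop(652.0, 198.0)
print(f"Input token throughput dropped by {drop:.1f}%")  # roughly 70%
```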
### Reproduction
The startup command is identical for both versions:
```shell
docker run -itd --gpus all --shm-size 500g -p 8000:8000 \
  -v /data/0324:/data/deepseek-v3 --ipc=host --network=host --privileged=true \
  lmsysorg/sglang:v0.4.5-cu124 \
  python3 -m sglang.launch_server --model /data/deepseek-v3 \
  --served-model-name deepseek-v3 --mem-fraction-static 0.95 --tp 8 \
  --host 0.0.0.0 --port 8000 --max-total-tokens 65536 --trust-remote-code \
  --enable-flashinfer-mla --enable-dp-attention --dp 2
```
Benchmark command (the same for v0.4.3 and v0.4.5):
```shell
python3 -m sglang.bench_serving --backend sglang --host 0.0.0.0 --port 8000 \
  --model deepseek-v3 --dataset-name random --num-prompts 16 \
  --random-input 4096 --random-output 1024 --random-range-ratio 0.5 \
  --dataset-path /data/ShareGPT_V3_unfiltered_cleaned_split.json
```
### Environment
GPU: 8× H20-3e 141G