[Bug] MMLU accuracy with DeepSeek NEXTN #5743

Closed
@TianQiLin666666


Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose instead. Otherwise, it will be closed.
  • 5. Please use English, otherwise it will be closed.

Describe the bug

With sglang 0.4.5.post3, when I turned on MTP (the NEXTN speculative decoding option) for DeepSeek-R1 inference, MMLU accuracy dropped sharply, from 0.861 to 0.305.

MTP    MMLU accuracy
on     0.305
off    0.861

Reproduction

Launch server with MTP:

# node0
python3 -m sglang.launch_server \
--attention-backend fa3 \
--speculative-algo NEXTN --speculative-draft /data/models/DeepSeek-R1-NextN --speculative-num-steps 4 --speculative-eagle-topk 2 --speculative-num-draft-tokens 6 \
--model-path /data/models/DeepSeek-R1/ \
--tp 16 \
--dist-init-addr 192.168.0.1:10240 \
--nnodes 2 --node-rank 0 --trust-remote-code \
--host 0.0.0.0 --port 8000 --mem-fraction-static 0.75 --disable-chunked-prefix-cache

Launch server without MTP:

# node0
python3 -m sglang.launch_server \
--attention-backend fa3 \
--model-path /data/models/DeepSeek-R1/ \
--tp 16 \
--dist-init-addr 192.168.0.1:10240 \
--nnodes 2 --node-rank 0 --trust-remote-code \
--host 0.0.0.0 --port 8000 --mem-fraction-static 0.75 --disable-chunked-prefix-cache

MMLU test:

python3 bench_sglang.py --nsub 10 --port 8000 --data_dir /data/datasets/mmlu
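
Since speculative decoding is meant to preserve the target model's outputs, it may help to find which questions change their answer between the two runs, not just the aggregate score. The following is a minimal sketch, assuming the per-question predicted letters from each run can be saved as a dict keyed by question ID. That layout, the question IDs, and both helper names are hypothetical and are not something bench_sglang.py is known to produce.

```python
from typing import Dict, List


def accuracy(preds: Dict[str, str], labels: Dict[str, str]) -> float:
    """Fraction of labeled questions whose predicted letter matches the label."""
    if not labels:
        return 0.0
    correct = sum(1 for qid, gold in labels.items() if preds.get(qid) == gold)
    return correct / len(labels)


def diverging_questions(preds_off: Dict[str, str], preds_on: Dict[str, str]) -> List[str]:
    """Question IDs answered differently by the two runs (or missing from one)."""
    ids = sorted(set(preds_off) | set(preds_on))
    return [qid for qid in ids if preds_off.get(qid) != preds_on.get(qid)]


if __name__ == "__main__":
    # Toy data for illustration only; these are not results from the issue.
    labels = {"q1": "A", "q2": "C", "q3": "B", "q4": "D"}
    preds_off = {"q1": "A", "q2": "C", "q3": "B", "q4": "A"}  # MTP off
    preds_on = {"q1": "A", "q2": "B", "q3": "D", "q4": "A"}   # MTP on

    print("accuracy (MTP off):", accuracy(preds_off, labels))
    print("accuracy (MTP on): ", accuracy(preds_on, labels))
    print("diverging questions:", diverging_questions(preds_off, preds_on))
```

Inspecting the raw completions for the diverging questions could show whether the MTP run produces garbled text or merely different but well-formed answers.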

Environment

Python: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: NVIDIA H20
GPU 0,1,2,3,4,5,6,7 Compute Capability: 9.0
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.4, V12.4.131
CUDA Driver Version: 535.161.08
PyTorch: 2.6.0+cu124
sglang: 0.4.5.post3
sgl_kernel: 0.0.9.post2
flashinfer: 0.1.6+cu124torch2.4
triton: 3.2.0
transformers: 4.51.1
torchao: 0.8.0
numpy: 1.26.4
aiohttp: 3.9.3
fastapi: 0.115.8
hf_transfer: 0.1.9
huggingface_hub: 0.30.2
interegular: 0.3.3
modelscope: 1.22.3
orjson: 3.10.15
outlines: 0.0.46
packaging: 23.2
psutil: 5.9.4
pydantic: 2.10.6
multipart: Module Not Found
zmq: Module Not Found
uvicorn: 0.34.0
uvloop: 0.21.0
vllm: 0.6.4.post1
xgrammar: 0.1.17
openai: 1.60.2
tiktoken: 0.7.0
anthropic: 0.45.2
litellm: 1.59.10
decord: 0.6.0
NVIDIA Topology:
	GPU0	GPU1	GPU2	GPU3	GPU4	GPU5	GPU6	GPU7	NIC0	NIC1	NIC2	NIC3	NIC4	CPU Affinity	NUMA Affinity	GPU NUMA ID
GPU0	 X 	NV18	NV18	NV18	NV18	NV18	NV18	NV18	SYS	PIX	NODE	SYS	SYS	0-89	0		N/A
GPU1	NV18	 X 	NV18	NV18	NV18	NV18	NV18	NV18	SYS	PIX	NODE	SYS	SYS	0-89	0		N/A
GPU2	NV18	NV18	 X 	NV18	NV18	NV18	NV18	NV18	SYS	NODE	PIX	SYS	SYS	0-89	0		N/A
GPU3	NV18	NV18	NV18	 X 	NV18	NV18	NV18	NV18	SYS	NODE	PIX	SYS	SYS	0-89	0		N/A
GPU4	NV18	NV18	NV18	NV18	 X 	NV18	NV18	NV18	SYS	SYS	SYS	PIX	NODE	90-179	1		N/A
GPU5	NV18	NV18	NV18	NV18	NV18	 X 	NV18	NV18	SYS	SYS	SYS	PIX	NODE	90-179	1		N/A
GPU6	NV18	NV18	NV18	NV18	NV18	NV18	 X 	NV18	SYS	SYS	SYS	NODE	PIX	90-179	1		N/A
GPU7	NV18	NV18	NV18	NV18	NV18	NV18	NV18	 X 	SYS	SYS	SYS	NODE	PIX	90-179	1		N/A
NIC0	SYS	SYS	SYS	SYS	SYS	SYS	SYS	SYS	 X 	SYS	SYS	SYS	SYS
NIC1	PIX	PIX	NODE	NODE	SYS	SYS	SYS	SYS	SYS	 X 	NODE	SYS	SYS
NIC2	NODE	NODE	PIX	PIX	SYS	SYS	SYS	SYS	SYS	NODE	 X 	SYS	SYS
NIC3	SYS	SYS	SYS	SYS	PIX	PIX	NODE	NODE	SYS	SYS	SYS	 X 	NODE
NIC4	SYS	SYS	SYS	SYS	NODE	NODE	PIX	PIX	SYS	SYS	SYS	NODE	 X

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

NIC Legend:

  NIC0: mlx5_0
  NIC1: mlx5_1
  NIC2: mlx5_2
  NIC3: mlx5_3
  NIC4: mlx5_4
