Description
Checklist
- 1. I have searched related issues but cannot get the expected help.
- 2. The bug has not been fixed in the latest version.
- 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
- 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
- 5. Please use English, otherwise it will be closed.
Describe the bug
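Output accuracy degrades when both --speculative-algorithm NEXTN and the shared experts fusion optimization are enabled. With the fusion optimization alone the output is correct; adding NEXTN speculative decoding produces garbled text for the same prompt.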
Reproduction
sglang 0.4.5.post3
Server start command (without NEXTN):
python3 -m sglang.launch_server --model-path $deepseek_R1_MODEL_PATH --tp 8 - --disable-radix-cache --mem-fraction-static 0.85 --attention-backend flashinfer --enable-ep-moe --ep-size=8
The server enables the shared experts fusion optimization automatically, and with this configuration the following request returns a correct answer:
curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{ "model": "ds-test-model", "prompt": "Beijing is", "max_tokens": 30, "temperature": 0, "stream": true }'
The response is:
data: {"id":"b337860cedc2442dabe8f25b9e84f95f","object":"text_completion","created":1745477928,"model":"ds-test-model","choices":[{"index":0,"text":" the","logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":null}
data: {"id":"b337860cedc2442dabe8f25b9e84f95f","object":"text_completion","created":1745477928,"model":"ds-test-model","choices":[{"index":0,"text":" capital","logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":null}
data: {"id":"b337860cedc2442dabe8f25b9e84f95f","object":"text_completion","created":1745477928,"model":"ds-test-model","choices":[{"index":0,"text":" of","logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":null}
data: {"id":"b337860cedc2442dabe8f25b9e84f95f","object":"text_completion","created":1745477928,"model":"ds-test-model","choices":[{"index":0,"text":" China","logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":null}
data: {"id":"b337860cedc2442dabe8f25b9e84f95f","object":"text_completion","created":1745477928,"model":"ds-test-model","choices":[{"index":0,"text":".","logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":null}
data: {"id":"b337860cedc2442dabe8f25b9e84f95f","object":"text_completion","created":1745477928,"model":"ds-test-model","choices":[{"index":0,"text":" It","logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":null}
data: {"id":"b337860cedc2442dabe8f25b9e84f95f","object":"text_completion","created":1745477928,"model":"ds-test-model","choices":[{"index":0,"text":" is","logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":null}
data: {"id":"b337860cedc2442dabe8f25b9e84f95f","object":"text_completion","created":1745477928,"model":"ds-test-model","choices":[{"index":0,"text":" the","logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":null}
data: {"id":"b337860cedc2442dabe8f25b9e84f95f","object":"text_completion","created":1745477928,"model":"ds-test-model","choices":[{"index":0,"text":" political","logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":null}
data: {"id":"b337860cedc2442dabe8f25b9e84f95f","object":"text_completion","created":1745477928,"model":"ds-test-model","choices":[{"index":0,"text":" and","logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":null}
data: {"id":"b337860cedc2442dabe8f25b9e84f95f","object":"text_completion","created":1745477928,"model":"ds-test-model","choices":[{"index":0,"text":" cultural","logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":null}
data: {"id":"b337860cedc2442dabe8f25b9e84f95f","object":"text_completion","created":1745477928,"model":"ds-test-model","choices":[{"index":0,"text":" center","logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":null}
data: {"id":"b337860cedc2442dabe8f25b9e84f95f","object":"text_completion","created":1745477928,"model":"ds-test-model","choices":[{"index":0,"text":" of","logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":null}
data: {"id":"b337860cedc2442dabe8f25b9e84f95f","object":"text_completion","created":1745477928,"model":"ds-test-model","choices":[{"index":0,"text":" the","logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":null}
data: {"id":"b337860cedc2442dabe8f25b9e84f95f","object":"text_completion","created":1745477928,"model":"ds-test-model","choices":[{"index":0,"text":" country","logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":null}
data: {"id":"b337860cedc2442dabe8f25b9e84f95f","object":"text_completion","created":1745477928,"model":"ds-test-model","choices":[{"index":0,"text":".","logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":null}
data: {"id":"b337860cedc2442dabe8f25b9e84f95f","object":"text_completion","created":1745477928,"model":"ds-test-model","choices":[{"index":0,"text":" There","logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":null}
data: {"id":"b337860cedc2442dabe8f25b9e84f95f","object":"text_completion","created":1745477928,"model":"ds-test-model","choices":[{"index":0,"text":" are","logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":null}
data: {"id":"b337860cedc2442dabe8f25b9e84f95f","object":"text_completion","created":1745477928,"model":"ds-test-model","choices":[{"index":0,"text":" many","logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":null}
data: {"id":"b337860cedc2442dabe8f25b9e84f95f","object":"text_completion","created":1745477928,"model":"ds-test-model","choices":[{"index":0,"text":" places","logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":null}
data: {"id":"b337860cedc2442dabe8f25b9e84f95f","object":"text_completion","created":1745477928,"model":"ds-test-model","choices":[{"index":0,"text":" of","logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":null}
data: {"id":"b337860cedc2442dabe8f25b9e84f95f","object":"text_completion","created":1745477928,"model":"ds-test-model","choices":[{"index":0,"text":" interest","logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":null}
data: {"id":"b337860cedc2442dabe8f25b9e84f95f","object":"text_completion","created":1745477928,"model":"ds-test-model","choices":[{"index":0,"text":",","logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":null}
data: {"id":"b337860cedc2442dabe8f25b9e84f95f","object":"text_completion","created":1745477928,"model":"ds-test-model","choices":[{"index":0,"text":" such","logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":null}
data: {"id":"b337860cedc2442dabe8f25b9e84f95f","object":"text_completion","created":1745477928,"model":"ds-test-model","choices":[{"index":0,"text":" as","logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":null}
data: {"id":"b337860cedc2442dabe8f25b9e84f95f","object":"text_completion","created":1745477928,"model":"ds-test-model","choices":[{"index":0,"text":" the","logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":null}
data: {"id":"b337860cedc2442dabe8f25b9e84f95f","object":"text_completion","created":1745477928,"model":"ds-test-model","choices":[{"index":0,"text":" Great","logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":null}
data: {"id":"b337860cedc2442dabe8f25b9e84f95f","object":"text_completion","created":1745477928,"model":"ds-test-model","choices":[{"index":0,"text":" Wall","logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":null}
data: {"id":"b337860cedc2442dabe8f25b9e84f95f","object":"text_completion","created":1745477928,"model":"ds-test-model","choices":[{"index":0,"text":",","logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":null}
data: {"id":"b337860cedc2442dabe8f25b9e84f95f","object":"text_completion","created":1745477928,"model":"ds-test-model","choices":[{"index":0,"text":" the","logprobs":null,"finish_reason":"length","matched_stop":null}],"usage":null}
data: [DONE]
Server start command with NEXTN enabled:
python3 -m sglang.launch_server --model-path $R1_MODEL_PATH --tp $TP --trust-remote-code --port $PORT --host 0.0.0.0 --disable-radix-cache --mem-fraction-static 0.85 --max-running-requests $max_running_requests --attention-backend flashinfer --enable-ep-moe --ep-size=8 --speculative-algorithm NEXTN --speculative-draft $DeepSeek-R1-NextN_MODEL_PATH --speculative-num-steps 2 --speculative-eagle-topk 1 --speculative-num-draft-tokens 2
With the same request, the answer is garbled:
data: {"id":"60f2943ece404ca99b3cf9af4f8698e7","object":"text_completion","created":1745478281,"model":"ds-test-model","choices":[{"index":0,"text":" ________","logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":null}
data: {"id":"60f2943ece404ca99b3cf9af4f8698e7","object":"text_completion","created":1745478281,"model":"ds-test-model","choices":[{"index":0,"text":" capital","logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":null}
data: {"id":"60f2943ece404ca99b3cf9af4f8698e7","object":"text_completion","created":1745478281,"model":"ds-test-model","choices":[{"index":0,"text":" of China","logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":null}
data: {"id":"60f2943ece404ca99b3cf9af4f8698e7","object":"text_completion","created":1745478281,"model":"ds-test-model","choices":[{"index":0,"text":".","logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":null}
data: {"id":"60f2943ece404ca99b3cf9af4f8698e7","object":"text_completion","created":1745478281,"model":"ds-test-model","choices":[{"index":0,"text":" And it's","logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":null}
data: {"id":"60f2943ece404ca99b3cf9af4f8698e7","object":"text_completion","created":1745478281,"model":"ds-test-model","choices":[{"index":0,"text":" ________ city","logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":null}
data: {"id":"60f2943ece404ca99b3cf9af4f8698e7","object":"text_completion","created":1745478281,"model":"ds-test-model","choices":[{"index":0,"text":" with many","logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":null}
data: {"id":"60f2943ece404ca99b3cf9af4f8698e7","object":"text_completion","created":1745478281,"model":"ds-test-model","choices":[{"index":0,"text":" places of interest","logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":null}
data: {"id":"60f2943ece404ca99b3cf9af4f8698e7","object":"text_completion","created":1745478281,"model":"ds-test-model","choices":[{"index":0,"text":".\n\nA.","logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":null}
data: {"id":"60f2943ece404ca99b3cf9af4f8698e7","object":"text_completion","created":1745478281,"model":"ds-test-model","choices":[{"index":0,"text":" a; a","logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":null}
data: {"id":"60f2943ece404ca99b3cf9af4f8698e7","object":"text_completion","created":1745478281,"model":"ds-test-model","choices":[{"index":0,"text":" B. a","logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":null}
data: {"id":"60f2943ece404ca99b3cf9af4f8698e7","object":"text_completion","created":1745478281,"model":"ds-test-model","choices":[{"index":0,"text":"; the C","logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":null}
data: {"id":"60f2943ece404ca99b3cf9af4f8698e7","object":"text_completion","created":1745478281,"model":"ds-test-model","choices":[{"index":0,"text":". the;","logprobs":null,"finish_reason":"length","matched_stop":null}],"usage":null}
data: [DONE]
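To compare the two responses more easily, the streamed chunks can be joined back into a single string. The following is only a minimal sketch, assuming the stream was saved as text lines in the "data: {...}" form shown above; the helper name join_stream_text and the shortened sample lines are hypothetical and not part of sglang.

```python
import json


def join_stream_text(lines):
    """Concatenate choices[0].text from 'data: {...}' stream lines.

    Hypothetical helper for reading the logs in this report; it is not
    an sglang API. Lines that do not start with 'data:' are ignored.
    """
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            # End-of-stream marker, as in the logs above.
            break
        chunk = json.loads(payload)
        parts.append(chunk["choices"][0]["text"])
    return "".join(parts)


# Shortened sample lines (assumption: same shape as the logged chunks).
sample = [
    'data: {"choices":[{"index":0,"text":" the"}]}',
    'data: {"choices":[{"index":0,"text":" capital"}]}',
    "data: [DONE]",
]
print(repr(join_stream_text(sample)))
```

Applied to the two streams above, the run without NEXTN joins to a plain continuation (" the capital of China. It is the political and cultural center of the country. ..."), while the NEXTN run joins to a fill-in-the-blank quiz that begins " ________ capital of China. And it's ________ city with many places of interest." Note also that the NEXTN run emits several tokens per chunk, so the two streams cannot be compared chunk by chunk.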
Looking forward to your reply and help. Thank you!
Environment
Python: 3.10.12 (main, Feb 4 2025, 14:57:36) [GCC 11.4.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: NVIDIA H20-3e
GPU 0,1,2,3,4,5,6,7 Compute Capability: 9.0
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.4, V12.4.131
CUDA Driver Version: 570.124.06
PyTorch: 2.6.0+cu124
sglang: 0.4.5.post3
sgl_kernel: 0.0.9.post2
flashinfer: Module Not Found
triton: 3.2.0
transformers: 4.51.1
torchao: 0.9.0
numpy: 1.26.4
aiohttp: 3.11.13
fastapi: 0.115.11
hf_transfer: 0.1.9
huggingface_hub: 0.30.2
interegular: 0.3.3
modelscope: 1.23.2
orjson: 3.10.15
outlines: 0.0.46
packaging: 24.2
psutil: 7.0.0
pydantic: 2.10.6
multipart: Module Not Found
zmq: Module Not Found
uvicorn: 0.34.0
uvloop: 0.21.0
vllm: 0.6.4.post1
xgrammar: 0.1.17
openai: 1.65.4
tiktoken: 0.9.0
anthropic: 0.49.0
litellm: 1.62.4
decord: 0.6.0