Checklist
- 1. I have searched related issues but cannot get the expected help.
- 2. The bug has not been fixed in the latest version.
- 3. Please note that if your bug report lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, which reduces the likelihood of receiving feedback.
- 4. If the issue you raised is not a bug but a question, please open a discussion at https://github.com/sgl-project/sglang/discussions/new/choose instead; otherwise, it will be closed.
- 5. Please use English, otherwise it will be closed.
Describe the bug
The inference result for a simple question such as "9.11 and 9.8 which is greater?" often (about 4 out of 5 times) degenerates into progressively meaningless text as more tokens are generated.
The model checkpoint is: https://huggingface.co/cognitivecomputations/DeepSeek-R1-AWQ
SGLang was installed via: pip install "sglang[all]>=0.4.3.post4" --find-links https://flashinfer.ai/whl/cu124/torch2.5/flashinfer-python
Reproduction
python3 -m sglang.launch_server --model /model_ckpt/DeepSeek-R1-AWQ --trust-remote-code --tp 8 --dtype float16
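
A minimal client sketch that triggers the behavior, assuming the server launched above listens on the default port 30000 and exposes the OpenAI-compatible /v1/chat/completions endpoint; the sampling parameters (max_tokens, temperature) and timeout are illustrative, not from the original report.

```python
# Hypothetical reproduction client: sends the reported prompt to a locally
# running SGLang server and prints the completion so the degradation into
# meaningless text can be observed. Adjust host/port/model to your setup.
import requests

PROMPT = "9.11 and 9.8 which is greater?"

resp = requests.post(
    "http://localhost:30000/v1/chat/completions",
    json={
        "model": "/model_ckpt/DeepSeek-R1-AWQ",   # served model path from the launch command
        "messages": [{"role": "user", "content": PROMPT}],
        "max_tokens": 2048,      # assumed; long enough to see the output degrade
        "temperature": 0.6,      # assumed sampling setting
    },
    timeout=600,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```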
Environment
sglang[all]>=0.4.3.post4
flashinfer-python (from https://flashinfer.ai/whl/cu124/torch2.5/flashinfer-python)