Checklist
- 1. I have searched related issues but cannot get the expected help.
- 2. The bug has not been fixed in the latest version.
- 3. Please note that if your bug report lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, which reduces the likelihood of receiving feedback.
- 4. If the issue you raised is not a bug but a question, please open a discussion at https://github.com/sgl-project/sglang/discussions/new/choose instead; otherwise, it will be closed.
- 5. Please use English, otherwise it will be closed.
Describe the bug
The inference result for a simple question such as "9.11 and 9.8 which is greater?" often (about 4 out of 5 times) degenerates into progressively meaningless text as more tokens are generated.
The model checkpoint is: https://huggingface.co/cognitivecomputations/DeepSeek-R1-AWQ
SGLang was installed via: pip install "sglang[all]>=0.4.3.post4" --find-links https://flashinfer.ai/whl/cu124/torch2.5/flashinfer-python
Reproduction
python3 -m sglang.launch_server --model /model_ckpt/DeepSeek-R1-AWQ --trust-remote-code --tp 8 --dtype float16
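
A minimal client sketch that triggers the behavior, assuming the server launched above listens on the default port 30000 and exposes the OpenAI-compatible /v1/chat/completions endpoint; the sampling parameters (max_tokens, temperature) and timeout are illustrative, not from the original report.

```python
# Hypothetical reproduction client: sends the reported prompt to a locally
# running SGLang server and prints the completion so the degradation into
# meaningless text can be observed. Adjust host/port/model to your setup.
import requests

PROMPT = "9.11 and 9.8 which is greater?"

resp = requests.post(
    "http://localhost:30000/v1/chat/completions",
    json={
        "model": "/model_ckpt/DeepSeek-R1-AWQ",   # served model path from the launch command
        "messages": [{"role": "user", "content": PROMPT}],
        "max_tokens": 2048,      # assumed; long enough to see the output degrade
        "temperature": 0.6,      # assumed sampling setting
    },
    timeout=600,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```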
Environment
sglang[all]>=0.4.3.post4
flashinfer-python (from https://flashinfer.ai/whl/cu124/torch2.5/flashinfer-python)