
[Bug] Accuracy issue with SGLang using DeepSeek-R1-AWQ #4158

Closed
@TheTinyTeddy

Description


Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
  • 5. Please use English, otherwise it will be closed.

Describe the bug

The inference result for a simple question such as "9.11 and 9.8 which is greater?" often (roughly 4 out of 5 times) degenerates into progressively meaningless text as more tokens are generated.
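For reference, the numerically correct answer the model should reach (a trivial check, independent of SGLang; "9.11 vs 9.8" is a known LLM trap because the version-string ordering is the opposite of the numeric one):

```python
# Numerically, 9.8 > 9.11, even though "9.11" sorts after "9.8" as a version string.
a, b = 9.11, 9.8
expected = "9.8 is greater" if b > a else "9.11 is greater"
print(expected)  # → 9.8 is greater
```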

The model checkpoint is: https://huggingface.co/cognitivecomputations/DeepSeek-R1-AWQ

SGLang was installed with: pip install "sglang[all]>=0.4.3.post4" --find-links https://flashinfer.ai/whl/cu124/torch2.5/flashinfer-python

Reproduction

python3 -m sglang.launch_server --model /model_ckpt/DeepSeek-R1-AWQ --trust-remote-code --tp 8 --dtype float16
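A minimal way to exercise the server once it is up, using only the standard library. This is a sketch that assumes the default SGLang port (30000) and its OpenAI-compatible `/v1/chat/completions` endpoint; adjust host/port if you launched with different flags:

```python
import json
import urllib.request

# Assumes the default sglang.launch_server address; change if you passed --port/--host.
URL = "http://127.0.0.1:30000/v1/chat/completions"

payload = {
    "model": "default",
    "messages": [{"role": "user", "content": "9.11 and 9.8 which is greater?"}],
    "temperature": 0.0,
    "max_tokens": 512,
}

def query(url: str = URL) -> str:
    """POST the prompt to the running server and return the reply text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

With the server running, `print(query())` should return a coherent answer ending in "9.8"; with the reported bug, repeated calls mostly produce output that degrades into meaningless text.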

Environment

sglang[all]>=0.4.3.post4
flashinfer-python from https://flashinfer.ai/whl/cu124/torch2.5/flashinfer-python

Metadata

Labels

quant, LLM Quantization
