[Track] DeepSeek V3/R1 nextn progress

## Triton Backend

@ispobock @pankajroark 

- [x] [refactor triton backend 1](https://github.com/sgl-project/sglang/pull/3292), [2](https://github.com/sgl-project/sglang/pull/3309)

- [x] [support custom mask](https://github.com/sgl-project/sglang/pull/3317)

- [x] [support EAGLE 2](https://github.com/sgl-project/sglang/pull/3466)

- [x] [compatible with CUDA Graph](https://github.com/sgl-project/sglang/pull/3500)

- [x] [support nextn I (single MTP head)](https://github.com/sgl-project/sglang/pull/3582)

- [x] support nextn II (multi MTP heads) (WIP @pankajroark)

## FlashInfer Backend

@zhyncs @yzh119 

- [x] compatible with MLA disabled

- [x] support FlashInfer nightly MLA ragged prefill and CUDA Core MLA decoding

- [x] support FlashInfer v0.2.0.post3 MLA ragged, paged prefill and decoding (@zhyncs @yzh119 )

- [x] nextn parts can be shared with Triton Backend

## EAGLE 2

@zhyncs @Ying1123 

- [x] implement sampling kernel in [sgl-kernel](https://github.com/sgl-project/sglang/tree/main/sgl-kernel) (drop cutex) [kernel part](https://github.com/sgl-project/sglang/pull/3373), [python part](https://github.com/sgl-project/sglang/pull/3378)

- [x] bunch of fixes [non greedy fix](https://github.com/sgl-project/sglang/pull/3407), [disable cuda graph fix 1](https://github.com/sgl-project/sglang/pull/3412), [fix 2](https://github.com/sgl-project/sglang/pull/3411), [cleanup 1](https://github.com/sgl-project/sglang/pull/3415), [cleanup 2](https://github.com/sgl-project/sglang/pull/3422), [fix cuda graph capture failure](https://github.com/sgl-project/sglang/pull/3430), [fix 2](https://github.com/sgl-project/sglang/pull/3431), [reduce one draft forward](https://github.com/sgl-project/sglang/pull/3468)

- [x] compatible with radix cache and chunked prefill (WIP @Ying1123 )

[Track] DeepSeek V3/R1 nextn progress #3472

Triton Backend

FlashInfer Backend

EAGLE 2

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[Track] DeepSeek V3/R1 nextn progress #3472

Description

Triton Backend

FlashInfer Backend

EAGLE 2

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions