Description
Design: SGLang PD Disaggregation (Open Source)
Progress
- Release initial code @ByronHsu: [PD] Release initial code #4654
  - prefill and decode event loops, queues, and transfer interface
  - transfer engine is a fake (placeholder) implementation
  - simple Python load balancer
- Mooncake integration @ShangmingCai: https://github.com/sgl-project/sglang/pulls?q=is%3Apr+mooncake+is%3Aopen
- NIXL integration @trevor-m: [PD] Add NIXL transfer backend #5477
- PD + overlap schedule @ByronHsu
- PD + fault tolerance: [PD] Abort request if transfer fails #6504, [PD] Handle P/D failure and reconnect without affecting other instances #6263
- PD + spec decode: [PD] support spec decode #6507
- PD + logprob: [PD] Support logprob & Add failure test #6558
- PD + structured output: [PD] Support structured output #6560
- PD + retract @Ying1123: [PD] Support decode retract and update decode.py #7196
- PD + different TP sizes (call for contributions): [PD] Add support for different TP sizes per DP rank #5922, [PD] Add different TP sizes support for no-MLA models #6793
- Rust PD load balancer @hnyls2002: Init PD Rust LB (PO2) #6437
- PD + ROCm (Mooncake) @HaiShaw
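The initial release item describes prefill and decode event loops connected by a queue and a transfer interface, with the transfer engine faked. The following is a minimal in-process sketch of that handoff, assuming a toy request record. `Request`, `FakeTransferEngine`, `prefill_step`, and `decode_step` are hypothetical names chosen for illustration; they are not SGLang's actual classes or API, and the "KV cache" here is placeholder data.

```python
from __future__ import annotations

import queue
from dataclasses import dataclass, field


@dataclass
class Request:
    # Hypothetical request record with only the fields this sketch needs.
    rid: str
    prompt_tokens: list[int]
    kv_cache: list[float] = field(default_factory=list)


class FakeTransferEngine:
    """Hypothetical stand-in for a real KV transfer backend.

    Instead of moving KV cache between machines, it places the whole
    request on an in-process queue that the decode side polls.
    """

    def __init__(self) -> None:
        self._channel: queue.Queue[Request] = queue.Queue()

    def send(self, req: Request) -> None:
        self._channel.put(req)

    def poll(self) -> Request | None:
        # Non-blocking receive: None means nothing has arrived yet.
        try:
            return self._channel.get_nowait()
        except queue.Empty:
            return None


def prefill_step(pending: queue.Queue[Request], engine: FakeTransferEngine) -> None:
    # Prefill one waiting request, then hand its (placeholder) KV cache to decode.
    req = pending.get_nowait()
    req.kv_cache = [float(t) for t in req.prompt_tokens]
    engine.send(req)


def decode_step(engine: FakeTransferEngine, running: list[Request]) -> None:
    # Admit a request into the decode batch only once its KV cache has arrived.
    req = engine.poll()
    if req is not None:
        running.append(req)
```

Keeping the transfer behind a small `send`/`poll` interface is what lets a fake engine be swapped later for a real backend without touching either event loop.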
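The Rust load balancer entry is tagged PO2, read here as power-of-two-choices routing (an assumption; PR #6437 is the authority on what it actually does). The Python sketch below illustrates that idea: sample two workers at random and send the request to the less loaded one, applied separately to the prefill pool and the decode pool. The load metric (in-flight request count) and the names `pick_power_of_two` and `route` are hypothetical.

```python
from __future__ import annotations

import random


def pick_power_of_two(loads: dict[str, int], rng: random.Random) -> str:
    """Power-of-two-choices: sample two distinct workers, return the less loaded.

    `loads` maps a worker URL to its number of in-flight requests
    (a hypothetical load metric chosen for this sketch).
    """
    if not loads:
        raise ValueError("no workers available")
    if len(loads) == 1:
        return next(iter(loads))
    a, b = rng.sample(sorted(loads), 2)
    return a if loads[a] <= loads[b] else b


def route(
    prefill_loads: dict[str, int],
    decode_loads: dict[str, int],
    rng: random.Random,
) -> tuple[str, str]:
    # Pick one prefill and one decode worker, then count the request on both.
    p = pick_power_of_two(prefill_loads, rng)
    d = pick_power_of_two(decode_loads, rng)
    prefill_loads[p] += 1
    decode_loads[d] += 1
    return p, d
```

Compared with choosing uniformly at random, checking two candidates and keeping the lighter one avoids most hot spots while reading only two load counters per request.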