Support page size > 1 #4356

merrymercy · 2025-03-12T23:19:06Z

This PR implements PagedTokenToKVPoolAllocator. It has almost the same interface as the old TokenToKVPoolAllocator, but it guarantees that the KV cache locations returned for one request are always page-aligned. It is compatible with the prefix cache.
It also includes some other cleanup (e.g., moving batch_is_full to ScheduleBatch).

Page sizes of 1, 4, 8, and 16 show no performance difference, because page table management is fully overlapped with GPU computation.
This PR will enable the integration of more attention kernels (e.g., FlashMLA and cuDNN attention).
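
To illustrate what "page aligned" means for the allocator described above, here is a minimal pure-Python sketch. It is not the PagedTokenToKVPoolAllocator from this PR: the class name `ToyPagedAllocator`, its methods, and the `last_loc` parameter are all hypothetical and chosen only for illustration. The idea it shows is that a request first fills the unused slots in its last page and only then takes whole fresh pages, so no page is ever shared between requests.

```python
class ToyPagedAllocator:
    """Hypothetical page-granular slot allocator (illustration only, not sglang code)."""

    def __init__(self, num_pages: int, page_size: int):
        self.page_size = page_size
        # Stack of free page indices; pop() hands out the lowest index first.
        self.free_pages = list(range(num_pages - 1, -1, -1))

    def alloc(self, num_tokens: int, last_loc: int = -1):
        """Return slot indices for num_tokens new tokens, or None if out of pages.

        last_loc is the slot of the request's previous token (-1 for a new request).
        """
        ps = self.page_size
        # Unused slots left in the request's last, partially filled page.
        room = (ps - 1 - last_loc % ps) if last_loc >= 0 else 0
        from_tail = min(room, num_tokens)
        remaining = num_tokens - from_tail
        pages_needed = -(-remaining // ps)  # ceiling division
        if pages_needed > len(self.free_pages):
            return None
        # Continue in the current page first.
        locs = [last_loc + 1 + i for i in range(from_tail)]
        # Then take whole pages; each new page starts at a multiple of page_size.
        for _ in range(pages_needed):
            start = self.free_pages.pop() * ps
            take = min(ps, remaining)
            locs.extend(range(start, start + take))
            remaining -= take
        return locs

    def free(self, locs):
        """Return every page touched by locs (assumes the whole request is freed at once)."""
        pages = sorted({loc // self.page_size for loc in locs}, reverse=True)
        self.free_pages.extend(pages)


allocator = ToyPagedAllocator(num_pages=4, page_size=4)
a = allocator.alloc(6)                   # new request A: pages 0 and 1
b = allocator.alloc(3)                   # new request B: page 2
a2 = allocator.alloc(3, last_loc=a[-1])  # A grows: fills page 1, then takes page 3
print(a, b, a2)
```

With `page_size=1` every slot is its own page and the sketch degenerates to a plain token-level free list, which matches the intuition that the paged allocator can keep nearly the same interface as the old one.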

Todos for the next PRs:

Change the attention kernel calls
Support speculative decoding

python/sglang/srt/managers/schedule_policy.py

merrymercy requested review from Ying1123, zhyncs, hnyls2002, ispobock, HaiShaw and ByronHsu as code owners March 12, 2025 23:19

Support page size > 1

3b48346

merrymercy force-pushed the lianmin/page-size branch from 7b15727 to 3b48346 Compare March 12, 2025 23:23

merrymercy added 8 commits March 12, 2025 16:40

Fix

d01094e

multi stream

e020525

Fix

7128487

Fix imports

4748ebd

Fix schedule policy

e9f7189

Fix decode_seq_lens_cpu

675b721

update

6f95c5d

update

ceb5308

yangw1234 reviewed Mar 13, 2025

python/sglang/srt/managers/schedule_policy.py (resolved)

merrymercy added 2 commits March 12, 2025 17:53

stack based

6ee3be0

Revert some changes

bb81342

merrymercy force-pushed the lianmin/page-size branch from fdabcd0 to bb81342 Compare March 13, 2025 01:08

merrymercy added 4 commits March 12, 2025 18:15

Fix dp attention

ad2e385

Fix int64

3595b99

update

4274d70

Fix

a8ab17f

merrymercy mentioned this pull request Mar 13, 2025

Development Roadmap (2025 H1) #4042

Open

67 tasks

merrymercy added 5 commits March 12, 2025 19:15

Fix

a05b70b

Fix cpu device

8ba2abc

Fix sync

223bafa

Fix dp attention

7d10d90

Fix fcfs

4d762c3

Fix policy

92fc23c

merrymercy merged commit c76040e into main Mar 13, 2025
45 of 57 checks passed

merrymercy deleted the lianmin/page-size branch March 13, 2025 05:22

merrymercy mentioned this pull request Mar 13, 2025

Fix a regression introduced by overlapping KV cache writing #4375

Merged

This was referenced Mar 13, 2025

[Feature] integrate FlashMLA #4384

Closed

[Feature] integrate flash-attention #4385

Closed

hebiao064 pushed a commit to hebiao064/sglang that referenced this pull request Mar 13, 2025

Support page size > 1 (sgl-project#4356)

3f8bf65

xiezhq-hermann mentioned this pull request Mar 16, 2025

Fix: Complete int32 to int64 conversion #4465

Merged

6 tasks
