[WIP] add vlm cache and support chunk prefill for vlm #5456

Closed · wants to merge 1 commit from vlm-support-chunk-prefill

Conversation

@yizhang2077 (Collaborator) commented on Apr 16, 2025

Motivation

  1. Add a multimodal cache for the VLM encoder to avoid repeated computation during chunked prefill; in the future this cache can also be used to save encoder embeddings across different requests.
  2. Use the prefix length, the extend sequence length, and each multimodal item's begin-end offsets to compute the VLM encoder embedding needed by each request's current chunk (see the sketch after this list).
  3. Enable chunked prefill for VLMs (some older VLMs remain limited).
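Item 2 is essentially interval arithmetic: a prefill chunk covers token positions [prefix_len, prefix_len + extend_len), and any multimodal item whose [begin, end) span overlaps that window contributes only the overlapping slice of its encoder embedding, while the cache ensures the vision encoder runs at most once per item across chunks. Below is a minimal sketch of that idea; the names (MultimodalEmbeddingCache, MultimodalItem, items_for_chunk, encoder_fn) are illustrative assumptions, not the actual SGLang APIs.

```python
# Illustrative sketch only; not the code in this PR.
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class MultimodalItem:
    # Hypothetical fields: token span [begin, end) of this image/video item
    # inside the request's input sequence, plus a key for embedding lookup.
    begin: int
    end: int
    cache_key: str


class MultimodalEmbeddingCache:
    """Stores encoder outputs so later chunks never re-run the vision encoder."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}

    def get(self, key: str) -> Any:
        return self._store.get(key)

    def put(self, key: str, embedding: Any) -> None:
        self._store[key] = embedding


def items_for_chunk(
    items: List[MultimodalItem], prefix_len: int, extend_len: int
) -> List[Tuple[MultimodalItem, int, int]]:
    """Return items overlapping the chunk [prefix_len, prefix_len + extend_len),
    with the overlapping [start, end) slice inside each item's own embedding."""
    chunk_start, chunk_end = prefix_len, prefix_len + extend_len
    selected = []
    for item in items:
        start = max(item.begin, chunk_start)
        end = min(item.end, chunk_end)
        if start < end:  # the item overlaps this chunk
            selected.append((item, start - item.begin, end - item.begin))
    return selected


def embeddings_for_chunk(cache, encoder_fn, items, prefix_len, extend_len):
    """Fetch (or compute and cache) each item's embedding, then slice it to the chunk."""
    slices = []
    for item, s, e in items_for_chunk(items, prefix_len, extend_len):
        emb = cache.get(item.cache_key)
        if emb is None:
            emb = encoder_fn(item)          # run the vision encoder once per item
            cache.put(item.cache_key, emb)  # reuse on later chunks of this request
        slices.append(emb[s:e])             # keep only the tokens inside this chunk
    return slices
```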

TODO:

  1. Requires vlm: enable radix cache for qwen-vl models #5349 to be merged first.
  2. Adapt more VLMs.
  3. Add more tests.

Modifications

Checklist

@ch-wan linked an issue on Apr 16, 2025 that may be closed by this pull request
@yizhang2077 mentioned this pull request on Apr 16, 2025
@zhyncs closed this on Apr 21, 2025
@zhyncs deleted the vlm-support-chunk-prefill branch on Apr 21, 2025 at 00:17
Development

Successfully merging this pull request may close these issues.

[Feature] support and turn on chunked prefill by default for VLM
3 participants