[server arg] better arg help. disable chunked prefix cache. #6991

jinmingyi1998 · 2025-06-09T07:24:53Z

Disable the chunked prefix cache when disable-radix-cache is True.

Motivation

With the radix prefix cache disabled, I found that the chunked prefix cache was still enabled. It should be disabled as well.

Related PR: #5113

Modifications

Set disable_chunked_prefix_cache to True when the radix prefix cache is disabled.

Checklist

Format your code according to the Code Formatting with Pre-Commit.
Add unit tests as outlined in the Running Unit Tests.
Update documentation / docstrings / example tutorials as needed, according to Writing Documentation.
Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to Benchmark and Profiling and Accuracy Results.
For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.
Please feel free to join our Slack channel at https://slack.sglang.ai to discuss your PR.

gemini-code-assist

Hello @jinmingyi1998, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

Hello! Gemini here, providing a summary of this pull request to help everyone quickly understand the changes. This PR primarily addresses an issue where the chunked prefix cache was not being automatically disabled when the radix prefix cache was turned off. It also includes a minor update to the help text for a command-line argument.

The main motivation, as described by the author, is to ensure consistency in caching behavior when the radix cache is explicitly disabled. If the radix cache is off, the chunked prefix cache should also be off to avoid unexpected behavior or overhead, particularly for certain model types like Deepseek.

Highlights

Cache Dependency Logic: Added logic in the model runner to automatically disable the chunked prefix cache (equivalent to passing --disable-chunked-prefix-cache) when the radix prefix cache is disabled (--disable-radix-cache is set).
Argument Help Text Update: Slightly reworded the help text for the --disable-chunked-prefix-cache command-line argument for clarity.

Changelog

python/sglang/srt/model_executor/model_runner.py
- Introduced a check (if server_args.disable_radix_cache:) around line 409 to set server_args.disable_chunked_prefix_cache = True if the radix cache is disabled. This logic is placed before the existing checks related to the MLA backend and page size.
python/sglang/srt/server_args.py
- Updated the help string for the --disable-chunked-prefix-cache argument on line 1414, changing it from "Disable chunked prefix cache feature for deepseek, which should save overhead for short sequences." to "For Deepseek, Disable chunked-prefix-cache to save overhead for short sequences."
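
The check described for model_runner.py can be sketched in isolation as follows. This is a minimal illustration, not the project's code: `ServerArgs` here is a hypothetical two-field stand-in for the real argument object, and `apply_proposed_adjustment` is an illustrative name. Only the `if server_args.disable_radix_cache:` condition and the assignment come from the changelog entry above.

```python
from dataclasses import dataclass


@dataclass
class ServerArgs:
    """Hypothetical stand-in holding only the two flags discussed here."""

    disable_radix_cache: bool = False
    disable_chunked_prefix_cache: bool = False


def apply_proposed_adjustment(server_args: ServerArgs) -> ServerArgs:
    """Mirror the proposed check: radix cache off forces chunked prefix cache off."""
    if server_args.disable_radix_cache:
        server_args.disable_chunked_prefix_cache = True
    return server_args


# With the radix cache disabled, the chunked prefix cache flag is forced on.
args = apply_proposed_adjustment(ServerArgs(disable_radix_cache=True))
print(args.disable_chunked_prefix_cache)
```

Note that the check is one-directional: leaving the radix cache enabled does not change whatever value the user chose for the chunked prefix cache flag.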

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request correctly ensures that the chunked prefix cache is disabled when the radix cache is disabled, which improves configuration consistency. The update to the help string for --disable-chunked-prefix-cache also aims to clarify its usage, particularly for Deepseek models. The logical change in model_runner.py is sound and well-placed. Overall, the changes are clear and address the stated motivation effectively.

Summary of Findings

Help string phrasing for --disable-chunked-prefix-cache: In python/sglang/srt/server_args.py, the updated help string "For Deepseek, Disable chunked-prefix-cache..." is functional but could be phrased more clearly. Consider alternatives like "For Deepseek models, disable the chunked prefix cache..." or "Disable chunked prefix cache (primarily for Deepseek models)..." for improved readability. This is a low-severity stylistic suggestion and was not added as a direct review comment due to the project's review settings for comment severity.
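
As an illustration of how one of the suggested phrasings would read on the command line, here is a small sketch using the standard library's `argparse`. It assumes the flag is a plain boolean switch (`action="store_true"`); the project's actual argument definition is not shown in this review, and the help text below simply combines the first suggested phrasing with the remainder of the help string from the PR.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--disable-chunked-prefix-cache",
    action="store_true",  # assumption: a boolean switch that defaults to False
    help=(
        "For Deepseek models, disable the chunked prefix cache "
        "to save overhead for short sequences."
    ),
)

# argparse maps the dashed flag name to an underscored attribute.
args = parser.parse_args(["--disable-chunked-prefix-cache"])
print(args.disable_chunked_prefix_cache)
```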

Merge Readiness

The pull request appears to be in good shape. The core logic change correctly addresses the issue of ensuring the chunked prefix cache is disabled when the radix cache is disabled. The help string update is a minor clarification. I have not found any critical or high-severity issues. Based on this review, the PR seems ready for merging, pending any further review or testing procedures by the maintainers. As an AI, I am not authorized to approve pull requests.

Fridge003 · 2025-06-09T07:36:35Z

python/sglang/srt/model_executor/model_runner.py

@@ -406,8 +406,9 @@ def model_specific_adjustment(self):
                    f"Automatically turn of --chunked-prefill-size as it is not supported for "
                    f"{self.model_config.hf_config.model_type}"
                )
-
-        if not self.use_mla_backend:
+        if server_args.disable_radix_cache:


This is not true. Chunked prefix cache can be used when radix cache is disabled.

jinmingyi1998 · Jun 9, 2025

But why? There is no prefix cache.

jinmingyi1998 · Jun 9, 2025

> This is not true. Chunked prefix cache can be used when radix cache is disabled.

When radix cache is disabled, is there any prefix cache?

Fridge003 · Jun 9, 2025

Here prefix cache means KV cache, which is different from radix cache.

jinmingyi1998 · Jun 9, 2025

> Here prefix cache means KV cache, which is different from radix cache.

    if (
        forward_batch.forward_mode.is_extend()
        and not self.disable_chunked_prefix_cache
        and not forward_batch.forward_mode.is_target_verify()
        and not forward_batch.forward_mode.is_draft_extend()
        and (
            sum_extend_prefix_lens >= self.chunked_prefix_cache_threshold
            or sum_extend_prefix_lens == 0
        )
    ):
        return AttnForwardMethod.MHA_CHUNKED_KV

But here it is used when forward_mode.is_extend(), and there is no KV cache if radix cache is disabled. Or do you mean there is some other way to do prefix caching?

Fridge003 · Jun 9, 2025

The KV cache is still used when radix cache is disabled. It is just not managed by the radix tree, so many tokens will be computed repeatedly.
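
The condition quoted in the thread can be pulled into a standalone function to see which inputs it actually depends on. The sketch below is an assumption-laden restatement, not the project's code: `choose_attn_method` is a hypothetical name, the forward-mode checks are passed in as plain booleans, the `MLA` fallback member is assumed (the quoted snippet does not show what is returned otherwise), and the threshold value 8192 in the usage lines is made up for illustration.

```python
from enum import Enum, auto


class AttnForwardMethod(Enum):
    MHA_CHUNKED_KV = auto()
    MLA = auto()  # assumed fallback; not shown in the quoted snippet


def choose_attn_method(
    is_extend: bool,
    is_target_verify: bool,
    is_draft_extend: bool,
    disable_chunked_prefix_cache: bool,
    sum_extend_prefix_lens: int,
    chunked_prefix_cache_threshold: int,
) -> AttnForwardMethod:
    """Restate the quoted condition with explicit inputs."""
    if (
        is_extend
        and not disable_chunked_prefix_cache
        and not is_target_verify
        and not is_draft_extend
        and (
            sum_extend_prefix_lens >= chunked_prefix_cache_threshold
            or sum_extend_prefix_lens == 0
        )
    ):
        return AttnForwardMethod.MHA_CHUNKED_KV
    return AttnForwardMethod.MLA


# A prefix length sum of 0 still selects the chunked path.
no_prefix = choose_attn_method(
    is_extend=True,
    is_target_verify=False,
    is_draft_extend=False,
    disable_chunked_prefix_cache=False,
    sum_extend_prefix_lens=0,
    chunked_prefix_cache_threshold=8192,  # hypothetical threshold
)
print(no_prefix)
```

The condition itself never reads the radix cache setting, and it explicitly accepts a prefix length sum of 0, so the chunked path stays reachable unless `disable_chunked_prefix_cache` is set.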

feat(chunked-prefix-cache): better arg help. disable chunked prefix c…

1fb733c

…ache when disable radix cache

jinmingyi1998 requested review from merrymercy, Ying1123, hnyls2002, zhyncs, ispobock and ByronHsu as code owners June 9, 2025 07:24

Merge branch 'main' into disable_chunked_prefix_cache_arg

c2427a5

gemini-code-assist bot reviewed Jun 9, 2025


Fridge003 reviewed Jun 9, 2025


Fridge003 closed this Jun 10, 2025
