Add "/server_info" endpoint in api_server to retrieve the vllm_config. #16572
Conversation
Signed-off-by: Xihui Cang <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
For security reasons, this information should only be dev facing. Can you move this endpoint under the dev-mode guard?
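A minimal sketch of what such a guard could look like (the environment flag VLLM_SERVER_DEV_MODE and the registration pattern below are illustrative assumptions, not necessarily the exact vLLM implementation):

import os

from fastapi import APIRouter, Request
from fastapi.responses import JSONResponse

router = APIRouter()

# Assumption: dev-facing endpoints are only registered when an opt-in
# environment flag such as VLLM_SERVER_DEV_MODE=1 is set on the server.
if os.environ.get("VLLM_SERVER_DEV_MODE", "0") == "1":

    @router.get("/server_info")
    async def show_server_info(raw_request: Request):
        # Serialize the config stored on the FastAPI app state.
        server_info = {"vllm_config": str(raw_request.app.state.vllm_config)}
        return JSONResponse(content=server_info)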
…uard Signed-off-by: Xihui Cang <[email protected]>
# Store global states
@dataclasses.dataclass
class _GlobalState:
    vllmconfig: VllmConfig
Suggested change:
-    vllmconfig: VllmConfig
+    vllm_config: VllmConfig
Also, do we really need the whole vLLM config? We can avoid creating a new global state object if we can simply use model_config.
Even if we need the whole vLLM config, we should initialize it in init_app_state.
I think the information provided by model_config is sometimes insufficient. We want to record and display all of the parameters used when starting the vllm serve server. On one hand, this lets users more easily understand the server's full configuration; on the other hand, it facilitates comparisons between different runs and makes it easier to fully reproduce previous experiments from those parameters. Currently, the only way to obtain and record this information is by parsing logs, which has limitations; moreover, if the log format changes, the parsing logic also has to be adjusted accordingly. Thank you for your suggestions, I will try to initialize it in init_app_state.
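For illustration, a rough sketch of storing the full config in init_app_state (the function signature here is simplified and hypothetical; the real init_app_state in vLLM's api_server takes more arguments):

# Hypothetical, simplified signature for illustration only.
def init_app_state(state, vllm_config) -> None:
    # Attach the full VllmConfig to the FastAPI app state so that request
    # handlers can read it via raw_request.app.state.vllm_config, instead of
    # keeping a separate module-level _GlobalState object.
    state.vllm_config = vllm_config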
…E is 1, then add "/server_info" endpoint in api_server. Signed-off-by: Xihui Cang <[email protected]>
Signed-off-by: Xihui Cang <[email protected]>
@router.get("/server_info") | ||
async def show_server_info(raw_request: Request): | ||
server_info = {"vllm_config": str(raw_request.app.state.vllm_config)} | ||
return JSONResponse(content=server_info) |
Place this at the top of the block since it's more "basic"?
…w_server_info, get_vllm_config Signed-off-by: Xihui Cang <[email protected]>
LGTM now, thanks
…. (vllm-project#16572) Signed-off-by: Xihui Cang <[email protected]> Signed-off-by: Yang Wang <[email protected]>
…. (vllm-project#16572) Signed-off-by: Xihui Cang <[email protected]>
…. (vllm-project#16572) Signed-off-by: Xihui Cang <[email protected]> Signed-off-by: Mu Huai <[email protected]>
Add a /server_info endpoint to allow users to directly retrieve the vLLM configuration parameters without needing to parse logs.
Example API: http://localhost:8000/server_info
{"vllm_config":"model='deepseek-ai/DeepSeek-R1-Distill-Qwen-7B', speculative_config=None, tokenizer='deepseek-ai/DeepSeek-R1-Distill-Qwen-7B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=True, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar', reasoning_backend=None), observability_config=ObservabilityConfig(show_hidden_metrics=False, otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=None, served_model_name=deepseek-ai/DeepSeek-R1-Distill-Qwen-7B, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=False, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={\"splitting_ops\":[],\"compile_sizes\":[],\"cudagraph_capture_sizes\":[],\"max_capture_size\":0}"}