Add "/server_info" endpoint in api_server to retrieve the vllm_config.  #16572

Merged
5 commits merged on Apr 15, 2025

Changes from 2 commits
27 changes: 26 additions & 1 deletion vllm/entrypoints/openai/api_server.py
@@ -2,6 +2,7 @@
 
 import asyncio
 import atexit
+import dataclasses
 import gc
 import importlib
 import inspect
@@ -30,7 +31,7 @@
 from typing_extensions import assert_never
 
 import vllm.envs as envs
-from vllm.config import ModelConfig
+from vllm.config import ModelConfig, VllmConfig
 from vllm.engine.arg_utils import AsyncEngineArgs
 from vllm.engine.async_llm_engine import AsyncLLMEngine  # type: ignore
 from vllm.engine.multiprocessing.client import MQLLMEngineClient
@@ -104,6 +105,20 @@
 _running_tasks: set[asyncio.Task] = set()
 
 
+# Store global states
+@dataclasses.dataclass
+class _GlobalState:
+    vllmconfig: VllmConfig
Member:

Suggested change:
-    vllmconfig: VllmConfig
+    vllm_config: VllmConfig
Member:

Also, do we really need the whole vLLM config? We can avoid creating a new global state object if we can simply use model_config.
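For illustration, a sketch of that alternative (hedged: it relies on the file's existing engine_client(raw_request) helper and assumes the engine client's async get_model_config() method), which would avoid module-level state entirely:

@router.get("/server_info")
async def show_server_info(raw_request: Request):
    # Ask the engine for its model config on demand instead of caching
    # the full VllmConfig in a module-level global.
    model_config = await engine_client(raw_request).get_model_config()
    return JSONResponse(content={"model_config": str(model_config)})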

@DarkLight1337 (Member) commented on Apr 14, 2025:

Even if we need the whole vLLM config, we should initialize it in init_app_state.
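A minimal, self-contained sketch of that suggestion (the init_app_state signature and the plain-FastAPI wiring here are assumptions for illustration, not vLLM's actual API): attach the config to app.state at startup rather than using a module-level global.

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()


def init_app_state(app: FastAPI, vllm_config: object) -> None:
    # Attach the engine config to the application state once, at startup.
    app.state.vllm_config = vllm_config


@app.get("/server_info")
async def show_server_info(raw_request: Request):
    # Read the config back from app.state; no global setter required.
    return JSONResponse(
        content={"vllm_config": str(raw_request.app.state.vllm_config)})


init_app_state(app, vllm_config="stand-in for a real VllmConfig")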

Contributor (Author):

I think the information provided by model_config is sometimes insufficient. We want to record and display all of the parameters used when starting the vllm serve server: on one hand, this lets users more easily understand the server's full configuration; on the other, it makes it easy to compare different runs and to fully reproduce earlier experiments from the recorded parameters. Currently, the only way to obtain and record this information is to parse the logs, which is brittle: if the log format changes, the parsing logic has to change with it. Thank you for the suggestions; I will try to initialize it in init_app_state.
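As a usage illustration of the workflow described above (hypothetical client code; it assumes the server is listening on localhost:8000), the endpoint's output can be recorded next to a run's results for later comparison and reproduction:

import json
import urllib.request

# Fetch the serialized vLLM config from the running server.
with urllib.request.urlopen("http://localhost:8000/server_info") as resp:
    server_info = json.load(resp)

# Persist it next to the run's outputs so later runs can be compared
# and experiments reproduced from the recorded parameters.
with open("run_config.json", "w") as f:
    json.dump(server_info, f, indent=2)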



+
+
+_global_state: Optional[_GlobalState] = None
+
+
+def set_global_state(global_state: _GlobalState):
+    global _global_state
+    _global_state = global_state
+
+
 @asynccontextmanager
 async def lifespan(app: FastAPI):
     try:
@@ -165,6 +180,7 @@ async def build_async_engine_client_from_engine_args(
     usage_context = UsageContext.OPENAI_API_SERVER
     vllm_config = engine_args.create_engine_config(usage_context=usage_context)
 
+    set_global_state(_GlobalState(vllmconfig=vllm_config))
     # V1 AsyncLLM.
     if envs.VLLM_USE_V1:
         if disable_frontend_multiprocessing:
@@ -327,6 +343,7 @@ def mount_metrics(app: FastAPI):
            "/load",
            "/ping",
            "/version",
+            "/server_info",
        ],
        registry=registry,
    ).add().instrument(app).expose(app)
@@ -727,6 +744,14 @@ async def is_sleeping(raw_request: Request):
     logger.info("check whether the engine is sleeping")
     is_sleeping = await engine_client(raw_request).is_sleeping()
     return JSONResponse(content={"is_sleeping": is_sleeping})
 
+
+@router.get("/server_info")
+async def show_server_info():
+    if _global_state is None:
+        server_info = {"vllm_config": "Vllm Config not available"}
+    else:
+        server_info = {"vllm_config": str(_global_state.vllmconfig)}
+    return JSONResponse(content=server_info)
 
 
 @router.post("/invocations", dependencies=[Depends(validate_json_request)])