
Commit 87a7bac

insukim1994 authored and dbyoung18 committed
[Doc] Changed explanation of generation_tokens_total and prompt_tokens_total counter type metrics to avoid confusion (vllm-project#16784)
Signed-off-by: insukim1994 <[email protected]>
1 parent: 0607dc6 · commit: 87a7bac

File tree

1 file changed: +2 −2 lines changed

docs/source/design/v1/metrics.md

Lines changed: 2 additions & 2 deletions
@@ -66,8 +66,8 @@ vLLM also provides [a reference example](https://docs.vllm.ai/en/latest/getting_
 The subset of metrics exposed in the Grafana dashboard gives us an indication of which metrics are especially important:

 - `vllm:e2e_request_latency_seconds_bucket` - End to end request latency measured in seconds
-- `vllm:prompt_tokens_total` - Prompt Tokens/Sec
-- `vllm:generation_tokens_total` - Generation Tokens/Sec
+- `vllm:prompt_tokens_total` - Prompt Tokens
+- `vllm:generation_tokens_total` - Generation Tokens
 - `vllm:time_per_output_token_seconds` - Inter token latency (Time Per Output Token, TPOT) in second.
 - `vllm:time_to_first_token_seconds` - Time to First Token (TTFT) latency in seconds.
 - `vllm:num_requests_running` (also, `_swapped` and `_waiting`) - Number of requests in RUNNING, WAITING, and SWAPPED state
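The rename reflects that these two metrics are Prometheus Counters: they expose a monotonically increasing running total of tokens, not a tokens/sec rate, so labeling them "Tokens/Sec" was misleading. A per-second rate is derived at query time (e.g. with PromQL's `rate()`), not stored in the metric itself. The following is a minimal illustrative sketch using the `prometheus_client` library; the simplified metric names and `record_request` helper are placeholders for this example, not vLLM's actual metrics code:

```python
from prometheus_client import CollectorRegistry, Counter, generate_latest

# Hypothetical standalone example, not vLLM's implementation.
registry = CollectorRegistry()

# A Counter only ever increases: it holds a cumulative token total.
# The client library appends "_total" to the exposed metric name.
prompt_tokens = Counter(
    "prompt_tokens", "Cumulative number of prompt tokens processed.",
    registry=registry)
generation_tokens = Counter(
    "generation_tokens", "Cumulative number of generation tokens produced.",
    registry=registry)

def record_request(num_prompt_tokens: int, num_generation_tokens: int) -> None:
    """Accumulate one request's token counts into the running totals."""
    prompt_tokens.inc(num_prompt_tokens)
    generation_tokens.inc(num_generation_tokens)

record_request(128, 32)
record_request(64, 16)

# The scrape payload carries totals only; a dashboard turns them into
# rates with a query such as rate(prompt_tokens_total[5m]).
exposition = generate_latest(registry).decode()
print(exposition)
```

This is why the Grafana dashboard can still chart tokens per second from these metrics: the rate calculation happens in the PromQL query, so the metric descriptions should name the counters' totals rather than a rate.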
