Commit b54b5a9

[Doc]Add instruction for profiling with bench_one_batch (#5581)
1 parent bca832c commit b54b5a9

2 files changed: 7 additions, 2 deletions

docs/references/benchmark_and_profiling.md

Lines changed: 6 additions & 1 deletion
@@ -41,9 +41,14 @@
 Please make sure that `SGLANG_TORCH_PROFILER_DIR` is set on both the server and the client side; otherwise the trace file cannot be generated correctly. A safe way is to set `SGLANG_TORCH_PROFILER_DIR` in the shell's `.*rc` file (e.g. `~/.bashrc` for bash shells).
 
 - To profile offline
-
 ```bash
 export SGLANG_TORCH_PROFILER_DIR=/root/sglang/profile_log
+
+# profile one batch with bench_one_batch.py
+# batch size can be controlled with --batch argument
+python3 -m sglang.bench_one_batch --model-path meta-llama/Llama-3.1-8B-Instruct --batch 32 --input-len 1024 --output-len 10 --profile
+
+# profile multiple batches with bench_offline_throughput.py
 python -m sglang.bench_offline_throughput --model-path meta-llama/Llama-3.1-8B-Instruct --dataset-name random --num-prompts 10 --profile --mem-frac=0.8
 ```
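After a profiling run, it can help to confirm that something was actually written to the directory named by `SGLANG_TORCH_PROFILER_DIR`. The following is a minimal sketch, not part of this commit: the helper name `list_profile_outputs` is hypothetical, and it makes no assumption about how the trace files are named, listing every regular file under the directory with the newest first.

```python
import os
from pathlib import Path


def list_profile_outputs(profile_dir=None):
    """Return all regular files under the profiler directory, newest first.

    Hypothetical helper: falls back to the SGLANG_TORCH_PROFILER_DIR
    environment variable when no directory is passed explicitly.
    """
    root = Path(profile_dir or os.environ.get("SGLANG_TORCH_PROFILER_DIR", "."))
    if not root.is_dir():
        # Nothing was written, or the variable points somewhere unexpected.
        return []
    files = [p for p in root.rglob("*") if p.is_file()]
    # Sort by modification time so the most recent trace comes first.
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)
```

Calling `list_profile_outputs("/root/sglang/profile_log")` after either benchmark command should return a non-empty list if the profiler produced output; an empty list suggests the environment variable was not set in the process that ran the benchmark.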

python/sglang/bench_one_batch.py

Lines changed: 1 addition & 1 deletion
@@ -396,7 +396,7 @@ def latency_test_run_once(
         decode_latencies.append(latency)
         if i < 5:
             rank_print(
-                f"Decode. latency: {latency:6.5f} s, throughput: {throughput:9.2f} token/s"
+                f"Decode. Batch size: {batch_size}, latency: {latency:6.5f} s, throughput: {throughput:9.2f} token/s"
             )
 
     if profile:
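To see what the changed log line looks like, here is a small standalone sketch that reuses the new format string from the hunk above. The function name `format_decode_line` is hypothetical, and computing throughput as `batch_size / latency` is an assumption about the surrounding code, which the diff does not show.

```python
def format_decode_line(batch_size: int, latency: float) -> str:
    """Format one decode-step log line using the format string from this commit."""
    # Assumption: throughput is tokens per second for one decode step,
    # i.e. one token per sequence in the batch divided by the step latency.
    throughput = batch_size / latency
    return (
        f"Decode. Batch size: {batch_size}, latency: {latency:6.5f} s, "
        f"throughput: {throughput:9.2f} token/s"
    )


line = format_decode_line(batch_size=32, latency=0.5)
print(line)
```

The `9.2f` specifier right-aligns the throughput in a nine-character field, so consecutive lines stay column-aligned as the values change.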
