[Misc] Change buckets of histogram_iteration_tokens to [1, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8096] to represent number of tokens #17033

sfc-gh-zhwang · 2025-04-23T07:18:22Z

No description provided.

github-actions · 2025-04-23T07:18:32Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

youkaichao

LGTM, thanks for the clean-up!

youkaichao · 2025-04-23T07:22:37Z

Initially I added it in #11031 to track the batch size distribution, but later we mainly used the VLLM_LOG_BATCHSIZE_INTERVAL environment variable for this purpose.
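For readers unfamiliar with histogram buckets, here is a minimal sketch of how the bucket boundaries from the PR title would sort per-iteration token counts. It is not vLLM's implementation: the helper names (`bucket_counts`, `cumulative`) and the sample observations are hypothetical, and the sketch assumes the usual Prometheus convention that each boundary is an inclusive upper bound with a final overflow (+Inf) bucket.

```python
from bisect import bisect_left

# Bucket upper bounds copied from the PR title.
BUCKETS = [1, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8096]


def bucket_counts(observations, bounds=BUCKETS):
    """Return per-bucket (non-cumulative) counts, plus a final overflow slot."""
    counts = [0] * (len(bounds) + 1)
    for value in observations:
        # bisect_left returns the index of the first bound >= value,
        # which matches an inclusive "less than or equal" upper bound.
        counts[bisect_left(bounds, value)] += 1
    return counts


def cumulative(counts):
    """Convert per-bucket counts into running (cumulative) totals."""
    total, out = 0, []
    for c in counts:
        total += c
        out.append(total)
    return out


# Hypothetical token counts per engine iteration (illustrative only).
tokens_per_iteration = [1, 4, 8, 100, 2048, 5000, 9000]
counts = bucket_counts(tokens_per_iteration)
print(counts)
print(cumulative(counts))
```

With token-sized boundaries, a large prefill step (thousands of tokens) and a small decode step (a handful of tokens) fall into different buckets, whereas anything above the last boundary only shows up in the overflow slot.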

DarkLight1337 · 2025-04-23T08:03:52Z

Please fix pre-commit

Signed-off-by: sfc-gh-zhwang <[email protected]>

DarkLight1337 · 2025-04-25T03:18:07Z

Pre-commit is still failing, can you fix it?

sfc-gh-zhwang · 2025-04-25T04:04:03Z

[2025-04-24T05:49:22Z] FAILED spec_decode/e2e/test_multistep_correctness.py::test_spec_decode_e2e_greedy_correctness_with_preemption[1-4-256-test_llm_kwargs1-baseline_llm_kwargs0-per_test_common_llm_kwargs0-common_llm_kwargs0] - AssertionError: function <function test_spec_decode_e2e_greedy_correctness_with_preemption at 0x7f4d58803880> failed when called with args () and kwargs {'vllm_runner': <class 'tests.conftest.VllmRunner'>, 'common_llm_kwargs': {'block_size': 8, 'num_gpu_blocks_override': 34, 'max_model_len': 272, 'enforce_eager': True}, 'per_test_common_llm_kwargs': {'model_name': 'JackFram/llama-160m'}, 'baseline_llm_kwargs': {}, 'test_llm_kwargs': {'speculative_config': {'model': 'JackFram/llama-68m', 'num_speculative_tokens': 5}, 'enable_chunked_prefill': True, 'max_num_batched_tokens': 4, 'max_num_seqs': 4}, 'batch_size': 4, 'output_len': 256, 'seed': 1}
[2025-04-24T05:49:22Z] FAILED spec_decode/e2e/test_multistep_correctness.py::test_spec_decode_different_block_size[1-32-2-test_llm_kwargs1-baseline_llm_kwargs0-per_test_common_llm_kwargs0-common_llm_kwargs0] - AssertionError: function <function test_spec_decode_different_block_size at 0x7f4d588039c0> failed when called with args () and kwargs {'vllm_runner': <class 'tests.conftest.VllmRunner'>, 'common_llm_kwargs': {'model_name': 'JackFram/llama-160m', 'enforce_eager': True}, 'per_test_common_llm_kwargs': {'block_size': 8}, 'baseline_llm_kwargs': {}, 'test_llm_kwargs': {'speculative_config': {'model': 'JackFram/llama-68m', 'num_speculative_tokens': 5}, 'enable_chunked_prefill': True, 'max_num_batched_tokens': 4, 'max_num_seqs': 4}, 'batch_size': 2, 'output_len': 32, 'seed': 1}

@DarkLight1337 I am curious why this would fail.

DarkLight1337 · 2025-04-25T04:07:04Z

The spec decode failure is unrelated to the PR. I think you just need to fix pre-commit. Make sure you merge from main as well.

Signed-off-by: DarkLight1337 <[email protected]>

DarkLight1337 · 2025-04-27T10:20:28Z

I have fixed pre-commit for you

… 64, 128, 256, 512, 1024, 2048, 4096, 8096] to represent number of tokens (vllm-project#17033) Signed-off-by: sfc-gh-zhwang <[email protected]>

sfc-gh-zhwang · 2025-04-29T22:03:15Z

@DarkLight1337 sorry, I didn't get time. Thank you so much!

… 64, 128, 256, 512, 1024, 2048, 4096, 8096] to represent number of tokens (vllm-project#17033) Signed-off-by: sfc-gh-zhwang <[email protected]> Signed-off-by: Agata Dobrzyniewicz <[email protected]>

… 64, 128, 256, 512, 1024, 2048, 4096, 8096] to represent number of tokens (vllm-project#17033) Signed-off-by: sfc-gh-zhwang <[email protected]> Signed-off-by: Mu Huai <[email protected]>

… 64, 128, 256, 512, 1024, 2048, 4096, 8096] to represent number of tokens (vllm-project#17033) Signed-off-by: sfc-gh-zhwang <[email protected]> Signed-off-by: Yuqi Zhang <[email protected]>

… 64, 128, 256, 512, 1024, 2048, 4096, 8096] to represent number of tokens (vllm-project#17033) Signed-off-by: sfc-gh-zhwang <[email protected]> Signed-off-by: minpeter <[email protected]>

sfc-gh-zhwang requested review from WoosukKwon, robertgshaw2-redhat, njhill, ywang96, comaniac and alexm-redhat as code owners April 23, 2025 07:18

mergify bot added the v1 label Apr 23, 2025

youkaichao approved these changes Apr 23, 2025


DarkLight1337 added the ready (ONLY add when PR is ready to merge/full CI is needed) label Apr 23, 2025

sfc-gh-zhwang force-pushed the zhwang/iteration-tokens branch 2 times, most recently from 6f622e1 to 0ea17aa April 24, 2025 03:09

sfc-gh-zhwang requested review from mgoin, russellb, tlrmchlsmth and simon-mo as code owners April 24, 2025 03:09

mergify bot added the documentation (Improvements or additions to documentation), ci/build, frontend, structured-output, and tpu (Related to Google TPUs) labels Apr 24, 2025

github-project-automation bot added this to Structured Output Apr 24, 2025

format

77f1047

Signed-off-by: sfc-gh-zhwang <[email protected]>

sfc-gh-zhwang force-pushed the zhwang/iteration-tokens branch from 0ea17aa to 77f1047 April 24, 2025 03:23

mergify bot removed the tpu (Related to Google TPUs) label Apr 24, 2025

DarkLight1337 removed request for russellb and tlrmchlsmth April 24, 2025 03:28

DarkLight1337 removed request for mgoin and simon-mo April 24, 2025 03:28

Merge branch 'main' into zhwang/iteration-tokens

5819533

Signed-off-by: DarkLight1337 <[email protected]>

DarkLight1337 enabled auto-merge (squash) April 27, 2025 10:20

DarkLight1337 disabled auto-merge April 27, 2025 10:20

DarkLight1337 enabled auto-merge (squash) April 27, 2025 10:20

DarkLight1337 merged commit 18445ed into vllm-project:main Apr 27, 2025
48 checks passed

github-project-automation bot moved this to Done in Structured Output Apr 27, 2025

DarkLight1337 mentioned this pull request Apr 27, 2025

[Metrics] Fix minor inconsistencies in bucket progression #17262

Merged
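This thread does not say what #17262 changed. Purely as an illustration of what a bucket progression check might look like, the sketch below inspects the boundaries from this PR's title for values that are not powers of two and prints the ratio between neighbouring boundaries. The helper names are hypothetical and are not part of vLLM.

```python
# Bucket upper bounds copied from this PR's title.
BUCKETS = [1, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8096]


def is_power_of_two(n):
    """True when n is a positive integer with exactly one bit set."""
    return n > 0 and n & (n - 1) == 0


def non_powers_of_two(bounds):
    """List the boundaries that break a pure power-of-two progression."""
    return [b for b in bounds if not is_power_of_two(b)]


def step_ratios(bounds):
    """Ratio between each boundary and the one before it."""
    return [b / a for a, b in zip(bounds, bounds[1:])]


odd_ones = non_powers_of_two(BUCKETS)
print(odd_ones)              # [8096]
print(step_ratios(BUCKETS))
```

The only boundary flagged is 8096, which sits just below 2**13 = 8192. The ratios also show that the first step (1 to 8) is larger than the doubling used between the remaining boundaries.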

ckhordiasma mentioned this pull request May 14, 2025

nm vllm ent 0.8.5 sync red-hat-data-services/vllm#139

Merged
