
[Bugfix][Spec Decode] Little fix to spec decode in model_runner_v1.py #1189


Closed
wants to merge 8 commits

Conversation

@shen-shanshan shen-shanshan (Collaborator) commented Jun 12, 2025

What this PR does / why we need it?

Changes:

  1. Fix [Bug]: test_ngram_correctness failed due to PagedAttentionOperation inner error #1162 by removing the AscendAttentionState.SpecDecoding-specific logic.
  2. Apply a small fix to spec decode in model_runner_v1.py, synced from https://github.com/vllm-project/vllm/blob/main/vllm/v1/worker/gpu_model_runner.py#L1581-L1584.
  3. Rename _process_reqs() to _prepare_inputs() to stay consistent with vLLM upstream (https://github.com/vllm-project/vllm/blob/main/vllm/v1/worker/gpu_model_runner.py#L557); a sketch of the rename follows this list.
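
A minimal sketch of the rename in item 3, for orientation only. The class name NPUModelRunner, the method signatures, and the bodies below are placeholders, not the actual vllm-ascend diff:

    # Illustrative sketch only: the real _prepare_inputs() in
    # vllm_ascend/worker/model_runner_v1.py builds the full set of model
    # inputs; the class name, signatures, and bodies here are placeholders.
    class NPUModelRunner:

        def _prepare_inputs(self, scheduler_output):  # previously _process_reqs()
            """Build per-step model inputs (token ids, positions, attention
            metadata) from the scheduler output, matching the naming used by
            vLLM's gpu_model_runner.py."""
            ...

        def execute_model(self, scheduler_output):
            # Call sites are updated to use the new name.
            model_inputs = self._prepare_inputs(scheduler_output)
            ...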

Does this PR introduce any user-facing change?

How was this patch tested?

pytest -sv tests/long_term/spec_decode/e2e/test_v1_spec_decode.py::test_ngram_correctness
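
For context, a rough sketch of the speculative-decoding setup the test exercises, reconstructed from the engine config visible in the log below. The model name, ngram method, num_spec_tokens=3, max_model_len=1024, and enforce_eager=True come from the log; the prompt_lookup values, prompt, and sampling parameters are assumptions:

    # Rough sketch of the setup test_ngram_correctness exercises; see the
    # note above for which values come from the log and which are assumptions.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="LLM-Research/Meta-Llama-3.1-8B-Instruct",
        speculative_config={
            "method": "ngram",             # ngram drafter, no separate draft model
            "num_speculative_tokens": 3,   # num_spec_tokens=3 in the log
            "prompt_lookup_max": 5,        # assumption, not shown in the log
            "prompt_lookup_min": 3,        # assumption, not shown in the log
        },
        max_model_len=1024,
        enforce_eager=True,
    )

    outputs = llm.generate(
        ["The future of AI is"],
        SamplingParams(temperature=0, max_tokens=64),
    )
    print(outputs[0].outputs[0].text)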

Logs:

pytest -sv tests/long_term/spec_decode/e2e/test_v1_spec_decode.py::test_ngram_correctness
INFO 06-13 06:53:06 [importing.py:17] Triton not installed or not compatible; certain GPU-related functions will not be available.
WARNING 06-13 06:53:06 [importing.py:29] Triton is not installed. Using dummy decorators. Install it via `pip install triton` to enable kernel compilation.
INFO 06-13 06:53:10 [__init__.py:39] Available plugins for group vllm.platform_plugins:
INFO 06-13 06:53:10 [__init__.py:41] - ascend -> vllm_ascend:register
INFO 06-13 06:53:10 [__init__.py:44] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 06-13 06:53:10 [__init__.py:235] Platform plugin ascend is activated
WARNING:root:Failed to import 'vllm_ascend.vllm_ascend_C': dynamic module does not define module export function (PyInit_vllm_ascend_C). All custom ops will be disabled. 
WARNING 06-13 06:53:15 [_custom_ops.py:21] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
/home/sss/software/miniconda3/envs/vllm-v1/lib/python3.10/site-packages/pytest_asyncio/plugin.py:217: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"

  warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
========================================================================== test session starts ===========================================================================
platform linux -- Python 3.10.16, pytest-8.3.5, pluggy-1.5.0 -- /home/sss/software/miniconda3/envs/vllm-v1/bin/python
cachedir: .pytest_cache
rootdir: /home/sss/github/vllm-project/vllm-ascend
configfile: pytest.ini
plugins: anyio-4.8.0, asyncio-0.26.0
asyncio: mode=strict, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 1 item                                                                                                                                                         

tests/long_term/spec_decode/e2e/test_v1_spec_decode.py::test_ngram_correctness WARNING 06-13 06:53:18 [registry.py:402] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_mtp:CustomDeepSeekMTP.
WARNING 06-13 06:53:18 [registry.py:402] Model architecture Qwen2VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_vl:AscendQwen2VLForConditionalGeneration.
WARNING 06-13 06:53:18 [registry.py:402] Model architecture Qwen2_5_VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl:AscendQwen2_5_VLForConditionalGeneration.
WARNING 06-13 06:53:18 [registry.py:402] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV2ForCausalLM.
WARNING 06-13 06:53:18 [registry.py:402] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM.
WARNING 06-13 06:53:18 [registry.py:402] Model architecture Qwen3MoeForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_moe:CustomQwen3MoeForCausalLM.
Downloading Model to directory: /home/sss/.cache/modelscope/hub/models/LLM-Research/Meta-Llama-3.1-8B-Instruct
2025-06-13 06:53:19,232 - modelscope - WARNING - Using branch: master as version is unstable, use with caution
2025-06-13 06:53:19,511 - modelscope - INFO - Target directory already exists, skipping creation.
Downloading Model to directory: /home/sss/.cache/modelscope/hub/models/LLM-Research/Meta-Llama-3.1-8B-Instruct
2025-06-13 06:53:19,780 - modelscope - WARNING - Using branch: master as version is unstable, use with caution
2025-06-13 06:53:20,035 - modelscope - INFO - Target directory already exists, skipping creation.
INFO 06-13 06:53:40 [config.py:831] This model supports multiple tasks: {'score', 'reward', 'embed', 'classify', 'generate'}. Defaulting to 'generate'.
WARNING 06-13 06:53:40 [arg_utils.py:1652] Detected VLLM_USE_V1=1 with npu. Usage should be considered experimental. Please report any issues on Github.
INFO 06-13 06:53:40 [config.py:1988] Disabled the custom all-reduce kernel because it is not supported on current platform.
INFO 06-13 06:53:40 [config.py:2203] Chunked prefill is enabled with max_num_batched_tokens=8192.
INFO 06-13 06:53:40 [platform.py:170] Compilation disabled, using eager mode by default
Downloading Model to directory: /home/sss/.cache/modelscope/hub/models/LLM-Research/Meta-Llama-3.1-8B-Instruct
2025-06-13 06:53:41,336 - modelscope - INFO - Target directory already exists, skipping creation.
Downloading Model to directory: /home/sss/.cache/modelscope/hub/models/LLM-Research/Meta-Llama-3.1-8B-Instruct
2025-06-13 06:53:42,374 - modelscope - WARNING - Using branch: master as version is unstable, use with caution
2025-06-13 06:53:42,633 - modelscope - INFO - Target directory already exists, skipping creation.
INFO 06-13 06:53:48 [importing.py:17] Triton not installed or not compatible; certain GPU-related functions will not be available.
WARNING 06-13 06:53:48 [importing.py:29] Triton is not installed. Using dummy decorators. Install it via `pip install triton` to enable kernel compilation.
INFO 06-13 06:53:52 [__init__.py:39] Available plugins for group vllm.platform_plugins:
INFO 06-13 06:53:52 [__init__.py:41] - ascend -> vllm_ascend:register
INFO 06-13 06:53:52 [__init__.py:44] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 06-13 06:53:52 [__init__.py:235] Platform plugin ascend is activated
WARNING:root:Failed to import 'vllm_ascend.vllm_ascend_C': dynamic module does not define module export function (PyInit_vllm_ascend_C). All custom ops will be disabled. 
WARNING 06-13 06:53:56 [_custom_ops.py:21] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
INFO 06-13 06:53:59 [core.py:455] Waiting for init message from front-end.
INFO 06-13 06:53:59 [platform.py:170] Compilation disabled, using eager mode by default
WARNING 06-13 06:53:59 [registry.py:402] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_mtp:CustomDeepSeekMTP.
WARNING 06-13 06:53:59 [registry.py:402] Model architecture Qwen2VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_vl:AscendQwen2VLForConditionalGeneration.
WARNING 06-13 06:53:59 [registry.py:402] Model architecture Qwen2_5_VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl:AscendQwen2_5_VLForConditionalGeneration.
WARNING 06-13 06:53:59 [registry.py:402] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV2ForCausalLM.
WARNING 06-13 06:53:59 [registry.py:402] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM.
WARNING 06-13 06:53:59 [registry.py:402] Model architecture Qwen3MoeForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_moe:CustomQwen3MoeForCausalLM.
INFO 06-13 06:53:59 [core.py:70] Initializing a V1 LLM engine (v0.8.5.dev20+gb590adfdc) with config: model='LLM-Research/Meta-Llama-3.1-8B-Instruct', speculative_config=None, tokenizer='LLM-Research/Meta-Llama-3.1-8B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=1024, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=True, kv_cache_dtype=auto,  device_config=npu, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=LLM-Research/Meta-Llama-3.1-8B-Instruct, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, pooler_config=None, compilation_config={"level":0,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":["all"],"splitting_ops":[],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":0,"cudagraph_capture_sizes":[],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":0,"local_cache_dir":null}
WARNING 06-13 06:53:59 [camem.py:63] Failed to import vllm_ascend_C:dynamic module does not define module export function (PyInit_vllm_ascend_C). Sleep mode will be disabled. 
WARNING 06-13 06:54:00 [utils.py:2746] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in <vllm_ascend.worker.worker_v1.NPUWorker object at 0xfffd1dc6f340>
INFO 06-13 06:54:08 [parallel_state.py:1065] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
INFO 06-13 06:54:09 [model_runner_v1.py:1532] Starting to load model LLM-Research/Meta-Llama-3.1-8B-Instruct...
Downloading Model to directory: /home/sss/.cache/modelscope/hub/models/LLM-Research/Meta-Llama-3.1-8B-Instruct
2025-06-13 06:54:10,495 - modelscope - INFO - Target directory already exists, skipping creation.
Loading safetensors checkpoint shards:   0% Completed | 0/4 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:  25% Completed | 1/4 [00:00<00:00,  6.49it/s]
Loading safetensors checkpoint shards:  50% Completed | 2/4 [00:00<00:00,  2.11it/s]
Loading safetensors checkpoint shards:  75% Completed | 3/4 [00:01<00:00,  1.49it/s]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:02<00:00,  1.26it/s]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:02<00:00,  1.46it/s]

INFO 06-13 06:54:13 [default_loader.py:272] Loading weights took 3.04 seconds
INFO 06-13 06:54:14 [model_runner_v1.py:1545] Loading model weights took 14.9929 GB
INFO 06-13 06:54:32 [kv_cache_utils.py:716] GPU KV cache size: 312,320 tokens
INFO 06-13 06:54:32 [kv_cache_utils.py:720] Maximum concurrency for 1,024 tokens per request: 305.00x
INFO 06-13 06:54:33 [core.py:171] init engine (profile, create kv cache, warmup model) took 18.65 seconds
Downloading Model to directory: /home/sss/.cache/modelscope/hub/models/LLM-Research/Meta-Llama-3.1-8B-Instruct
2025-06-13 06:54:33,577 - modelscope - INFO - Target directory already exists, skipping creation.
INFO 06-13 06:54:35 [chat_utils.py:420] Detected the chat template content format to be 'string'. You can set `--chat-template-content-format` to override this.
Adding requests: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 3637.38it/s]
Processed prompts: 100%|███████████████████████████████████████████████████████| 100/100 [00:01<00:00, 68.66it/s, est. speed input: 4648.56 toks/s, output: 686.63 toks/s]
Downloading Model to directory: /home/sss/.cache/modelscope/hub/models/LLM-Research/Meta-Llama-3.1-8B-Instruct
2025-06-13 06:54:42,195 - modelscope - WARNING - Using branch: master as version is unstable, use with caution
2025-06-13 06:54:42,449 - modelscope - INFO - Target directory already exists, skipping creation.
Downloading Model to directory: /home/sss/.cache/modelscope/hub/models/LLM-Research/Meta-Llama-3.1-8B-Instruct
2025-06-13 06:54:42,701 - modelscope - WARNING - Using branch: master as version is unstable, use with caution
2025-06-13 06:54:42,936 - modelscope - INFO - Target directory already exists, skipping creation.
INFO 06-13 06:54:42 [config.py:831] This model supports multiple tasks: {'score', 'reward', 'embed', 'classify', 'generate'}. Defaulting to 'generate'.
WARNING 06-13 06:54:43 [arg_utils.py:1652] Detected VLLM_USE_V1=1 with npu. Usage should be considered experimental. Please report any issues on Github.
INFO 06-13 06:54:43 [config.py:1988] Disabled the custom all-reduce kernel because it is not supported on current platform.
INFO 06-13 06:54:43 [config.py:2203] Chunked prefill is enabled with max_num_batched_tokens=8192.
INFO 06-13 06:54:43 [platform.py:170] Compilation disabled, using eager mode by default
Downloading Model to directory: /home/sss/.cache/modelscope/hub/models/LLM-Research/Meta-Llama-3.1-8B-Instruct
2025-06-13 06:54:43,728 - modelscope - INFO - Target directory already exists, skipping creation.
Downloading Model to directory: /home/sss/.cache/modelscope/hub/models/LLM-Research/Meta-Llama-3.1-8B-Instruct
2025-06-13 06:54:44,824 - modelscope - WARNING - Using branch: master as version is unstable, use with caution
2025-06-13 06:54:45,058 - modelscope - INFO - Target directory already exists, skipping creation.
INFO 06-13 06:54:51 [importing.py:17] Triton not installed or not compatible; certain GPU-related functions will not be available.
WARNING 06-13 06:54:51 [importing.py:29] Triton is not installed. Using dummy decorators. Install it via `pip install triton` to enable kernel compilation.
INFO 06-13 06:54:54 [__init__.py:39] Available plugins for group vllm.platform_plugins:
INFO 06-13 06:54:54 [__init__.py:41] - ascend -> vllm_ascend:register
INFO 06-13 06:54:54 [__init__.py:44] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 06-13 06:54:54 [__init__.py:235] Platform plugin ascend is activated
WARNING:root:Failed to import 'vllm_ascend.vllm_ascend_C': dynamic module does not define module export function (PyInit_vllm_ascend_C). All custom ops will be disabled. 
WARNING 06-13 06:54:59 [_custom_ops.py:21] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
INFO 06-13 06:55:01 [core.py:455] Waiting for init message from front-end.
INFO 06-13 06:55:01 [platform.py:170] Compilation disabled, using eager mode by default
WARNING 06-13 06:55:02 [registry.py:402] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_mtp:CustomDeepSeekMTP.
WARNING 06-13 06:55:02 [registry.py:402] Model architecture Qwen2VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_vl:AscendQwen2VLForConditionalGeneration.
WARNING 06-13 06:55:02 [registry.py:402] Model architecture Qwen2_5_VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl:AscendQwen2_5_VLForConditionalGeneration.
WARNING 06-13 06:55:02 [registry.py:402] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV2ForCausalLM.
WARNING 06-13 06:55:02 [registry.py:402] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM.
WARNING 06-13 06:55:02 [registry.py:402] Model architecture Qwen3MoeForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_moe:CustomQwen3MoeForCausalLM.
INFO 06-13 06:55:02 [core.py:70] Initializing a V1 LLM engine (v0.8.5.dev20+gb590adfdc) with config: model='LLM-Research/Meta-Llama-3.1-8B-Instruct', speculative_config=SpeculativeConfig(method='ngram', model=None, num_spec_tokens=3), tokenizer='LLM-Research/Meta-Llama-3.1-8B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=1024, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=True, kv_cache_dtype=auto,  device_config=npu, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=LLM-Research/Meta-Llama-3.1-8B-Instruct, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=False, pooler_config=None, compilation_config={"level":0,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":["all"],"splitting_ops":[],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":0,"cudagraph_capture_sizes":[],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":0,"local_cache_dir":null}
WARNING 06-13 06:55:02 [camem.py:63] Failed to import vllm_ascend_C:dynamic module does not define module export function (PyInit_vllm_ascend_C). Sleep mode will be disabled. 
WARNING 06-13 06:55:02 [utils.py:2746] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in <vllm_ascend.worker.worker_v1.NPUWorker object at 0xfffcf23cf2e0>
INFO 06-13 06:55:11 [parallel_state.py:1065] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
INFO 06-13 06:55:12 [model_runner_v1.py:1532] Starting to load model LLM-Research/Meta-Llama-3.1-8B-Instruct...
Downloading Model to directory: /home/sss/.cache/modelscope/hub/models/LLM-Research/Meta-Llama-3.1-8B-Instruct
2025-06-13 06:55:13,348 - modelscope - INFO - Target directory already exists, skipping creation.
Loading safetensors checkpoint shards:   0% Completed | 0/4 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:  25% Completed | 1/4 [00:00<00:00,  7.22it/s]
Loading safetensors checkpoint shards:  50% Completed | 2/4 [00:00<00:00,  2.17it/s]
Loading safetensors checkpoint shards:  75% Completed | 3/4 [00:01<00:00,  1.48it/s]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:02<00:00,  1.25it/s]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:02<00:00,  1.46it/s]

INFO 06-13 06:55:16 [default_loader.py:272] Loading weights took 3.19 seconds
INFO 06-13 06:55:16 [model_runner_v1.py:1537] Loading drafter model...
INFO 06-13 06:55:17 [model_runner_v1.py:1545] Loading model weights took 14.9929 GB
INFO 06-13 06:55:33 [kv_cache_utils.py:716] GPU KV cache size: 312,192 tokens
INFO 06-13 06:55:33 [kv_cache_utils.py:720] Maximum concurrency for 1,024 tokens per request: 304.88x
INFO 06-13 06:55:33 [core.py:171] init engine (profile, create kv cache, warmup model) took 15.65 seconds
Downloading Model to directory: /home/sss/.cache/modelscope/hub/models/LLM-Research/Meta-Llama-3.1-8B-Instruct
2025-06-13 06:55:33,785 - modelscope - INFO - Target directory already exists, skipping creation.
Adding requests: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 3860.49it/s]
Processed prompts: 100%|███████████████████████████████████████████████████████| 100/100 [00:02<00:00, 38.76it/s, est. speed input: 2623.90 toks/s, output: 387.57 toks/s]
PASSED

============================================================================ warnings summary ============================================================================
../../../software/miniconda3/envs/vllm-v1/lib/python3.10/site-packages/torch_npu/dynamo/torchair/__init__.py:8
  /home/sss/software/miniconda3/envs/vllm-v1/lib/python3.10/site-packages/torch_npu/dynamo/torchair/__init__.py:8: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
    import pkg_resources

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================================================================ 1 passed, 1 warning in 143.94s (0:02:23) ================================================================

@shen-shanshan shen-shanshan marked this pull request as draft June 12, 2025 08:58
@shen-shanshan shen-shanshan marked this pull request as ready for review June 12, 2025 09:00
@shen-shanshan shen-shanshan marked this pull request as draft June 12, 2025 11:50
@shen-shanshan shen-shanshan marked this pull request as ready for review June 13, 2025 04:02
@shen-shanshan shen-shanshan marked this pull request as draft June 13, 2025 04:35
@shen-shanshan shen-shanshan marked this pull request as ready for review June 13, 2025 07:25

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@shen-shanshan (Collaborator, Author)

@wangxiyuan Spec decode with the ngram drafter on the V1 engine currently runs into errors; this PR fixes that issue.

codecov bot commented Jun 23, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 27.22%. Comparing base (c30ddb8) to head (5c17cb5).
Report is 83 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1189      +/-   ##
==========================================
- Coverage   27.39%   27.22%   -0.18%     
==========================================
  Files          56       56              
  Lines        6191     6215      +24     
==========================================
- Hits         1696     1692       -4     
- Misses       4495     4523      +28     
Flag        Coverage Δ
unittests   27.22% <100.00%> (-0.18%) ⬇️

Flags with carried forward coverage won't be shown.


Signed-off-by: shen-shanshan <[email protected]>

Labels
long-term-test (enable long term test for PR), merge-conflicts, ready-for-test (start test by label for PR)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants