
[Bugfix][Spec Decode] Little fix to spec decode in model_runner_v1.py #1189


Closed
wants to merge 8 commits

Conversation

@shen-shanshan shen-shanshan (Collaborator) commented Jun 12, 2025

What this PR does / why we need it?

Changes:

  1. Fix [Bug]: test_ngram_correctness failed due to PagedAttentionOperation inner error #1162 by removing the AscendAttentionState.SpecDecoding-specific logic.
  2. Apply a small fix to spec decode in model_runner_v1.py, synced from https://github.com/vllm-project/vllm/blob/main/vllm/v1/worker/gpu_model_runner.py#L1581-L1584.
  3. Rename _process_reqs() to _prepare_inputs() to stay consistent with vLLM upstream (https://github.com/vllm-project/vllm/blob/main/vllm/v1/worker/gpu_model_runner.py#L557); a sketch of the rename follows this list.
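
A minimal sketch of the rename in item 3, for orientation only. The class name NPUModelRunner, the method signatures, and the bodies below are placeholders, not the actual vllm-ascend diff:

    # Illustrative sketch only: the real _prepare_inputs() in
    # vllm_ascend/worker/model_runner_v1.py builds the full set of model
    # inputs; the class name, signatures, and bodies here are placeholders.
    class NPUModelRunner:

        def _prepare_inputs(self, scheduler_output):  # previously _process_reqs()
            """Build per-step model inputs (token ids, positions, attention
            metadata) from the scheduler output, matching the naming used by
            vLLM's gpu_model_runner.py."""
            ...

        def execute_model(self, scheduler_output):
            # Call sites are updated to use the new name.
            model_inputs = self._prepare_inputs(scheduler_output)
            ...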

Does this PR introduce any user-facing change?

How was this patch tested?

pytest -sv tests/long_term/spec_decode/e2e/test_v1_spec_decode.py::test_ngram_correctness
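
For context, a rough sketch of the speculative-decoding setup the test exercises, reconstructed from the engine config visible in the log below. The model name, ngram method, num_spec_tokens=3, max_model_len=1024, and enforce_eager=True come from the log; the prompt_lookup values, prompt, and sampling parameters are assumptions:

    # Rough sketch of the setup test_ngram_correctness exercises; see the
    # note above for which values come from the log and which are assumptions.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="LLM-Research/Meta-Llama-3.1-8B-Instruct",
        speculative_config={
            "method": "ngram",             # ngram drafter, no separate draft model
            "num_speculative_tokens": 3,   # num_spec_tokens=3 in the log
            "prompt_lookup_max": 5,        # assumption, not shown in the log
            "prompt_lookup_min": 3,        # assumption, not shown in the log
        },
        max_model_len=1024,
        enforce_eager=True,
    )

    outputs = llm.generate(
        ["The future of AI is"],
        SamplingParams(temperature=0, max_tokens=64),
    )
    print(outputs[0].outputs[0].text)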

Logs:

pytest -sv tests/long_term/spec_decode/e2e/test_v1_spec_decode.py::test_ngram_correctness
INFO 06-13 06:53:06 [importing.py:17] Triton not installed or not compatible; certain GPU-related functions will not be available.
WARNING 06-13 06:53:06 [importing.py:29] Triton is not installed. Using dummy decorators. Install it via `pip install triton` to enable kernel compilation.
INFO 06-13 06:53:10 [__init__.py:39] Available plugins for group vllm.platform_plugins:
INFO 06-13 06:53:10 [__init__.py:41] - ascend -> vllm_ascend:register
INFO 06-13 06:53:10 [__init__.py:44] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 06-13 06:53:10 [__init__.py:235] Platform plugin ascend is activated
WARNING:root:Failed to import 'vllm_ascend.vllm_ascend_C': dynamic module does not define module export function (PyInit_vllm_ascend_C). All custom ops will be disabled. 
WARNING 06-13 06:53:15 [_custom_ops.py:21] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
/home/sss/software/miniconda3/envs/vllm-v1/lib/python3.10/site-packages/pytest_asyncio/plugin.py:217: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"

  warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
========================================================================== test session starts ===========================================================================
platform linux -- Python 3.10.16, pytest-8.3.5, pluggy-1.5.0 -- /home/sss/software/miniconda3/envs/vllm-v1/bin/python
cachedir: .pytest_cache
rootdir: /home/sss/github/vllm-project/vllm-ascend
configfile: pytest.ini
plugins: anyio-4.8.0, asyncio-0.26.0
asyncio: mode=strict, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 1 item                                                                                                                                                         

tests/long_term/spec_decode/e2e/test_v1_spec_decode.py::test_ngram_correctness WARNING 06-13 06:53:18 [registry.py:402] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_mtp:CustomDeepSeekMTP.
WARNING 06-13 06:53:18 [registry.py:402] Model architecture Qwen2VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_vl:AscendQwen2VLForConditionalGeneration.
WARNING 06-13 06:53:18 [registry.py:402] Model architecture Qwen2_5_VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl:AscendQwen2_5_VLForConditionalGeneration.
WARNING 06-13 06:53:18 [registry.py:402] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV2ForCausalLM.
WARNING 06-13 06:53:18 [registry.py:402] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM.
WARNING 06-13 06:53:18 [registry.py:402] Model architecture Qwen3MoeForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_moe:CustomQwen3MoeForCausalLM.
Downloading Model to directory: /home/sss/.cache/modelscope/hub/models/LLM-Research/Meta-Llama-3.1-8B-Instruct
2025-06-13 06:53:19,232 - modelscope - WARNING - Using branch: master as version is unstable, use with caution
2025-06-13 06:53:19,511 - modelscope - INFO - Target directory already exists, skipping creation.
Downloading Model to directory: /home/sss/.cache/modelscope/hub/models/LLM-Research/Meta-Llama-3.1-8B-Instruct
2025-06-13 06:53:19,780 - modelscope - WARNING - Using branch: master as version is unstable, use with caution
2025-06-13 06:53:20,035 - modelscope - INFO - Target directory already exists, skipping creation.
INFO 06-13 06:53:40 [config.py:831] This model supports multiple tasks: {'score', 'reward', 'embed', 'classify', 'generate'}. Defaulting to 'generate'.
WARNING 06-13 06:53:40 [arg_utils.py:1652] Detected VLLM_USE_V1=1 with npu. Usage should be considered experimental. Please report any issues on Github.
INFO 06-13 06:53:40 [config.py:1988] Disabled the custom all-reduce kernel because it is not supported on current platform.
INFO 06-13 06:53:40 [config.py:2203] Chunked prefill is enabled with max_num_batched_tokens=8192.
INFO 06-13 06:53:40 [platform.py:170] Compilation disabled, using eager mode by default
Downloading Model to directory: /home/sss/.cache/modelscope/hub/models/LLM-Research/Meta-Llama-3.1-8B-Instruct
2025-06-13 06:53:41,336 - modelscope - INFO - Target directory already exists, skipping creation.
Downloading Model to directory: /home/sss/.cache/modelscope/hub/models/LLM-Research/Meta-Llama-3.1-8B-Instruct
2025-06-13 06:53:42,374 - modelscope - WARNING - Using branch: master as version is unstable, use with caution
2025-06-13 06:53:42,633 - modelscope - INFO - Target directory already exists, skipping creation.
INFO 06-13 06:53:48 [importing.py:17] Triton not installed or not compatible; certain GPU-related functions will not be available.
WARNING 06-13 06:53:48 [importing.py:29] Triton is not installed. Using dummy decorators. Install it via `pip install triton` to enable kernel compilation.
INFO 06-13 06:53:52 [__init__.py:39] Available plugins for group vllm.platform_plugins:
INFO 06-13 06:53:52 [__init__.py:41] - ascend -> vllm_ascend:register
INFO 06-13 06:53:52 [__init__.py:44] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 06-13 06:53:52 [__init__.py:235] Platform plugin ascend is activated
WARNING:root:Failed to import 'vllm_ascend.vllm_ascend_C': dynamic module does not define module export function (PyInit_vllm_ascend_C). All custom ops will be disabled. 
WARNING 06-13 06:53:56 [_custom_ops.py:21] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
INFO 06-13 06:53:59 [core.py:455] Waiting for init message from front-end.
INFO 06-13 06:53:59 [platform.py:170] Compilation disabled, using eager mode by default
WARNING 06-13 06:53:59 [registry.py:402] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_mtp:CustomDeepSeekMTP.
WARNING 06-13 06:53:59 [registry.py:402] Model architecture Qwen2VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_vl:AscendQwen2VLForConditionalGeneration.
WARNING 06-13 06:53:59 [registry.py:402] Model architecture Qwen2_5_VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl:AscendQwen2_5_VLForConditionalGeneration.
WARNING 06-13 06:53:59 [registry.py:402] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV2ForCausalLM.
WARNING 06-13 06:53:59 [registry.py:402] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM.
WARNING 06-13 06:53:59 [registry.py:402] Model architecture Qwen3MoeForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_moe:CustomQwen3MoeForCausalLM.
INFO 06-13 06:53:59 [core.py:70] Initializing a V1 LLM engine (v0.8.5.dev20+gb590adfdc) with config: model='LLM-Research/Meta-Llama-3.1-8B-Instruct', speculative_config=None, tokenizer='LLM-Research/Meta-Llama-3.1-8B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=1024, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=True, kv_cache_dtype=auto,  device_config=npu, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=LLM-Research/Meta-Llama-3.1-8B-Instruct, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, pooler_config=None, compilation_config={"level":0,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":["all"],"splitting_ops":[],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":0,"cudagraph_capture_sizes":[],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":0,"local_cache_dir":null}
WARNING 06-13 06:53:59 [camem.py:63] Failed to import vllm_ascend_C:dynamic module does not define module export function (PyInit_vllm_ascend_C). Sleep mode will be disabled. 
WARNING 06-13 06:54:00 [utils.py:2746] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in <vllm_ascend.worker.worker_v1.NPUWorker object at 0xfffd1dc6f340>
INFO 06-13 06:54:08 [parallel_state.py:1065] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
INFO 06-13 06:54:09 [model_runner_v1.py:1532] Starting to load model LLM-Research/Meta-Llama-3.1-8B-Instruct...
Downloading Model to directory: /home/sss/.cache/modelscope/hub/models/LLM-Research/Meta-Llama-3.1-8B-Instruct
2025-06-13 06:54:10,495 - modelscope - INFO - Target directory already exists, skipping creation.
Loading safetensors checkpoint shards:   0% Completed | 0/4 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:  25% Completed | 1/4 [00:00<00:00,  6.49it/s]
Loading safetensors checkpoint shards:  50% Completed | 2/4 [00:00<00:00,  2.11it/s]
Loading safetensors checkpoint shards:  75% Completed | 3/4 [00:01<00:00,  1.49it/s]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:02<00:00,  1.26it/s]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:02<00:00,  1.46it/s]

INFO 06-13 06:54:13 [default_loader.py:272] Loading weights took 3.04 seconds
INFO 06-13 06:54:14 [model_runner_v1.py:1545] Loading model weights took 14.9929 GB
INFO 06-13 06:54:32 [kv_cache_utils.py:716] GPU KV cache size: 312,320 tokens
INFO 06-13 06:54:32 [kv_cache_utils.py:720] Maximum concurrency for 1,024 tokens per request: 305.00x
INFO 06-13 06:54:33 [core.py:171] init engine (profile, create kv cache, warmup model) took 18.65 seconds
Downloading Model to directory: /home/sss/.cache/modelscope/hub/models/LLM-Research/Meta-Llama-3.1-8B-Instruct
2025-06-13 06:54:33,577 - modelscope - INFO - Target directory already exists, skipping creation.
INFO 06-13 06:54:35 [chat_utils.py:420] Detected the chat template content format to be 'string'. You can set `--chat-template-content-format` to override this.
Adding requests: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 3637.38it/s]
Processed prompts: 100%|███████████████████████████████████████████████████████| 100/100 [00:01<00:00, 68.66it/s, est. speed input: 4648.56 toks/s, output: 686.63 toks/s]
Downloading Model to directory: /home/sss/.cache/modelscope/hub/models/LLM-Research/Meta-Llama-3.1-8B-Instruct
2025-06-13 06:54:42,195 - modelscope - WARNING - Using branch: master as version is unstable, use with caution
2025-06-13 06:54:42,449 - modelscope - INFO - Target directory already exists, skipping creation.
Downloading Model to directory: /home/sss/.cache/modelscope/hub/models/LLM-Research/Meta-Llama-3.1-8B-Instruct
2025-06-13 06:54:42,701 - modelscope - WARNING - Using branch: master as version is unstable, use with caution
2025-06-13 06:54:42,936 - modelscope - INFO - Target directory already exists, skipping creation.
INFO 06-13 06:54:42 [config.py:831] This model supports multiple tasks: {'score', 'reward', 'embed', 'classify', 'generate'}. Defaulting to 'generate'.
WARNING 06-13 06:54:43 [arg_utils.py:1652] Detected VLLM_USE_V1=1 with npu. Usage should be considered experimental. Please report any issues on Github.
INFO 06-13 06:54:43 [config.py:1988] Disabled the custom all-reduce kernel because it is not supported on current platform.
INFO 06-13 06:54:43 [config.py:2203] Chunked prefill is enabled with max_num_batched_tokens=8192.
INFO 06-13 06:54:43 [platform.py:170] Compilation disabled, using eager mode by default
Downloading Model to directory: /home/sss/.cache/modelscope/hub/models/LLM-Research/Meta-Llama-3.1-8B-Instruct
2025-06-13 06:54:43,728 - modelscope - INFO - Target directory already exists, skipping creation.
Downloading Model to directory: /home/sss/.cache/modelscope/hub/models/LLM-Research/Meta-Llama-3.1-8B-Instruct
2025-06-13 06:54:44,824 - modelscope - WARNING - Using branch: master as version is unstable, use with caution
2025-06-13 06:54:45,058 - modelscope - INFO - Target directory already exists, skipping creation.
INFO 06-13 06:54:51 [importing.py:17] Triton not installed or not compatible; certain GPU-related functions will not be available.
WARNING 06-13 06:54:51 [importing.py:29] Triton is not installed. Using dummy decorators. Install it via `pip install triton` to enable kernel compilation.
INFO 06-13 06:54:54 [__init__.py:39] Available plugins for group vllm.platform_plugins:
INFO 06-13 06:54:54 [__init__.py:41] - ascend -> vllm_ascend:register
INFO 06-13 06:54:54 [__init__.py:44] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 06-13 06:54:54 [__init__.py:235] Platform plugin ascend is activated
WARNING:root:Failed to import 'vllm_ascend.vllm_ascend_C': dynamic module does not define module export function (PyInit_vllm_ascend_C). All custom ops will be disabled. 
WARNING 06-13 06:54:59 [_custom_ops.py:21] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
INFO 06-13 06:55:01 [core.py:455] Waiting for init message from front-end.
INFO 06-13 06:55:01 [platform.py:170] Compilation disabled, using eager mode by default
WARNING 06-13 06:55:02 [registry.py:402] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_mtp:CustomDeepSeekMTP.
WARNING 06-13 06:55:02 [registry.py:402] Model architecture Qwen2VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_vl:AscendQwen2VLForConditionalGeneration.
WARNING 06-13 06:55:02 [registry.py:402] Model architecture Qwen2_5_VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl:AscendQwen2_5_VLForConditionalGeneration.
WARNING 06-13 06:55:02 [registry.py:402] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV2ForCausalLM.
WARNING 06-13 06:55:02 [registry.py:402] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM.
WARNING 06-13 06:55:02 [registry.py:402] Model architecture Qwen3MoeForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_moe:CustomQwen3MoeForCausalLM.
INFO 06-13 06:55:02 [core.py:70] Initializing a V1 LLM engine (v0.8.5.dev20+gb590adfdc) with config: model='LLM-Research/Meta-Llama-3.1-8B-Instruct', speculative_config=SpeculativeConfig(method='ngram', model=None, num_spec_tokens=3), tokenizer='LLM-Research/Meta-Llama-3.1-8B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=1024, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=True, kv_cache_dtype=auto,  device_config=npu, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=LLM-Research/Meta-Llama-3.1-8B-Instruct, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=False, pooler_config=None, compilation_config={"level":0,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":["all"],"splitting_ops":[],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":0,"cudagraph_capture_sizes":[],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":0,"local_cache_dir":null}
WARNING 06-13 06:55:02 [camem.py:63] Failed to import vllm_ascend_C:dynamic module does not define module export function (PyInit_vllm_ascend_C). Sleep mode will be disabled. 
WARNING 06-13 06:55:02 [utils.py:2746] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in <vllm_ascend.worker.worker_v1.NPUWorker object at 0xfffcf23cf2e0>
INFO 06-13 06:55:11 [parallel_state.py:1065] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
INFO 06-13 06:55:12 [model_runner_v1.py:1532] Starting to load model LLM-Research/Meta-Llama-3.1-8B-Instruct...
Downloading Model to directory: /home/sss/.cache/modelscope/hub/models/LLM-Research/Meta-Llama-3.1-8B-Instruct
2025-06-13 06:55:13,348 - modelscope - INFO - Target directory already exists, skipping creation.
Loading safetensors checkpoint shards:   0% Completed | 0/4 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:  25% Completed | 1/4 [00:00<00:00,  7.22it/s]
Loading safetensors checkpoint shards:  50% Completed | 2/4 [00:00<00:00,  2.17it/s]
Loading safetensors checkpoint shards:  75% Completed | 3/4 [00:01<00:00,  1.48it/s]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:02<00:00,  1.25it/s]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:02<00:00,  1.46it/s]

INFO 06-13 06:55:16 [default_loader.py:272] Loading weights took 3.19 seconds
INFO 06-13 06:55:16 [model_runner_v1.py:1537] Loading drafter model...
INFO 06-13 06:55:17 [model_runner_v1.py:1545] Loading model weights took 14.9929 GB
INFO 06-13 06:55:33 [kv_cache_utils.py:716] GPU KV cache size: 312,192 tokens
INFO 06-13 06:55:33 [kv_cache_utils.py:720] Maximum concurrency for 1,024 tokens per request: 304.88x
INFO 06-13 06:55:33 [core.py:171] init engine (profile, create kv cache, warmup model) took 15.65 seconds
Downloading Model to directory: /home/sss/.cache/modelscope/hub/models/LLM-Research/Meta-Llama-3.1-8B-Instruct
2025-06-13 06:55:33,785 - modelscope - INFO - Target directory already exists, skipping creation.
Adding requests: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 3860.49it/s]
Processed prompts: 100%|███████████████████████████████████████████████████████| 100/100 [00:02<00:00, 38.76it/s, est. speed input: 2623.90 toks/s, output: 387.57 toks/s]
PASSED

============================================================================ warnings summary ============================================================================
../../../software/miniconda3/envs/vllm-v1/lib/python3.10/site-packages/torch_npu/dynamo/torchair/__init__.py:8
  /home/sss/software/miniconda3/envs/vllm-v1/lib/python3.10/site-packages/torch_npu/dynamo/torchair/__init__.py:8: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
    import pkg_resources

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================================================================ 1 passed, 1 warning in 143.94s (0:02:23) ================================================================

@shen-shanshan shen-shanshan marked this pull request as draft June 12, 2025 08:58
@shen-shanshan shen-shanshan marked this pull request as ready for review June 12, 2025 09:00
@shen-shanshan shen-shanshan marked this pull request as draft June 12, 2025 11:50
@shen-shanshan shen-shanshan marked this pull request as ready for review June 13, 2025 04:02
@shen-shanshan shen-shanshan marked this pull request as draft June 13, 2025 04:35
@shen-shanshan shen-shanshan marked this pull request as ready for review June 13, 2025 07:25

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@shen-shanshan (Collaborator, Author)

@wangxiyuan Spec decode with the ngram drafter on the V1 engine currently runs into errors; this PR fixes that issue.

codecov bot commented Jun 23, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 27.22%. Comparing base (c30ddb8) to head (5c17cb5).
Report is 83 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1189      +/-   ##
==========================================
- Coverage   27.39%   27.22%   -0.18%     
==========================================
  Files          56       56              
  Lines        6191     6215      +24     
==========================================
- Hits         1696     1692       -4     
- Misses       4495     4523      +28     
Flag        Coverage Δ
unittests   27.22% <100.00%> (-0.18%) ⬇️

Flags with carried forward coverage won't be shown.


Signed-off-by: shen-shanshan <[email protected]>

Labels
long-term-test (enable long term test for PR), merge-conflicts, ready-for-test (start test by label for PR)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants