[Bugfix][Spec Decode] Little fix to spec decode in model_runner_v1.py #1189
Conversation
This pull request has conflicts, please resolve those before we can evaluate the pull request.
@wangxiyuan Spec decode using the Ngram drafter with the V1 engine currently runs into errors, and this PR fixes that issue.
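For readers unfamiliar with the Ngram drafter mentioned above: it proposes draft tokens by pattern matching against tokens already in the sequence. The following is a minimal sketch of that idea in plain Python; the function name and parameters are illustrative, not the vllm-ascend or vLLM API:

```python
def ngram_propose(token_ids, n=2, k=3):
    """Sketch of ngram drafting: find the most recent earlier
    occurrence of the last n tokens and propose the k tokens that
    followed it. Returns [] when there is no earlier match."""
    if len(token_ids) < n + 1:
        return []
    pattern = token_ids[-n:]
    # Walk backwards; the final position would only match the
    # suffix against itself, so start one step before it.
    for start in range(len(token_ids) - n - 1, -1, -1):
        if token_ids[start:start + n] == pattern:
            return token_ids[start + n:start + n + k]
    return []

# The suffix [5, 6] appeared earlier, followed by 7, 8, 9.
print(ngram_propose([5, 6, 7, 8, 9, 5, 6], n=2, k=3))  # [7, 8, 9]
```

The proposed tokens are then verified in a single forward pass of the target model, which is why the model runner needs a dedicated spec-decode attention path.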
Codecov Report: all modified and coverable lines are covered by tests ✅

@@            Coverage Diff             @@
##             main    #1189      +/-   ##
==========================================
- Coverage   27.39%   27.22%   -0.18%
==========================================
  Files          56       56
  Lines        6191     6215      +24
==========================================
- Hits         1696     1692       -4
- Misses       4495     4523      +28
Signed-off-by: shen-shanshan <[email protected]>
What this PR does / why we need it?

Changes:
- Add `AscendAttentionState.SpecDecoding` handling in `model_runner_v1.py`, synced from https://github.com/vllm-project/vllm/blob/main/vllm/v1/worker/gpu_model_runner.py#L1581-L1584.
- Rename `_process_reqs()` to `_prepare_inputs()`, keeping the same name as vllm upstream: https://github.com/vllm-project/vllm/blob/main/vllm/v1/worker/gpu_model_runner.py#L557.

Does this PR introduce any user-facing change?
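For background (not part of this PR's diff): speculative decoding changes the per-batch attention state, because a request verifying draft tokens schedules more than one token on top of an existing KV cache. The sketch below illustrates that classification; the enum members and function are hypothetical stand-ins for `AscendAttentionState`, not the actual `model_runner_v1.py` code:

```python
from enum import Enum, auto

class AttentionState(Enum):
    # Hypothetical states mirroring the idea of AscendAttentionState.
    PrefillNoCache = auto()   # no request has any cached tokens yet
    DecodeOnly = auto()       # every request schedules exactly 1 token
    SpecDecoding = auto()     # some cached request verifies >1 token

def classify_batch(num_scheduled, num_computed):
    """Pick one attention state for the whole batch, given per-request
    counts of newly scheduled tokens and already-computed tokens."""
    if all(c == 0 for c in num_computed):
        return AttentionState.PrefillNoCache
    if all(s == 1 for s in num_scheduled):
        return AttentionState.DecodeOnly
    # A cached request scheduling multiple tokens is verifying drafts.
    return AttentionState.SpecDecoding

print(classify_batch([1, 1], [10, 7]).name)  # DecodeOnly
print(classify_batch([4, 1], [10, 7]).name)  # SpecDecoding
```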
How was this patch tested?
Logs: