[V1][eagle3] Support eagle3 proposer for v1 #1032
Conversation
Thanks for contributions, you can
Thanks for your contributions.
This pull request has conflicts, please resolve those before we can evaluate the pull request.
edf2424 to 1c6ed70
df84c06 to 9a4371e
vllm_ascend/spec_decode/eagle_v1.py
Outdated
@@ -0,0 +1,429 @@
# SPDX-License-Identifier: Apache-2.0
Please revise the file header.
I think this file should be placed in the worker directory instead of creating a new spec_decode directory. @wangxiyuan
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
Make sure this UT runs with the v1 engine in CI.
The UTs in the current long_term directory are run by the v0 engine by default.
You can refer to test_v1_spec_decode.py or test_v1_mtp_correctness.py.
You may also have to modify .github/workflows/vllm_ascend_test_long_term.yaml.
da8ae03 to 33e2db5
b089e8d to 9b95136
9b95136 to 55fa385
55fa385 to d250d7d
@jianzs @ganyi1996ppo @Yikun please help review.
d250d7d to b4b74c9
Signed-off-by: yuancaoyaoHW <[email protected]>
b4b74c9 to d0aa341
I'm OK with this PR, and I intend to merge it today.
Signed-off-by: yuancaoyaoHW <[email protected]>
ccd73f2 to be0dcb2
What this PR does / why we need it?
This PR implements the Eagle proposer for vLLM v1. It enables more efficient speculative decoding, in which a draft model proposes several future tokens that the target model then verifies.
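For readers new to the idea, the draft-then-verify loop behind speculative decoding can be sketched in plain Python. This is a toy illustration under stated assumptions, not the code in this PR: `speculative_step`, `generate`, `toy_target`, and `toy_draft` are hypothetical names, the "models" are simple functions that return a greedy next token, and the sketch does not model how an Eagle draft head reuses the target model's hidden states.

```python
from typing import Callable, List

# Hypothetical stand-in for a model: maps a token sequence to the
# next token it would pick greedily.
NextToken = Callable[[List[int]], int]


def speculative_step(target: NextToken, draft: NextToken,
                     tokens: List[int], k: int) -> List[int]:
    """Run one draft-then-verify step and return the newly accepted tokens."""
    # 1. The draft model proposes k tokens autoregressively.
    proposal: List[int] = []
    for _ in range(k):
        proposal.append(draft(tokens + proposal))

    # 2. The target model checks each proposed position. A real engine
    #    scores all positions in one batched forward pass; this toy loops.
    accepted: List[int] = []
    for tok in proposal:
        expected = target(tokens + accepted)
        if tok != expected:
            # First mismatch: keep the target's token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # 3. Every proposal matched, so the target adds one bonus token.
    accepted.append(target(tokens + accepted))
    return accepted


def generate(target: NextToken, draft: NextToken, prompt: List[int],
             max_new_tokens: int, k: int = 3) -> List[int]:
    """Repeat speculative steps until max_new_tokens have been produced."""
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new_tokens:
        tokens.extend(speculative_step(target, draft, tokens, k))
    return tokens[: len(prompt) + max_new_tokens]


# Toy models: the target counts up modulo 10; the draft agrees with it
# except right after token 4, where it guesses wrong.
def toy_target(seq: List[int]) -> int:
    return (seq[-1] + 1) % 10


def toy_draft(seq: List[int]) -> int:
    return 0 if seq[-1] == 4 else (seq[-1] + 1) % 10
```

With greedy verification the output is identical to decoding with the target model alone. The speedup comes from accepting several tokens per target pass whenever the draft guesses correctly.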
Does this PR introduce any user-facing change?
Yes, this PR introduces a new speculative decoding mode that can be enabled via configuration.
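As a rough sketch of what enabling the mode might look like, the snippet below builds a speculative decoding config dict in the shape vLLM's `speculative_config` argument accepts. The description above does not list the exact keys, so treat the `"eagle3"` method name, the token count, and both model paths as assumptions or placeholders rather than values taken from this PR.

```python
# Hypothetical settings; the model paths are placeholders, not from this PR.
speculative_config = {
    "method": "eagle3",                      # assumed name, matching the PR title
    "model": "path/to/eagle3-draft-model",   # placeholder draft model
    "num_speculative_tokens": 2,             # assumed number of draft tokens per step
}

# With vLLM installed, the dict would be passed to the engine roughly like this:
# from vllm import LLM
# llm = LLM(model="path/to/target-model", speculative_config=speculative_config)
```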
How was this patch tested?
CI passed with new unit tests added for the Eagle proposer functionality.
Acceptance Test Script
Acceptance Test Results
Closes: #1004