
[V1][eagle3] Support eagle3 proposer for v1 #1032


Merged: 2 commits merged into vllm-project:main on Jun 20, 2025

Conversation

@yuancaoyaoHW (Contributor) commented May 30, 2025

What this PR does / why we need it?

This PR implements the Eagle3 Proposer feature for vLLM v1, which enables more efficient speculative decoding by using a draft model to predict potential future tokens.

  • The implementation includes the core Eagle algorithm integration with vLLM's existing architecture, allowing for faster inference while maintaining output quality.
  • This is needed to significantly improve the generation speed of large language models without compromising on the quality of generated text.

Does this PR introduce any user-facing change?

Yes, this PR introduces a new speculative decoding mode that can be enabled via configuration.

  • Users can now opt in to the Eagle3 Proposer by setting the appropriate speculative decoding flags in the inference configuration (see the sketch below).
  • The API remains backward compatible, with the new functionality being opt-in.
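
Purely as an illustration (not code from this PR's diff), here is a minimal offline sketch of what such an opt-in configuration might look like with vLLM's Python API. The speculative_config keys and the model identifiers below are assumptions; consult the vLLM / vllm-ascend documentation for the authoritative flags.

# Hypothetical sketch: enabling an EAGLE-3 draft model for speculative decoding.
# The speculative_config keys and model identifiers are assumptions, not the
# exact interface added by this PR.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",       # target model (assumed path)
    speculative_config={
        "method": "eagle3",                               # assumed method name
        "model": "yuhuili/EAGLE3-LLaMA3.1-Instruct-8B",   # draft (proposer) model, assumed
        "num_speculative_tokens": 2,                      # this PR currently supports <= 2
    },
    tensor_parallel_size=1,
    max_num_seqs=1,
)

outputs = llm.generate(
    ["Explain speculative decoding in one sentence."],
    SamplingParams(temperature=0.0, max_tokens=64),
)
print(outputs[0].outputs[0].text)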

How was this patch tested?

CI passed with new unit tests added for the Eagle3 Proposer functionality.

  • Benchmark tests were conducted comparing generation speed and quality with and without the Eagle3 Proposer.
  • Integration tests were performed with various model architectures to ensure compatibility.
  • Manual testing was done using different prompt scenarios to verify output quality remains consistent.
  • We tested the acceptance rate on a single Ascend 910B NPU; the acceptance rate results are basically consistent with those shown here: [V1][Spec Decode] EAGLE-3 Support vllm#16937
  • Currently, we support scenarios where num_spec_tokens <= 2. When num_spec_tokens > 2, issues such as insufficient memory and operator computation errors may occur. We will address this in subsequent updates.
  • We will add support for EAGLE v1 in future updates.

Acceptance Test Script

SCRIPT="/offline/eagle.py"
DATASET="ShareGpt"
MODEL=Meta-Llama-3.1-8B-Instruct
DRAFT=EAGLE3-LLaMA3.1-Instruct-8B

CUDA_VISIBLE_DEVICES="0" VLLM_USE_V1=1 $PYTHON $SCRIPT \
    --dataset $DATASET \
    --num_spec_tokens 2 \
    --max_num_seqs 1 \
    --model_dir $MODEL \
    --eagle_dir $DRAFT \
    --tp 1 \
    --num_prompts 80
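
The /offline/eagle.py script referenced above is not reproduced in this description. Purely as an illustration of how the flags above might map onto vLLM's offline API, a hypothetical sketch follows; the argument handling, the stand-in prompts, and the speculative_config keys are assumptions, not the script that actually produced the results below.

# Hypothetical sketch of an offline acceptance-test driver; the real
# /offline/eagle.py used above is not reproduced here, and the
# speculative_config keys below are assumptions.
import argparse

from vllm import LLM, SamplingParams

def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--dataset", default="ShareGpt")
    parser.add_argument("--num_spec_tokens", type=int, default=2)
    parser.add_argument("--max_num_seqs", type=int, default=1)
    parser.add_argument("--model_dir", required=True)
    parser.add_argument("--eagle_dir", required=True)
    parser.add_argument("--tp", type=int, default=1)
    parser.add_argument("--num_prompts", type=int, default=80)
    args = parser.parse_args()

    # The real script would sample prompts from ShareGPT; a fixed prompt
    # stands in here to keep the sketch self-contained.
    prompts = ["Summarize the benefits of speculative decoding."] * args.num_prompts

    llm = LLM(
        model=args.model_dir,
        speculative_config={
            "method": "eagle3",
            "model": args.eagle_dir,
            "num_speculative_tokens": args.num_spec_tokens,
        },
        tensor_parallel_size=args.tp,
        max_num_seqs=args.max_num_seqs,
    )
    outputs = llm.generate(prompts, SamplingParams(temperature=0.0, max_tokens=256))
    for out in outputs:
        print(out.outputs[0].text[:80])

if __name__ == "__main__":
    main()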

Acceptance Test Results

██████████████████████████████████████████████████████████████████████████████████████████████████████████| 80/80 [21:22<00:00, 16.03s/it, est. speed input: 4.72 toks/s, output: 13.56 toks/s]
-------------------------------------------------------------------------------------
mean acceptance length: 1.63
-------------------------------------------------------------------------------------
total_counts: 8062
acceptance at token 0: 1.00 (8062 times)
acceptance at token 1: 0.70 (5612 times)
acceptance at token 2: 0.47 (3765 times)
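
For reference, the per-position rates above are simply the per-position acceptance counts divided by total_counts (interpreted here, as an assumption, as the number of drafting steps observed); the small sketch below reproduces those three lines from the raw counts.

# Reproduce the per-position acceptance rates from the raw counts in the log.
# Interpreting total_counts as the number of drafting steps is an assumption.
total_steps = 8062
accepted_counts = {0: 8062, 1: 5612, 2: 3765}

for pos, count in accepted_counts.items():
    print(f"acceptance at token {pos}: {count / total_steps:.2f} ({count} times)")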

Closes: #1004

@Yikun (Collaborator) commented May 30, 2025

Thanks for the contribution! You can run git commit --amend -s to amend the Signed-off-by info and then git push origin vllm-eagle3 -f.

@Yikun requested a review from mengwei805 May 30, 2025 07:28
@yuancaoyaoHW changed the title from "Basie eagle3 proposer" to "Basie eagle3 proposer for v1" May 30, 2025
@Yikun (Collaborator) commented May 30, 2025

Thanks for your contributions.

  1. Please add "How was this patch tested?" to the commit message, including the vLLM server command or script, to help others reproduce the results locally.
  2. Please change the code comments to English.
  3. Add an e2e test under https://github.com/vllm-project/vllm-ascend/tree/main/tests/long_term/spec_decode/e2e ; you can reference https://github.com/vllm-project/vllm/blob/main/tests/spec_decode/e2e/test_eagle_correctness.py (a rough sketch follows below).
  4. Fix the mypy and lint CI; you can reference https://vllm-ascend.readthedocs.io/en/latest/developer_guide/contributing.html
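
Regarding item 3, purely as an illustration (the test structure, model identifiers, and config keys here are assumptions, not the e2e test that was eventually added to vllm-ascend), such a correctness test might compare greedy outputs with and without the EAGLE-3 proposer:

# Hypothetical e2e correctness sketch; not the actual test added to vllm-ascend.
# Greedy outputs with the EAGLE-3 proposer enabled should match the baseline.
import os

import pytest
from vllm import LLM, SamplingParams

MODEL = "meta-llama/Meta-Llama-3.1-8B-Instruct"   # assumed target model
DRAFT = "yuhuili/EAGLE3-LLaMA3.1-Instruct-8B"     # assumed draft model
PROMPTS = ["The capital of France is", "Briefly describe the Transformer architecture."]

@pytest.mark.parametrize("num_spec_tokens", [1, 2])
def test_eagle3_greedy_matches_baseline(num_spec_tokens: int) -> None:
    os.environ["VLLM_USE_V1"] = "1"
    params = SamplingParams(temperature=0.0, max_tokens=32)

    baseline = LLM(model=MODEL, max_num_seqs=1)
    reference = [o.outputs[0].text for o in baseline.generate(PROMPTS, params)]
    del baseline  # a real test would fully release the engine between runs

    eagle = LLM(
        model=MODEL,
        max_num_seqs=1,
        speculative_config={
            "method": "eagle3",
            "model": DRAFT,
            "num_speculative_tokens": num_spec_tokens,
        },
    )
    proposed = [o.outputs[0].text for o in eagle.generate(PROMPTS, params)]
    assert proposed == reference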


github-actions bot commented Jun 3, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@yuancaoyaoHW force-pushed the vllm-eagle3 branch 4 times, most recently from edf2424 to 1c6ed70, June 4, 2025 01:26
@yuancaoyaoHW changed the title from "Basie eagle3 proposer for v1" to "Support eagle3 proposer for v1" Jun 4, 2025

github-actions bot commented Jun 5, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@yuancaoyaoHW changed the title from "Support eagle3 proposer for v1" to "[V1][eagle3]Support eagle3 proposer for v1" Jun 5, 2025
@yuancaoyaoHW force-pushed the vllm-eagle3 branch 2 times, most recently from df84c06 to 9a4371e, June 5, 2025 09:48
@@ -0,0 +1,429 @@
# SPDX-License-Identifier: Apache-2.0
A Collaborator commented on the diff:

Please revise the file header.
I think this file should be placed in the worker directory instead of creating a new spec_decode directory. @wangxiyuan

# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
@mengwei805 (Collaborator) commented on the diff, Jun 6, 2025:

Make sure this UT can run on the v1 engine when CI runs.
The UTs in the current long_term directory run on the v0 engine by default.
You can refer to test_v1_spec_decode.py or test_v1_mtp_correctness.py.
You may also have to modify .github/workflows/vllm_ascend_test_long_term.yaml


github-actions bot commented Jun 6, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@Yikun added and removed the ready-for-test label Jun 17, 2025
@MengqingCao added and removed the ready-for-test label Jun 18, 2025
@Yikun added the ready-for-test and ready labels and removed the ready-for-test label Jun 18, 2025
@wangxiyuan (Collaborator) commented:
@jianzs @ganyi1996ppo @Yikun please help review.

@github-actions bot added the merge-conflicts label and removed the ready label Jun 19, 2025
github-actions bot commented:

This pull request has conflicts, please resolve those before we can evaluate the pull request.

Signed-off-by: yuancaoyaoHW <[email protected]>
@Yikun added and removed the ready-for-test label Jun 20, 2025
@Yikun (Collaborator) commented Jun 20, 2025

I'm OK with this PR; I intend to merge it today.


@Yikun mentioned this pull request Jun 20, 2025 (29 tasks)
@Yikun changed the title from "[V1][eagle3]Support eagle3 proposer for v1" to "[V1][eagle3] Support eagle3 proposer for v1" Jun 20, 2025
@wangxiyuan merged commit 00ae250 into vllm-project:main Jun 20, 2025
20 checks passed
Labels: long-term-test, module:tests, ready-for-test

Development
Successfully merging this pull request may close these issues:
[Feature]: Implement Eagle3 Acceleration on vllm-ascend

8 participants