Commit 8b14fda

WoosukKwon authored and dbyoung18 committed
[BugFix][Spec Decode] No in-place update to draft probs (vllm-project#16952)
Signed-off-by: Woosuk Kwon <[email protected]>
1 parent 7416f68 commit 8b14fda

1 file changed, +3 -1 lines changed

vllm/v1/spec_decode/eagle.py

Lines changed: 3 additions & 1 deletion
@@ -264,7 +264,9 @@ def compute_probs_and_sample_next_token(
     # TODO(woosuk): Consider seeds.
     q = torch.empty_like(probs)
     q.exponential_()
-    next_token_ids = probs.div_(q).argmax(dim=-1).view(-1)
+    # NOTE(woosuk): We shouldn't use `probs.div_(q)` because the draft_probs
+    # will be used later for rejection sampling.
+    next_token_ids = probs.div(q).argmax(dim=-1).view(-1)
     if not sampling_metadata.all_random:
         greedy_token_ids = probs.argmax(dim=-1)
         next_token_ids = torch.where(
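
The effect of the one-character change above can be reproduced in isolation. The following is a minimal sketch, not the vLLM code: the helper names `sample_out_of_place` and `sample_in_place` and the toy `draft_probs` tensor are hypothetical, chosen only for illustration. It assumes PyTorch is installed. Both helpers draw one token per row by dividing the probabilities by Exponential(1) noise and taking the argmax, but only the `div_` variant overwrites the tensor that the caller still holds.

```python
import torch


def sample_out_of_place(probs: torch.Tensor) -> torch.Tensor:
    """Sample one token id per row; `probs` is left untouched."""
    q = torch.empty_like(probs)
    q.exponential_()  # fill q with Exponential(1) noise
    # `div` allocates a new tensor, so the caller's `probs` keeps its values.
    return probs.div(q).argmax(dim=-1).view(-1)


def sample_in_place(probs: torch.Tensor) -> torch.Tensor:
    """Same sampling rule, but `div_` overwrites `probs` with probs / q."""
    q = torch.empty_like(probs)
    q.exponential_()
    return probs.div_(q).argmax(dim=-1).view(-1)


torch.manual_seed(0)
# Hypothetical draft distribution: 2 requests, vocabulary of 5 tokens.
draft_probs = torch.softmax(torch.randn(2, 5), dim=-1)
reference = draft_probs.clone()

ids = sample_out_of_place(draft_probs)
unchanged_after_div = torch.equal(draft_probs, reference)

clobbered = reference.clone()
sample_in_place(clobbered)
changed_after_inplace = not torch.equal(clobbered, reference)

print("probs intact after div: ", unchanged_after_div)
print("probs changed after div_:", changed_after_inplace)
```

With the in-place variant, the tensor that is later read as draft probabilities no longer holds a probability distribution (it holds `probs / q`), which is the situation the NOTE in the diff warns about. The out-of-place `div` costs one extra temporary tensor of the same shape as `probs`.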
