fix: avoid KeyError when cancelling requests that have not been processed #233
Conversation
```diff
@@ -603,15 +603,15 @@ def _update_states(self, scheduler_output):

         # Continuous batching stuff
         for req_id in scheduler_output.finished_req_ids:
-            if req_id in self.req_ids2blocks:
+            # requests may be cancelled from the client side while in the queue
+            if req_id in self.requests:
```
Does this not still need the `if req_id in self.req_ids2blocks` check to avoid a potential missing-key error below (in `for freed_block in self.req_ids2blocks[req_id]`)?
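For context, the cleanup block in question looks roughly like this after the change (a paraphrase assembled from the diff and this discussion, not an exact copy of the PR):

```python
for req_id in scheduler_output.finished_req_ids:
    # requests may be cancelled from the client side while in the queue
    if req_id in self.requests:
        # Without an additional `req_id in self.req_ids2blocks` guard, this
        # lookup could still raise KeyError for a request that was cancelled
        # before any blocks were ever assigned to it.
        for freed_block in self.req_ids2blocks[req_id]:
            self.free_blocks.append(freed_block)
        del self.req_ids2blocks[req_id]
        del self.req_ids2left_pads[req_id]
        del self.requests[req_id]
```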
Yeah, that's a good question.

If we look at this code in isolation, each `del` on a map should be guarded by a check that the key is present before the removal (e.g. `req_ids2left_pads` should be checked too). But the request is added to all of the maps during the course of `_update_states`:

vllm-spyre/vllm_spyre/v1/worker/spyre_model_runner.py, lines 650 to 688 at 97d03d6:
```python
self.req_ids2left_pads[
    request_data.req_id] = self.tkv - len(prompt_tokens)
input_token_list.append(
    torch.tensor(prompt_tokens,
                 dtype=torch.long,
                 device=torch.device("cpu")))

# filling block table and slot mapping
block_table_i = []
slot_mapping_i = []
for pos_i in range(block_padding):
    if pos_i % self.BLOCK_SIZE == 0:
        block_number = self.free_blocks.popleft()
        block_table_i.append(block_number)
    block_offset = pos_i % self.BLOCK_SIZE
    slot = block_number * self.BLOCK_SIZE + block_offset
    slot_mapping_i.append(slot)
self.req_ids2blocks[request_data.req_id] = deque(block_table_i)
slot_mapping.append(slot_mapping_i)

# Add new requests to the cached states.
req_id = request_data.req_id
sampling_params = request_data.sampling_params
if sampling_params.sampling_type == SamplingType.RANDOM_SEED:
    generator = torch.Generator(device=self.device)
    generator.manual_seed(sampling_params.seed)
else:
    generator = None

req_state = CachedRequestState(
    req_id=req_id,
    prompt_token_ids=request_data.prompt_token_ids,
    sampling_params=sampling_params,
    generator=generator,
    output_token_ids=[],
)
self.requests[req_id] = req_state
self.input_batch.add_request(req_state)
self.prefill_batch.add_request(req_state)
```
So the idea is that if it is in `self.requests`, it is also in the other maps. But it's probably better to be safe and not make that assumption.
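For illustration only (not the code in this PR), `dict.pop` with a default makes each removal tolerant of a missing key, so no presence checks are needed:

```python
from collections import deque

# Toy demonstration of the defensive-removal pattern being discussed:
# pop() with a default never raises, even if the request was cancelled
# before it was ever added to some of the maps.
requests = {"r1": object()}   # request known to the runner...
req_ids2blocks = {}           # ...but never scheduled, so no blocks assigned
req_ids2left_pads = {}
free_blocks = deque()

for req_id in ["r1"]:         # pretend "r1" was cancelled from the client side
    for freed_block in req_ids2blocks.pop(req_id, ()):
        free_blocks.append(freed_block)
    req_ids2left_pads.pop(req_id, None)
    requests.pop(req_id, None)  # no KeyError anywhere
```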
Yeah, I'm also trying to understand the problem here. Could it be that there's a race condition between two threads, where one thread actually deletes the request from `self.requests` while the other is trying to do the same but fails? In that case Christian's comment above does make sense. Or maybe we can also think about using thread-safe mechanisms here?
There shouldn't be any threading in vllm though, as far as I know. The only concurrency should be with async or multiprocessing 😕
Elegant!
Since we've run into problems with request cancellation twice, I think we should have a test for it here. We can follow the pattern used in a couple of places in vLLM to exercise similar behavior with aborting requests in the engine: https://github.com/vllm-project/vllm/blob/e6aab5de2999187c6cf0206f2d63ab6d7a0b6964/tests/v1/engine/test_async_llm.py#L147

It shouldn't be too hard to whip up a similar async test with an `AsyncLLM` so that this stays covered as we keep making changes to the model runner.
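For reference, a rough sketch of what such a test could look like, modelled loosely on the linked vLLM test. The engine arguments, model name, and helper names below are placeholders and assumptions, not the test that was eventually added to this PR:

```python
# Sketch only: cancel a request mid-generation, then check that the engine
# still serves a fresh request (i.e. cleanup of the cancelled one did not
# raise a KeyError in the model runner).
import asyncio

from vllm import SamplingParams
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.v1.engine.async_llm import AsyncLLM


async def consume(engine: AsyncLLM, request_id: str) -> int:
    """Drive one request to completion and count the outputs received."""
    count = 0
    async for _ in engine.generate(
            request_id=request_id,
            prompt="Hello, my name is",
            sampling_params=SamplingParams(max_tokens=64)):
        count += 1
        await asyncio.sleep(0.01)  # yield so a cancellation can land mid-stream
    return count


async def main() -> None:
    engine = AsyncLLM.from_engine_args(
        AsyncEngineArgs(model="facebook/opt-125m"))  # placeholder model
    try:
        task = asyncio.create_task(consume(engine, "cancelled-req"))
        await asyncio.sleep(0.5)
        task.cancel()  # aborts the in-flight request via the async generator
        # A follow-up request should still complete normally.
        assert await consume(engine, "follow-up-req") > 0
    finally:
        engine.shutdown()


if __name__ == "__main__":
    asyncio.run(main())
```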
I think we will have to wait for #162 to be merged to get the tests to pass (they pass in my dev env if I include that PR).

Is it the
LGTM, once tests pass this is ready to be merged.
tests/e2e/test_spyre_async_llm.py (outdated):

```python
if cancel_after is not None and count >= cancel_after:
    return count, request_id

await asyncio.sleep(0.0)
```
Is `await asyncio.sleep(0.0)` enough, or do we need `await asyncio.sleep(x)` with `x > 0`? Background: arguments of `0` and `x > 0` might not behave the same (see here).
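As a toy illustration of the difference (not part of this PR): `asyncio.sleep(0)` is special-cased to yield control to the event loop exactly once without arming a timer, while a positive delay additionally suspends the coroutine for at least that long before it resumes.

```python
import asyncio
import time


async def spin(name: str, delay: float) -> None:
    for i in range(3):
        # sleep(0) only yields once to the event loop; a positive delay also
        # waits at least `delay` seconds, giving other work real time to run.
        await asyncio.sleep(delay)
        print(f"{name} step {i} at t={time.perf_counter() - start:.3f}s")


async def main() -> None:
    await asyncio.gather(spin("sleep(0)   ", 0.0), spin("sleep(0.01)", 0.01))


start = time.perf_counter()
asyncio.run(main())
```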
That is interesting that the behavior is different! Thanks for the link. The tests seem fine with `0.0`, but I'll add a small value just in case.
```python
del self.req_ids2left_pads[req_id]

del self.requests[req_id]
logger.debug("Finishing request id: %s", req_id)
```
logger.debug("Finishing request id: %s", req_id) | |
if req_id in self.req_ids2blocks: | |
logger.debug("Freeing request id: %s", req_id) |
This debug statement was specific to CB, to confirm that the blocks were actually freed. I would suggest keeping it that way. IMO it should not be a general statement that a request has finished; that probably belongs (or already exists) somewhere else in the code, not in the CB-specific part.
removing my block since there are tests now!
"""Test handling of cancelled requests""" | ||
|
||
if cb == 1 and backend != "eager": | ||
pytest.skip("CB requires eager") |
we can come back and fix this up to work on spyre
FIX #225