     @unittest.skip("Skip this test because this feature has a bug. See comments below.")
     def test_stateful_custom_logit_processor(self):
         """Test custom logit processor with a single request."""
+
+        """
+        NOTE: This feature has a race condition bug.
+        This line https://github.com/sgl-project/sglang/blob/ef8ec07b2ce4c70c2a33ec5acda4ce529bc3cda4/test/srt/test_srt_endpoint.py#L395-L396 can be accessed by two concurrent threads at the same time. The access order is not guaranteed.
+        In sglang, we use two python threads to overlap the GPU computation and CPU scheduling.
+        Thread 1 (the CPU scheduling thread) will update the `param_dict["__req__"].output_ids`.
+        Thread 2 (the GPU computation thread) will call `DeterministicStatefulLogitProcessor` because sampling is considered as GPU computation.
+        We can fix this by moving the call of DeterministicStatefulLogitProcessor to the CPU scheduling thread.
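
A minimal sketch of the fix proposed in the note above, not sglang's actual implementation. It assumes a hypothetical `Request` class and a hypothetical `stateful_processor` function standing in for `param_dict["__req__"]` and `DeterministicStatefulLogitProcessor`. The scheduling thread is the only thread that reads or writes `output_ids`, and it calls the processor itself before handing the processed logits to the compute thread. For simplicity the two threads run in lockstep here, so the sketch gives up the overlap that the real scheduler is designed to keep.

```python
import queue
import threading


class Request:
    """Hypothetical stand-in for the object stored in param_dict["__req__"]."""

    def __init__(self):
        self.output_ids = []


def stateful_processor(logits, param_dict):
    """Hypothetical stateful processor: keep only the logit whose index
    equals the number of tokens generated so far, mask the rest."""
    step = len(param_dict["__req__"].output_ids)
    return [x if i == step else float("-inf") for i, x in enumerate(logits)]


def run(num_steps=5, vocab_size=8):
    req = Request()
    param_dict = {"__req__": req}
    processed = queue.Queue(maxsize=1)  # scheduler -> compute: processed logits
    sampled = queue.Queue(maxsize=1)    # compute -> scheduler: sampled token id

    def scheduler():
        # Thread 1: the only thread that touches req.output_ids.
        for _ in range(num_steps):
            logits = [0.0] * vocab_size
            # The processor runs here, so it always sees a consistent output_ids.
            processed.put(stateful_processor(logits, param_dict))
            req.output_ids.append(sampled.get())

    def compute():
        # Thread 2: only samples (argmax) from logits that are already processed.
        for _ in range(num_steps):
            logits = processed.get()
            sampled.put(max(range(len(logits)), key=logits.__getitem__))

    threads = [threading.Thread(target=scheduler), threading.Thread(target=compute)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return req.output_ids


if __name__ == "__main__":
    print(run())
```

Because the processor call and the `output_ids` update happen on the same thread, their order is fixed by program order rather than by thread timing, which is the property the skipped test depends on.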