Description
The current specification for the `run_loop::schedule()` operation appears to be to simply wait until the worker thread dequeues the task and only then check whether a stop-request was sent, calling `set_stopped()` if there was a stop-request and `set_value()` otherwise.

This could be problematic when trying to use `run_loop` with the `get_delegation_scheduler` facility and could lead to deadlocks.
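In other words, the worker only looks at the stop-token once it has already dequeued the item. A minimal model of that ordering, using purely illustrative names (this is not the standard's specification):

```cpp
#include <deque>
#include <functional>
#include <stop_token>
#include <utility>

// Hypothetical model of the behaviour described above: the stop-request is
// only observed *after* the worker has dequeued the task, so a cancelled task
// still has to wait its turn in the queue.
struct pending_task {
    std::stop_token st;                 // the receiver's stop-token
    std::function<void()> on_value;     // models set_value()
    std::function<void()> on_stopped;   // models set_stopped()
};

void worker_drain(std::deque<pending_task>& queue) {
    while (!queue.empty()) {
        pending_task task = std::move(queue.front());
        queue.pop_front();
        if (task.st.stop_requested())
            task.on_stopped();          // stop handled only at dequeue time
        else
            task.on_value();
    }
}
```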
For example, say someone schedules some work on another thread and wants to block waiting for that work to complete using `sync_wait()`. This will inject a `run_loop` scheduler as the `get_delegation_scheduler`.

The user, wanting to make use of the `get_delegation_scheduler`, schedules work on a composite scheduler that tries to use a primary scheduler, but also schedules onto the delegation scheduler so that the work can run either on the current thread or on some other context. This way, if all other threads on the other context are busy/blocked, we can still make forward progress on the task using the current thread (see the sketch below).
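For concreteness, that composite pattern might look roughly like this; `when_any` is not a standard algorithm, and every name below is illustrative rather than taken from the issue:

```cpp
#include <execution>

namespace ex = std::execution;

// Illustrative only: submit the same work to the primary scheduler and to the
// delegation scheduler, and let whichever dequeues it first actually run it.
// The losing branch must then be cancellable.
auto race_work(ex::scheduler auto primary,
               ex::scheduler auto delegation,
               auto work)
{
    // when_any is assumed to be a library-provided algorithm (not in the
    // standard) that completes with the first child and cancels the other.
    return when_any(
        ex::schedule(primary)    | ex::then(work),
        ex::schedule(delegation) | ex::then(work));
}
```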
But this approach of scheduling a task on each scheduler and running on whichever completes first only really works if both schedulers support "synchronous cancellation": when a stop-request is sent, either the operation completes inline in the call to `request_stop()` with `set_stopped()`, or it is guaranteed to eventually complete with `set_value()` (i.e. some thread has already dequeued the task and is about to call, or is already calling, `set_value()`).

This property allows whichever scheduler completed first to cancel the schedule operation on the other scheduler and then block waiting for the cancellation to finish before continuing to signal completion.
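A minimal sketch of that completion protocol, with assumed names: each branch's receiver cancels the other branch, and the overall result is only forwarded once both branches have completed.

```cpp
#include <atomic>
#include <stop_token>

// Assumed names throughout. Whichever branch completes first requests a stop
// on the shared stop-source; the parent completion is only forwarded once the
// counter reaches zero, i.e. once the cancelled branch has also completed.
struct race_shared_state {
    std::inplace_stop_source stop_src;   // seen by both schedule operations
    std::atomic<int> remaining{2};

    template <class Notify>
    void on_branch_complete(Notify notify_parent) {
        stop_src.request_stop();          // cancel the other branch
        if (remaining.fetch_sub(1, std::memory_order_acq_rel) == 1)
            notify_parent();              // last branch out forwards the result
    }
};
// If the cancelled run_loop-schedule operation never completes, `remaining`
// never reaches zero and the whole composed operation hangs.
```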
However, the current specification of `run_loop` does not have this behaviour, so there is no guarantee that, if the other scheduler completed first, the cancelled run_loop-schedule operation will complete in a timely manner (or at all).
Activity
LeeHowes commented on Oct 15, 2024
Do you have any thoughts on what the wording change needs to be? This whole design space is difficult to get right so the gap isn't especially surprising.
lewissbaker commented on Oct 15, 2024
The remedy here would be to change the `run_loop::run-loop-opstate-base` to have a `prev` pointer, making it a doubly linked list.

Then have `run-loop-opstate<Rcvr>` have two specialisations: one default one which registers a stop-callback, and one constrained by `unstoppable_token<stop_token_of_t<env_of_t<Rcvr>>>` which does not register a stop-callback.

The one with the stop-callback would call `set_stopped()` inline in the stop-callback if it successfully removed the item from the queue before the worker thread removed it from the queue.
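A rough sketch of what that could look like; the queue layout and names below are guesses at an implementation rather than proposed wording:

```cpp
#include <execution>
#include <mutex>
#include <stop_token>
#include <utility>

// Guessed implementation sketch: a doubly-linked queue lets a stop-callback
// unlink an operation in O(1) and complete it with set_stopped() inline.
struct opstate_base {
    opstate_base* prev = nullptr;          // the proposed back-pointer
    opstate_base* next = nullptr;
    virtual void execute() noexcept = 0;   // invoked by the worker thread
};

struct run_loop_queue {
    std::mutex mtx;
    opstate_base* head = nullptr;
    opstate_base* tail = nullptr;

    // Returns true if the operation was still enqueued and has been removed,
    // i.e. the stop-request beat the worker thread to it. The worker is
    // assumed to null out prev/next (under the lock) when it dequeues.
    bool try_remove(opstate_base* op) {
        std::lock_guard lock{mtx};
        bool enqueued = op == head || op->prev != nullptr || op->next != nullptr;
        if (!enqueued)
            return false;
        (op->prev ? op->prev->next : head) = op->next;
        (op->next ? op->next->prev : tail) = op->prev;
        op->prev = op->next = nullptr;
        return true;
    }
};

// Default specialisation: registers a stop-callback. A second specialisation,
// constrained on unstoppable_token<stop_token_of_t<env_of_t<Rcvr>>>, would
// simply omit the callback.
template <class Rcvr>
struct run_loop_opstate : opstate_base {
    Rcvr rcvr;
    run_loop_queue* queue;

    struct on_stop {
        run_loop_opstate* self;
        void operator()() noexcept {
            // Won the race against the worker: complete inline with set_stopped().
            if (self->queue->try_remove(self))
                std::execution::set_stopped(std::move(self->rcvr));
        }
    };
    // A std::stop_callback for the receiver's stop-token holding an on_stop
    // would be constructed in start() (omitted here).

    void execute() noexcept override {
        // A real implementation must deregister the stop-callback first so it
        // cannot race with this completion.
        std::execution::set_value(std::move(rcvr));
    }
};
```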
LeeHowes commented on Oct 23, 2024

We could remove this for now, or make `sync_wait` do nothing but call `start()` and wait for completion (inline or not).

The actual delegating version could be added later: `sync_wait_with_provided_run_loop_context()`, `tbb_sync_wait()`, etc. Or one that embeds the run loop in a sender.

It would remove functionality, in that it would be harder out of the box to implement concurrency on the current thread in the way that `sync_wait(task());` can work. It would at least be additive in library code or in C++29, without embedding something broken in C++26.

Another baby solution would be to simply rename `sync_wait` to be more explicit about how it behaves. It would be less surprising.
lewissbaker commented on Oct 23, 2024

One concern was that having `sync_wait()` bake in use of `run_loop` prevents it from later being extended to support things like time_schedulers or I/O scheduling.

We discussed several options in the meeting.

We discussed removing `sync_wait` until we have fully fleshed out the forward-progress-delegation facilities/design, but this would leave users without an out-of-the-box way to start asynchronous work without writing their own equivalent to `sync_wait()`.

One option was to make `sync_wait(sender auto&& s)` just connect/start and then block until the operation completes, without running any event-loop internally. This would be simpler, lighter-weight, and less likely to be outdated quickly.
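A sketch of that lighter-weight option, assuming a C++26 `std::execution` implementation; the receiver and synchronisation here are illustrative, and value/error plumbing is omitted (this variant only handles a sender completing with `set_value()`):

```cpp
#include <condition_variable>
#include <execution>
#include <mutex>
#include <utility>

// Illustrative sketch only: connect/start the sender and block on a
// condition_variable until it completes. No run_loop is driven, so no
// delegation scheduler is injected into the environment.
struct blocking_state {
    std::mutex mtx;
    std::condition_variable cv;
    bool done = false;

    void notify() {
        { std::lock_guard lock{mtx}; done = true; }
        cv.notify_one();
    }
};

struct blocking_receiver {
    using receiver_concept = std::execution::receiver_t;
    blocking_state* state;

    void set_value() && noexcept { state->notify(); }
    void set_stopped() && noexcept { state->notify(); }
    template <class E>
    void set_error(E&&) && noexcept { state->notify(); }
};

void simple_sync_wait(std::execution::sender auto&& sndr) {
    blocking_state state;
    auto op = std::execution::connect(
        std::forward<decltype(sndr)>(sndr), blocking_receiver{&state});
    std::execution::start(op);
    std::unique_lock lock{state.mtx};
    state.cv.wait(lock, [&] { return state.done; });
}
```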
Then we could add a subsequent `drivable_context` concept that provides access to a `run(stoppable_token auto st)` member function to drive the execution of the event-loop until a stop-request is sent, and then provide a `sync_wait(sender auto&& s, drivable_context auto& ctx)` that would connect/start the sender and then call `ctx.drive(st)` until the operation completed, which would then trigger a stop-request on `st` that would cause `ctx.drive()` to exit.

This way, users can provide their own execution context to drive when calling `sync_wait()`.
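The interface might have roughly this shape; `drivable_context` and `run(stoppable_token auto)` are names from the discussion above, while `sync_wait_on` and the rest are placeholder assumptions:

```cpp
#include <execution>
#include <stop_token>

namespace ex = std::execution;

// Hypothetical concept from the discussion: a context that can be driven on
// the calling thread until the given stop-token receives a stop-request.
template <class Ctx>
concept drivable_context = requires(Ctx& ctx, std::inplace_stop_token st) {
    ctx.run(st);
};

// Hypothetical two-argument sync_wait: connect/start the sender, drive the
// user-supplied context on this thread until the sender completes, then send
// a stop-request so that run() returns. Declaration only; semantics as
// described in the comment above.
void sync_wait_on(ex::sender auto&& sndr, drivable_context auto& ctx);
```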
We also discussed the possibility of having an `async_drivable_context` that has an `async_drive()` method that returns a sender, which would use the parent scheduler injected in the receiver's environment to schedule driving that context on the parent context, e.g. enqueueing a task to the parent context whenever there is a non-empty queue in the child context. This would allow, e.g., multiple contexts to all be driven from a single synchronously-driven context.

We also discussed another option similar to `async_drivable_context`, which was to represent the `drive()` function as a sender and have it block in `start()` until it received a stop-request. This could be viewed as a degenerate case of the above `async_drivable_context`, but in the case where there is no parent context which can be delegated to, so it must drive the context synchronously inside `start()`.

Concerns were raised about this approach with regards to its safety/composability, e.g. you would have to describe the work using `sync_wait(when_any(work, ctx.async_drive()))` and ensure that the work was started before the `ctx.async_drive()`, as nothing else will start once the async_drive operation enters `start()`. Trying to run multiple event loops this way using `when_all(ctx1.async_drive(), ctx2.async_drive())` would never get to driving `ctx2`.

Ideally a paper needs to explore this more, in particular the drivable_context concept and also forward-progress-delegation in general. Nobody had capacity to look into this at the moment, however.
BenFrantzDale commented on Oct 24, 2024
@lewissbaker what about renaming the current `sync_wait` to `sync_wait_run_loop` or some similar ugly non-ideal name, leaving `sync_wait` available for future use?

Also, related to this, you've mentioned the idea of a synchronously-cancelable scheduler. Do you have an API in mind? I'm picturing a scheduler that produces a move-only sender that has `.handle().sync_cancel() -> bool` and `.handle().sync_cancel_requested() -> bool` member functions, so you could do something like the sketch below.
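A hypothetical sketch of that kind of usage; only `handle()`, `sync_cancel()`, and `sync_cancel_requested()` come from the description above, and everything else (the scheduler, receiver, and surrounding code) is assumed:

```cpp
#include <execution>
#include <utility>

namespace ex = std::execution;

// Hypothetical usage of a synchronously-cancelable schedule-sender.
void try_cancel_example(auto sched, auto rcvr) {
    auto sndr   = ex::schedule(sched);   // assumed to be a move-only sender
    auto handle = sndr.handle();         // hypothetical: usable after connect
    auto op     = ex::connect(std::move(sndr), std::move(rcvr));
    ex::start(op);

    // Later, when the work has already run somewhere else:
    if (handle.sync_cancel()) {
        // Cancellation succeeded synchronously: set_stopped() has been
        // delivered and nothing will run on the scheduler's thread.
    } else {
        // Too late: the task was already dequeued and will complete with
        // set_value() (or set_error()) on the scheduler's thread.
    }
}
```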
I think that would let us do something similar to `when_any`, where we try to schedule on two sync-cancellable schedulers, but as they complete, atomically count.

Not sure if that's correct. I suspect there's a race in there but that it's fixable. :-D
lewissbaker commented on Oct 24, 2024
I was just thinking about having a query on a sender (in this case the schedule-sender) that lets you ask whether a stop-request is synchronously cancellable. The actual stop-request would still be sent in the same way (via stop-tokens), but the sender is guaranteeing that when a stop-request is sent to it, it will make forward progress and call the completion-handler: either `set_stopped` on the thread sending the stop-request, or `set_value`/`set_error` on some other thread in the case that the operation has already completed and is being processed.
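One possible shape for such a query, purely as an assumption (neither the name nor the mechanism is proposed here beyond the general idea):

```cpp
#include <execution>

namespace ex = std::execution;

// Hypothetical query tag: ask a sender's attributes whether a stop-request
// sent to the connected operation is guaranteed to make forward progress
// (set_stopped() inline on the requesting thread, or set_value()/set_error()
// on a thread that has already dequeued the work).
struct synchronously_cancellable_t {};
inline constexpr synchronously_cancellable_t synchronously_cancellable{};

template <class Sndr>
constexpr bool is_synchronously_cancellable(const Sndr& sndr) noexcept {
    if constexpr (requires { ex::get_env(sndr).query(synchronously_cancellable); })
        return ex::get_env(sndr).query(synchronously_cancellable);
    else
        return false;   // conservative default: no guarantee advertised
}
```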
BenFrantzDale commented on Oct 25, 2024
I see. I really like reusing the existing cancellation API like that. So the way to detect whether sync-canceling worked is to ask it to cancel and then see if the receiver got a `set_stopped()`? Or, for the work-stealing use case, the winning thread cancels the other thread, which either works (and calls `set_stopped()`, which decrements a count) or doesn't (in which case the first thread can let the other thread take over). Something like that, right?