
std::sync::Once can block forever in forked process · Issue #43448 · rust-lang/rust

Closed

Description

joshlf (Contributor)

There's a bug in std::sync::Once that makes it so that, under certain conditions, a call to call_once in a process which was forked from another Rust process can block forever.

How Once works

Simplifying a bit (ignoring poisoning), the algorithm employed by Once works as follows: The Once can be in one of three states: INCOMPLETE, COMPLETE, and RUNNING.

  • The Once starts off in the INCOMPLETE state.
  • When a call to call_once begins, the Once might be in any of the three states:
    • If the Once is in the INCOMPLETE state, then it is transitioned to the RUNNING state, and the function begins executing.
    • If the Once is in the RUNNING state, then some other call to call_once is executing the function, so this call puts itself on a list of waiters and goes to sleep. It will be woken back up once the function is done executing in whatever thread is executing it.
    • If the Once is in the COMPLETE state, then the function has already been executed, so call_once returns immediately without doing anything.

Finally, when the function's execution completes, the thread doing the execution transitions the Once into the COMPLETE state, and wakes up any waiters that accumulated while it was executing the function.
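The state machine above can be sketched in Rust. This is a minimal illustration, not std's actual implementation: poisoning is ignored, and waiters spin instead of parking on a waiter queue; `SimpleOnce` and the state constants are made-up names.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

const INCOMPLETE: usize = 0;
const RUNNING: usize = 1;
const COMPLETE: usize = 2;

struct SimpleOnce {
    state: AtomicUsize,
}

impl SimpleOnce {
    const fn new() -> Self {
        SimpleOnce { state: AtomicUsize::new(INCOMPLETE) }
    }

    fn call_once<F: FnOnce()>(&self, f: F) {
        match self.state.compare_exchange(
            INCOMPLETE, RUNNING, Ordering::AcqRel, Ordering::Acquire,
        ) {
            Ok(_) => {
                // INCOMPLETE -> RUNNING succeeded: we are the executor.
                // Run the function, then publish COMPLETE (the real Once
                // also wakes any waiters here).
                f();
                self.state.store(COMPLETE, Ordering::Release);
            }
            Err(_) => {
                // Another thread is (or was) the executor: wait for COMPLETE.
                // The real Once enqueues a waiter node and sleeps instead
                // of spinning.
                while self.state.load(Ordering::Acquire) != COMPLETE {
                    thread::yield_now();
                }
            }
        }
    }
}
```

The fork bug described below falls out of this design: both the waiter queue and the RUNNING state live in memory that a forked child only sees a snapshot of.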

The issue

This algorithm is broken by forking. In particular, if a Once is in the RUNNING state at the point that the process forks, then when the child's memory space (which, by default, is a copy-on-write copy of the parent's) is created, the Once will still be in the RUNNING state there. However, in the child process, calls to call_once can go wrong in two ways:

  • If the call happens while the function is still being executed, the waiter object that is enqueued will not be visible to the executor, because enqueueing it only modifies the child's memory space, not the parent's. So when the executor (a thread in the parent process) finishes, it will wake up all of the waiters in the parent process, blissfully ignorant that a thread in the child process is also waiting.
  • If the function execution finishes first, the change of the Once's state from RUNNING to COMPLETE will not be reflected in the child's memory space. Thus, a future call to call_once will spuriously find the Once still in the RUNNING state even though it isn't really in that state anymore.

These two problems can be seen in action in two proofs of concept that I wrote: one demonstrates the first issue, and the other demonstrates the second.

A proposed fix

Joint credit for this proposal goes to @ezrosent.

The idea behind this fix is to record the process' PID when transitioning a Once from INCOMPLETE to RUNNING, and having future accesses that find the Once in the RUNNING state verify that it wasn't transitioned by a parent process. Unfortunately, this doesn't quite work because PIDs can be re-used, so if process A spawns process B, then process A quits, then process B spawns process C, it's possible for A and C to have the same PID.

Instead, we introduce the idea of an "MPID" - a monotonically-increasing PID-like counter that is maintained by the process itself. We increment it every time a process forks, and use it in the Once objects to record which process transitioned an object from INCOMPLETE to RUNNING.

More concretely, here are the components of the proposed solution:

  • There is a process-global MPID variable (could be either usize or u64) that is initialized to 0 and is incremented immediately after fork. Note that this does not guarantee that no two processes anywhere in the tree of processes forked from a particular process have the same MPID. In fact, all processes forked by a given process will have the same MPID. However, it does guarantee that a process never shares an MPID with any of its ancestors, and that is all we need.
  • The Once object is modified to have another mpid field that is initialized to 0.
  • Each waiter object is modified to have another mpid field that is initialized to the MPID of the current process when the object is created.
  • A modified algorithm for call_once looks roughly like this:
    • Loop:
      • Load the current state. If it is COMPLETE, return.
      • If the state is INCOMPLETE, then load mpid and:
        • If mpid is equal to the current MPID, then try to CAS the state from INCOMPLETE to RUNNING. If it fails, retry the entire loop. If it succeeds, you're responsible for running the function, so do the original algorithm.
        • If mpid is not equal to the current MPID, then try to CAS it from its current value to the current MPID. If this succeeds, go to the previous step (where mpid is equal to the current MPID), and if it fails, retry the entire loop.
      • If the state is RUNNING, then load mpid and:
        • If mpid is equal to the current MPID, then the thread that transitioned the Once into the RUNNING state is in the current process, so do the normal algorithm: wait for it to be done (recording the current MPID in the waiter object).
        • If mpid is not equal to the current MPID, then the thread that transitioned the Once into the RUNNING state is in an ancestor process. Thus, attempt to CAS mpid to the current MPID. If it fails, repeat the entire loop. If it succeeds, then it is your responsibility to run the function, so continue as if you had transitioned the Once into the RUNNING state, with one exception: when waking up waiters, you need to check that they are not waiters in an ancestor process; do this by checking the waiter object's mpid field, and only waking waiters with an mpid field equal to the current MPID.
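Putting these pieces together, the modified loop might look roughly like the following in Rust. This is an illustrative sketch under the proposal's assumptions, not the actual prototype: `MpidOnce` and the global `MPID` are hypothetical names, waiting is abbreviated to a spin, and poisoning and selective waiter wake-up are elided.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

const INCOMPLETE: usize = 0;
const RUNNING: usize = 1;
const COMPLETE: usize = 2;

// Hypothetical process-global MPID, incremented in the child just after fork.
static MPID: AtomicUsize = AtomicUsize::new(0);

struct MpidOnce {
    state: AtomicUsize,
    mpid: AtomicUsize,
}

impl MpidOnce {
    const fn new() -> Self {
        MpidOnce {
            state: AtomicUsize::new(INCOMPLETE),
            mpid: AtomicUsize::new(0),
        }
    }

    fn call_once<F: FnOnce()>(&self, f: F) {
        let my_mpid = MPID.load(Ordering::SeqCst);
        loop {
            match self.state.load(Ordering::SeqCst) {
                COMPLETE => return,
                INCOMPLETE => {
                    let seen = self.mpid.load(Ordering::SeqCst);
                    if seen != my_mpid
                        && self
                            .mpid
                            .compare_exchange(seen, my_mpid, Ordering::SeqCst, Ordering::SeqCst)
                            .is_err()
                    {
                        continue; // mpid changed under us: retry the loop
                    }
                    // mpid now matches us; try to claim INCOMPLETE -> RUNNING.
                    if self
                        .state
                        .compare_exchange(INCOMPLETE, RUNNING, Ordering::SeqCst, Ordering::SeqCst)
                        .is_ok()
                    {
                        f();
                        self.state.store(COMPLETE, Ordering::SeqCst);
                        // Real impl: wake only waiters whose mpid == my_mpid.
                        return;
                    }
                    // Lost the race to another thread: retry the loop.
                }
                RUNNING => {
                    let seen = self.mpid.load(Ordering::SeqCst);
                    if seen == my_mpid {
                        // Executor is a thread in our process: wait for it
                        // (spinning here; the real impl enqueues a waiter).
                        std::thread::yield_now();
                    } else if self
                        .mpid
                        .compare_exchange(seen, my_mpid, Ordering::SeqCst, Ordering::SeqCst)
                        .is_ok()
                    {
                        // Executor was in an ancestor process: take over and
                        // run the function ourselves (state is already RUNNING).
                        f();
                        self.state.store(COMPLETE, Ordering::SeqCst);
                        return;
                    }
                    // CAS on mpid failed: retry the loop.
                }
                _ => unreachable!("invalid Once state"),
            }
        }
    }
}
```

The interesting path is the RUNNING arm: a stale mpid tells the child that the executor died with the parent's address space, so it claims the Once and runs the function itself.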

One thing to note: it is safe to CAS mpid first and transition into RUNNING separately, even though mpid needs to reflect the MPID of the thread that performed the transition. A thread that successfully transitions a Once into the RUNNING state will have previously verified that mpid is correct, and mpid will never change again afterwards (at least, not within this process), since the MPID of a process never changes.

Open question

One open question is how to ensure that code is run just after fork (to increment the global MPID variable) and, critically, before any other code runs (especially code that uses Once). pthread provides the pthread_atfork function to register callbacks that run before and after fork calls, but obviously this doesn't address Windows, and I also don't know if there's a good way to ensure that the necessary call to pthread_atfork is made at process initialization time.
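On POSIX targets, the registration could look something like the sketch below. This is an illustrative fragment, not a committed design: it declares pthread_atfork directly (rather than going through the libc crate) to stay dependency-free, `MPID` and `register_mpid_atfork` are hypothetical names, and the open question of who calls the registration function, and when, remains.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical process-global MPID counter.
static MPID: AtomicUsize = AtomicUsize::new(0);

// Child-side callback: pthread_atfork runs this in the new process
// immediately after fork(), before fork() returns to user code there.
extern "C" fn bump_mpid_in_child() {
    MPID.fetch_add(1, Ordering::SeqCst);
}

// Declared directly here to keep the sketch self-contained; this is the
// standard POSIX function, so this only works on POSIX targets.
extern "C" {
    fn pthread_atfork(
        prepare: Option<extern "C" fn()>,
        parent: Option<extern "C" fn()>,
        child: Option<extern "C" fn()>,
    ) -> i32;
}

// Would need to run exactly once, at process initialization, before any
// fork. Returns 0 on success, like pthread_atfork itself.
fn register_mpid_atfork() -> i32 {
    unsafe { pthread_atfork(None, None, Some(bump_mpid_in_child)) }
}
```

Note that pthread_atfork callbacks are themselves subject to the async-signal-safety caveats discussed below, though an atomic increment is safe in that context.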

Prior art

There's some prior art here. In particular, jemalloc has acknowledged a similar issue, and has a partial fix for it.

Activity

sfackler (Member) commented on Jul 24, 2017

How does pthread_once_t deal with this issue? We've been thinking a bit about switching to the system primitives instead of rolling our own.

sfackler (Member) commented on Jul 24, 2017

More generally though, trying to continue on to do basically anything after a multithreaded fork is a dicey proposition. Every other thread disappearing in the middle of whatever it was doing does not leave a particularly safe environment.

joshlf (Contributor, Author) commented on Jul 24, 2017

OK, I implemented a prototype fix - it's available here (I copied the once module wholesale into the repo and did the fix there). It needs to be cleaned up and commented, but I've verified that it works.

@sfackler I'll look into the pthread_once_t question.

joshlf (Contributor, Author) commented on Jul 24, 2017

@ezrosent found a StackOverflow post whose answers suggest that the solution is just to give up.

This commit message implies that pthread_once_t doesn't handle the issue at all, and in fact that using a pthread_once_t like that is incorrect: "POSIX standard says if a multi-threaded process calls fork(), the new process may only execute async-signal-safe operations until exec functions are called."

joshlf (Contributor, Author) commented on Jul 24, 2017

I think it would be a real shame to decide that Rust is going to be similarly hamstrung. "No async-signal-unsafe code before exec" is a pretty draconian requirement for, e.g., programs that rely on fork w/o exec to inherit certain state to continue operating on.

I will note, however, that if we go the route of trying to support this behavior, we'll have to be really careful that we document the behavior well. The behavior we want is probably not the "obvious" behavior - e.g., having a Mutex magically unlock itself is not obvious, even if it's probably what you want.

ezrosent commented on Jul 24, 2017

It's unclear to me what the path forward is overall with this. The SO post linked above indicates that there may not be a standards-compliant way to "fix" mutexes that were locked in the parent during a fork call, but there are good reasons for Mutex to call out to pthread_mutex, etc.

However, this issue is more circumscribed: do we want sync::Once to be safe for programs that have non-exec forks? The jemalloc example above is instructive here, because there are plenty of single-threaded programs that rely on malloc and also call fork. @joshlf and I are working on some allocator material in a similar vein, including global allocators which rely on sync::Once via lazy_static; this is partly why we think it's worthwhile not to punt on this.

Stebalien (Contributor) commented on Jul 24, 2017

Related: #39575

sfackler (Member) commented on Jul 24, 2017

Having a Mutex automatically unlock itself is not what you want - that would make it practically impossible to write correct unsafe abstractions. If a thread has a Mutex locked while it's halfway through adjusting a bunch of pointers and then suddenly disappears, there is no real way to recover from that. You can't write code that properly checkpoints at every instruction.

Notably, malloc is an example of a thing like that. Deadlock is preferable to hoping you don't run off the end of a busted linked list post fork.

The same goes for "fixing" the state of an in-progress Once - the guarantee that primitive provides is that the function is run exactly once. In particular that means you can assume that your starting state is where you expect, not at some arbitrary point through initialization.

joshlf (Contributor, Author) commented on Jul 24, 2017

Ah, that's a very good point, and a general one at that (applies to pretty much all non-atomic synchronization).

Alternative proposal then: add an extra variant of poisoning. Using essentially the same mechanism, we can detect whether the holder of a mutex, or the thread executing a Once, etc., is a thread in our process or in an ancestor process, and if it's the latter, report that the synchronization object has been poisoned. It could be the same poison value (probably the most backwards-compatible option, especially since most of these data structures are stable) or a distinct poison value, so callers can distinguish between "poisoned by a panic" and "poisoned by forking."
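The same MPID mechanism could drive such a poison check. A minimal illustration, with all names hypothetical (`OwnerTag` is not a std type, and a real lock would fold this into its existing state word):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical process-global MPID, incremented in the child after fork.
static MPID: AtomicUsize = AtomicUsize::new(0);

// A lock or Once would record the acquiring thread's MPID. Any process
// that later observes a different MPID knows the holder lives in an
// ancestor process and can report "poisoned by forking" instead of
// blocking forever.
struct OwnerTag {
    owner_mpid: AtomicUsize,
}

impl OwnerTag {
    const fn new() -> Self {
        OwnerTag { owner_mpid: AtomicUsize::new(0) }
    }

    // Called when the lock is acquired / the Once enters RUNNING.
    fn record_acquire(&self) {
        self.owner_mpid
            .store(MPID.load(Ordering::SeqCst), Ordering::SeqCst);
    }

    // True iff the recorded owner belongs to a different (ancestor) process.
    fn fork_poisoned(&self) -> bool {
        self.owner_mpid.load(Ordering::SeqCst) != MPID.load(Ordering::SeqCst)
    }
}
```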

joshlf (Contributor, Author) commented on Jul 24, 2017

@Stebalien good call; thanks!

sfackler (Member) commented on Jul 25, 2017

What would be done in response to this kind of poisoning?

joshlf (Contributor, Author) commented on Jul 25, 2017

Well, I figure the two kinds of poisoning are actually pretty similar in the state they leave things in. In both cases, all you know is that some part of an operation (for a Once, the function being executed; for a mutex, the critical section) has started but may not have completed, and thus whatever state it was mutating is possibly in some (probably invalid) in-between state. In either case, a particular application may know enough about the operation that was being performed to clean up, get the state back to a known-good state, and retry the operation (or pick up where the old code left off).

The one distinction is that it's technically possible for an application to know precisely where a function/critical section might panic, while a fork in thread A could cause the memory being operated on by thread B to be snapshotted at virtually any point in the execution of the function/critical section in B. Thus, in theory, there could be cases in which it'd be possible to recover from a panic-induced poison but not from a fork-induced poison. However, I suspect those cases are exceedingly rare, and in any case, trying to reason about exactly where your code can panic (and to make assumptions about the state of the world on that basis) is very sketchy and probably not advised.

joshlf (Contributor, Author) commented on Jul 25, 2017

Prototype implementation of the poisoning approach in the poison branch.

Labels C-bug (Category: This is a bug) and T-libs-api (Relevant to the library API team, which will review and decide on the PR/issue) added on Jul 26, 2017

8 remaining items


Participants: @steveklabnik, @Stebalien, @joshlf, @sfackler, @jonas-schievink
