Description
When spawning a process in the case where posix_spawn cannot be used, the spawn code uses fork/execvp. It acquires the environment lock before the fork, with a drop handler that unlocks it. Unfortunately, this is done in both the parent and child processes, and unlocking a lock acquired from another thread is undefined behavior (see pthread_rwlock_unlock and pthread_mutex_unlock).
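For illustration, here is a minimal sketch of one way a spawn can end up on the fork/exec path. It assumes that registering a pre_exec closure prevents std from using posix_spawn, which is an implementation detail and not a documented guarantee. It also assumes a Unix target with a `true` binary on PATH; the function name is hypothetical.

```rust
use std::io;
use std::os::unix::process::CommandExt;
use std::process::Command;

/// Spawns `true` with a no-op pre_exec closure and reports whether it
/// exited successfully. Hypothetical helper, for illustration only.
fn spawn_with_pre_exec() -> io::Result<bool> {
    let mut cmd = Command::new("true");
    // SAFETY: the closure runs in the forked child before exec. It does
    // nothing here, so it cannot touch locks or allocate.
    unsafe {
        cmd.pre_exec(|| Ok(()));
    }
    // Assumption: with a pre_exec closure registered, std forks and execs
    // rather than calling posix_spawn.
    let status = cmd.status()?;
    Ok(status.success())
}

fn main() -> io::Result<()> {
    println!("child exited successfully: {}", spawn_with_pre_exec()?);
    Ok(())
}
```

Any code placed in the closure runs in the child after the fork, which is where the state of a lock inherited from the parent becomes a concern.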
An example of where this can happen is rustdoc running doctests. It has N threads all spawning rustc processes. It was observed in #82221 that this was frequently causing deadlocks (i686, on a Docker image with glibc 2.23).
PR #82877 reverts the change from mutex to rwlock, but pthread_mutex_unlock is also undefined behavior here; we have just been fortunate not to run into any problems. This should be fixed. This can probably be done by mem::forget'ing the guard.
https://stackoverflow.com/questions/61976745/why-does-rust-rwlock-behave-unexpectedly-with-fork also provides some insight into why unlocking a rwlock after a fork doesn't work.
rustc 1.52.0-nightly (caca212 2021-03-05)
Activity
joshtriplett commented on Mar 8, 2021
Based on some discussion on Zulip, I think the right approach would be to reinitialize the lock in the child, immediately after the fork.
We can't just forget the lock; that will break usage of the environment within a pre_exec function. Reinitializing the lock in the child should work.
ghost commented on Mar 8, 2021
Is this a duplicate of #64718?
(Also, as mentioned in #64718 (comment), it seems none of the exec*p* functions is signal-safe:)
sfackler commented on Mar 8, 2021
But getenv and setenv are not async-signal-safe, so isn't usage of the environment after a multi-threaded fork inherently broken?
ehuss commented on Mar 8, 2021
Indeed this is a duplicate of #64718, thanks for the link!
Closing, let's keep the discussion consolidated there.