Description
NOTE
Problem described below is fixed in 1.79.0. Another version of it happens now. Skip to #124920 (comment)
System:
OS: Fedora 40
Arch: x86_64
Toolchain: https://sh.rustup.rs
v1.78.0
Disclaimer
The following bug is really, really weird, and I struggle to make a minimal working example.
Unfortunately, I don't want to share the code it appears in publicly yet, but I will invite anyone who wants to fix it to the repo.
Description
When compiling something like the following code with the toolchain installed via curl sh.rustup.rs, thread::scope aborts with: The futex facility returned an unexpected error code.
// Everything before this function is strictly sequential.
// This is the first place any thread spawning happens.
fn do_something(&mut self, num_threads: usize, wmap: &RwLock<SomeStruct>) {
    thread::scope(|s| {
        // It doesn't matter what is in here. With and without code it fails.
    });
    // Deleting the following line will make the abort go away.
    self.a_trait_fcn(wmap);
}
// Note that "self" holds pretty deeply nested structures which contain HashMaps with RwLocks.
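For readers who want something that compiles, here is a rough, self-contained sketch of the shape described above. It is not a reproducer (no minimal example exists yet), and `Analyzer`, `ATrait`, the fields of `SomeStruct`, and the bodies of the closure and of `a_trait_fcn` are hypothetical stand-ins, since the real code is not public.

```rust
use std::collections::HashMap;
use std::sync::RwLock;
use std::thread;

// Hypothetical stand-in for the real (non-public) struct.
struct SomeStruct {
    values: HashMap<u64, u64>,
}

// Hypothetical owner type: "self" holds nested HashMaps with RwLocks.
struct Analyzer {
    nested: HashMap<u64, RwLock<HashMap<u64, u64>>>,
}

trait ATrait {
    fn a_trait_fcn(&mut self, wmap: &RwLock<SomeStruct>);
}

impl ATrait for Analyzer {
    // Assumed body: takes the write lock and touches the nested maps.
    fn a_trait_fcn(&mut self, wmap: &RwLock<SomeStruct>) {
        let mut guard = wmap.write().unwrap();
        guard.values.insert(1, 0x10);
        self.nested
            .entry(1)
            .or_insert_with(|| RwLock::new(HashMap::new()));
    }
}

impl Analyzer {
    fn do_something(&mut self, num_threads: usize, wmap: &RwLock<SomeStruct>) {
        // First place any thread is spawned.
        thread::scope(|s| {
            for _ in 0..num_threads {
                s.spawn(|| {
                    let _ = wmap.read().unwrap().values.len();
                });
            }
        });
        // The call whose presence made the abort appear in the report.
        self.a_trait_fcn(wmap);
    }
}

fn run_sketch() -> usize {
    let mut analyzer = Analyzer { nested: HashMap::new() };
    let wmap = RwLock::new(SomeStruct { values: HashMap::new() });
    analyzer.do_something(1, &wmap);
    let n = wmap.read().unwrap().values.len();
    n
}

fn main() {
    println!("entries after do_something: {}", run_sketch());
}
```

On an unaffected setup this simply runs to completion; it is only meant to pin down the call structure being discussed.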
The strace shows that the thread is trying to attach itself to a futex that it is not allowed to attach to (according to the man pages):
futex(0x7afe2b3a1a08, FUTEX_LOCK_PI, NULL) = -1 EPERM (Operation not permitted)
Now to the funny part. This error only happens with the toolchain obtained from sh.rustup.rs.
I built the same version locally, with and without debug symbols, and the error goes away. I assume this happens due to different optimizations done by my locally built toolchains and the rustup one?
Also, with my locally built toolchains futex is not called at all (in the function where the abort happens).
Additional clues
- The error did not happen on Debian 11 before. It only occurred after switching to a Fedora 40 VM.
- The code is compiled to a library and loaded by C code.
- Related issue: The futex facility returned an unexpected error code. #93228
Logs
Full strace:
futex(0x7afe2b3a1a08, FUTEX_LOCK_PI, NULL) = -1 EPERM (Operation not permitted)
writev(2, [{iov_base="The futex facility returned an u"..., iov_len=54}], 1The futex facility returned an unexpected error code.
) = 54
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7afe2b20e000
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
gettid() = 217718
getpid() = 217718
tgkill(217718, 217718, SIGABRT) = 0
--- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=217718, si_uid=1000} ---
+++ killed by SIGABRT (core dumped) +++
Aborted (core dumped)
Valgrind stacktrace
The futex facility returned an unexpected error code.
==230409==
==230409== Process terminating with default action of signal 6 (SIGABRT): dumping core
==230409== at 0x4B56144: __pthread_kill_implementation (in /usr/lib64/libc.so.6)
==230409== by 0x4AFE65D: raise (in /usr/lib64/libc.so.6)
==230409== by 0x4AE6901: abort (in /usr/lib64/libc.so.6)
==230409== by 0x4AE7766: __libc_message_impl.cold (in /usr/lib64/libc.so.6)
==230409== by 0x4B49508: __libc_fatal (in /usr/lib64/libc.so.6)
==230409== by 0x4B50A05: __futex_lock_pi64 (in /usr/lib64/libc.so.6)
==230409== by 0x4B57207: __pthread_mutex_lock_full (in /usr/lib64/libc.so.6)
==230409== by 0x4B00779: __cxa_thread_atexit_impl (in /usr/lib64/libc.so.6)
==230409== by 0x803826B: register_dtor<std::sys_common::thread_info::ThreadInfo> (fast_local.rs:161)
==230409== by 0x803826B: __getit (fast_local.rs:56)
==230409== by 0x803826B: try_with<std::sys_common::thread_info::ThreadInfo, std::sys_common::thread_info::{impl#0}::with::{closure_env#0}<std::thread::Thread, std::sys_common::thread_info::current_thread::{closure_env#0}>, std::thread::Thread> (local.rs:283)
==230409== by 0x803826B: with<std::thread::Thread, std::sys_common::thread_info::current_thread::{closure_env#0}> (thread_info.rs:24)
==230409== by 0x803826B: std::sys_common::thread_info::current_thread (thread_info.rs:34)
==230409== by 0x8032F05: std::thread::current (mod.rs:708)
==230409== by 0x7F95E78: std::thread::scoped::scope (scoped.rs:138)
==230409== by 0x7F61D18: do_something (icfg.rs:124)
==230409==
==230409== HEAP SUMMARY:
==230409== in use at exit: 12,322,024 bytes in 80,601 blocks
==230409== total heap usage: 7,507,924 allocs, 7,427,323 frees, 873,666,132 bytes allocated
==230409==
==230409== LEAK SUMMARY:
==230409== definitely lost: 6,172 bytes in 521 blocks
==230409== indirectly lost: 0 bytes in 0 blocks
==230409== possibly lost: 7,282,094 bytes in 20,289 blocks
==230409== still reachable: 5,033,630 bytes in 59,790 blocks
==230409== suppressed: 128 bytes in 1 blocks
==230409== Rerun with --leak-check=full to see details of leaked memory
==230409==
==230409== Use --track-origins=yes to see where uninitialised values come from
==230409== For lists of detected and suppressed errors, rerun with: -s
==230409== ERROR SUMMARY: 192 errors from 96 contexts (suppressed: 0 from 0)
Aborted (core dumped)
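The trace above goes through register_dtor into __cxa_thread_atexit_impl, i.e. the abort happens while std registers the destructor of a thread-local on its first access. A minimal sketch of that general pattern, using a made-up thread-local (`LOCAL`, `Tracked`, and `DROPS` are hypothetical names, not std internals), could look like this. It does not reproduce the abort; it only shows that the first access in a thread sets up a destructor that runs when the thread exits.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Counts how many times the thread-local's destructor has run.
static DROPS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical type with a destructor, standing in for std's internal ThreadInfo.
struct Tracked;

impl Drop for Tracked {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

thread_local! {
    // Lazily initialized; the destructor is registered on first access.
    static LOCAL: Tracked = Tracked;
}

fn touch_local_in_new_thread() -> usize {
    let before = DROPS.load(Ordering::SeqCst);
    thread::spawn(|| {
        // First access in this thread: initializes LOCAL and registers its destructor.
        LOCAL.with(|_| {});
    })
    .join()
    .unwrap();
    // The destructor ran when the spawned thread exited.
    DROPS.load(Ordering::SeqCst) - before
}

fn main() {
    println!("destructors run: {}", touch_local_in_new_thread());
}
```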
Activity
the8472 commented on May 9, 2024
Rust built by CI links against an old glibc version for backwards compatibility. Maybe symbol versioning makes a difference? Having strace print stacktraces for each syscall might shed some light if different paths are taken.
Rot127 commented on May 9, 2024
Local libc version:
They are indeed very different for the scope() function. But it doesn't seem to be related to the libc version:
CI toolchain with abort:
With the locally built toolchain it never reaches __futex_lock_pi64. The next syscall executed is from within the scope() closure.
the8472 commented on May 9, 2024
Your from-source toolchain is also 1.78? There were some recent changes around thread locals and thread parking on master.
Rot127 commented on May 9, 2024
Yes:
But let me try with latest master and see if the stack trace changes again.
Rot127 commented on May 10, 2024
Same result as above. __futex_lock_pi64 is never called.
Rot127 commented on Jul 5, 2024
Building with the self-built toolchain as described above now also gives me the error, but only if certain println!() calls are present. It is really, really weird.
But building with the stable toolchain from rustup works in these cases.
Rot127 commented on Jul 5, 2024
After removing some println! calls and fields of some structs, I now get with the rustup toolchain:
The program runs with two threads (one main, one scoped spawned).
The self-built toolchain works fine again.
Rot127 commented on Jul 15, 2024
With 1.78.0 I now even have the problem that scoped() is not even entered.
prints only
Before scoped
scoped and freezes.
Everything works fine with toolchain <=1.76.0. 1.77.0/1.78.0 freeze.
There are only three lines of actual code difference.
git diff 1.76.0..1.77.0 library/std/src/thread/
Rot127 commented on Jul 15, 2024
Ok, seems to work fine in 1.79.0. Will keep this open for a while and test. But it can probably be closed.
Rot127 commented on Jul 18, 2024
Still a problem in 1.79.0. It seems to be related to formatted strings?
assert in libc:
:Fatal glibc error: tpp.c:83 (__pthread_tpp_change_priority): assertion failed: new_prio == -1 || (new_prio >= fifo_min_prio && new_prio <= fifo_max_prio)
Triggered by having a formatted debug_assert!:
some_fcn is called by another function, within the local thread.
Removing any of the arguments will make the code work. Two of the arguments are clones or references of members of the write-locked struct. The last one is from outside the loop.
I can also split up the debug output into two print! calls and a debug_assert!(false), and it doesn't crash, except if an argument in println! has a format specifier like {:#x}. So it seems to be connected to the number of arguments passed and their formatting.
Crashes:
Simply using 0, 1, 2 for the arguments makes the code work again.
I would debug it, but really don't have time, unfortunately.
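To make the described pattern concrete, here is a hedged sketch of a formatted debug_assert! with three arguments, two taken from a write-locked struct and one from outside, called from a scoped thread. The struct `Shared`, its fields, the message text, and the assert condition are all assumptions for illustration; the original snippet is not reproduced here, and this sketch is not known to trigger the glibc assert.

```rust
use std::sync::RwLock;
use std::thread;

// Hypothetical stand-in for the write-locked struct.
struct Shared {
    addr: u64,
    id: u64,
}

// Hypothetical version of some_fcn: formats two members of the locked
// struct plus one value from outside, using a {:#x} specifier.
fn some_fcn(shared: &RwLock<Shared>, outer: u64) -> String {
    let guard = shared.write().unwrap();
    let addr = guard.addr; // copy of a member
    let id = &guard.id; // reference to a member
    debug_assert!(
        addr != 0,
        "addr {:#x} id {} outer {:#x}",
        addr,
        id,
        outer
    );
    format!("addr {:#x} id {} outer {:#x}", addr, id, outer)
}

fn run_sketch() -> String {
    let shared = RwLock::new(Shared { addr: 0x1000, id: 7 });
    let outer = 42u64;
    // Call some_fcn from within a scoped thread and hand the string back.
    thread::scope(|s| s.spawn(|| some_fcn(&shared, outer)).join().unwrap())
}

fn main() {
    println!("{}", run_sketch());
}
```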
Stack-trace:
[-]`thread::scope` aborts with `futex()` `EPERM` unexpected error code.[/-][+]Reachable `libc` assert by `thread::scope`, when printing `RwLock` protected values as formatted strings.[/+]
Rot127 commented on Sep 25, 2024
So the assert can also be reached by printing std::thread::current().id().
Still can't come up with a minimal working example. But will continue to try.
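For reference, the call in question is just the following, shown here as a small sketch inside a scoped thread. The helper name `scoped_thread_id_differs` is made up, and on an unaffected setup this prints the id and returns normally; it is not a reproducer.

```rust
use std::thread;

// Prints the current thread's id from inside a scoped thread and
// reports whether it differs from the spawning thread's id.
fn scoped_thread_id_differs() -> bool {
    let main_id = thread::current().id();
    let scoped_id = thread::scope(|s| {
        s.spawn(|| {
            let id = thread::current().id();
            println!("scoped thread: {:?}", id);
            id
        })
        .join()
        .unwrap()
    });
    main_id != scoped_id
}

fn main() {
    println!("ids differ: {}", scoped_thread_id_differs());
}
```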