Skip to content

Reachable libc assert by thread::scope, when printing RwLock protected values as formatted strings. #124920

@Rot127

Description

@Rot127

NOTE

Problem described below is fixed in 1.79.0. Another version of it happens now. Skip to #124920 (comment)


System:
OS: Fedora 40
Arch: x86_64
Toolchain: https://sh.rustup.rs v1.78.0

Disclaimer

The following bug is really really weird, and I struggle to make a minimal working example.
Unfortunately, I don't want to share the code it appears in publicly yet. But will invite everyone who wants to fix it, to the repo.

Description

When compiling something like the following code, with the toolchain installed via curl sh.rustup.rs, thread::scope aborts with The futex facility returned an unexpected error code..

// Everything before this function is strictly sequential.
// This is the first place any thread spawning happens.

fn do_something(&mut self, num_threads: usize, wmap: &RwLock<SomeStruct>) {
        thread::scope(|s| {
            // It doesn't matter what is in here. With and without code it fails.
        });
        // Deleting the following line will make the abort go away.
        self.a_trait_fcn(wmap);
}

// Note that "self" holds pretty deeply nested structures which contain HashMaps with RwLocks.

The strace shows that the thread is trying to attach itself to a futex, it is not allowed to attach to (according to the man pages):

futex(0x7afe2b3a1a08, FUTEX_LOCK_PI, NULL) = -1 EPERM (Operation not permitted)

Now to the funny part. This error only happens with the toolchain obtained from sh.rustup.rs.
I built the same version locally with and without debug symbols and the error goes away. I assume this happens due to different optimizations done by my locally built toolchains and the rustup one?

Also, with my locally built toolchains futex is not called at all (in the function where the abort happens).

Additional clues

Logs

Full strace:

futex(0x7afe2b3a1a08, FUTEX_LOCK_PI, NULL) = -1 EPERM (Operation not permitted)
writev(2, [{iov_base="The futex facility returned an u"..., iov_len=54}], 1The futex facility returned an unexpected error code.
) = 54
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7afe2b20e000
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
gettid()                                = 217718
getpid()                                = 217718
tgkill(217718, 217718, SIGABRT)         = 0
--- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=217718, si_uid=1000} ---
+++ killed by SIGABRT (core dumped) +++
Aborted (core dumped)

Valgrind stacktrace

The futex facility returned an unexpected error code.
==230409== 
==230409== Process terminating with default action of signal 6 (SIGABRT): dumping core
==230409==    at 0x4B56144: __pthread_kill_implementation (in /usr/lib64/libc.so.6)
==230409==    by 0x4AFE65D: raise (in /usr/lib64/libc.so.6)
==230409==    by 0x4AE6901: abort (in /usr/lib64/libc.so.6)
==230409==    by 0x4AE7766: __libc_message_impl.cold (in /usr/lib64/libc.so.6)
==230409==    by 0x4B49508: __libc_fatal (in /usr/lib64/libc.so.6)
==230409==    by 0x4B50A05: __futex_lock_pi64 (in /usr/lib64/libc.so.6)
==230409==    by 0x4B57207: __pthread_mutex_lock_full (in /usr/lib64/libc.so.6)
==230409==    by 0x4B00779: __cxa_thread_atexit_impl (in /usr/lib64/libc.so.6)
==230409==    by 0x803826B: register_dtor<std::sys_common::thread_info::ThreadInfo> (fast_local.rs:161)
==230409==    by 0x803826B: __getit (fast_local.rs:56)
==230409==    by 0x803826B: try_with<std::sys_common::thread_info::ThreadInfo, std::sys_common::thread_info::{impl#0}::with::{closure_env#0}<std::thread::Thread, std::sys_common::thread_info::current_thread::{closure_env#0}>, std::thread::Thread> (local.rs:283)
==230409==    by 0x803826B: with<std::thread::Thread, std::sys_common::thread_info::current_thread::{closure_env#0}> (thread_info.rs:24)
==230409==    by 0x803826B: std::sys_common::thread_info::current_thread (thread_info.rs:34)
==230409==    by 0x8032F05: std::thread::current (mod.rs:708)
==230409==    by 0x7F95E78: std::thread::scoped::scope (scoped.rs:138)
==230409==    by 0x7F61D18: do_something (icfg.rs:124)
==230409== 
==230409== HEAP SUMMARY:
==230409==     in use at exit: 12,322,024 bytes in 80,601 blocks
==230409==   total heap usage: 7,507,924 allocs, 7,427,323 frees, 873,666,132 bytes allocated
==230409== 
==230409== LEAK SUMMARY:
==230409==    definitely lost: 6,172 bytes in 521 blocks
==230409==    indirectly lost: 0 bytes in 0 blocks
==230409==      possibly lost: 7,282,094 bytes in 20,289 blocks
==230409==    still reachable: 5,033,630 bytes in 59,790 blocks
==230409==         suppressed: 128 bytes in 1 blocks
==230409== Rerun with --leak-check=full to see details of leaked memory
==230409== 
==230409== Use --track-origins=yes to see where uninitialised values come from
==230409== For lists of detected and suppressed errors, rerun with: -s
==230409== ERROR SUMMARY: 192 errors from 96 contexts (suppressed: 0 from 0)
Aborted (core dumped)

Activity

added
needs-triageThis issue may need triage. Remove it if it has been sufficiently triaged.
on May 9, 2024
the8472

the8472 commented on May 9, 2024

@the8472
Member

Rust built by CI links against an old glibc version for backwards compatibility. Maybe symbol versioning makes a difference? Having strace print stacktraces for each syscall might shed some light if different paths are taken.

Rot127

Rot127 commented on May 9, 2024

@Rot127
Author

Local libc version:

> /usr/lib64/libc.so.6
GNU C Library (GNU libc) stable release version 2.39.
Copyright (C) 2024 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 14.0.1 20240411 (Red Hat 14.0.1-0).
libc ABIs: UNIQUE IFUNC ABSOLUTE
Minimum supported kernel: 3.2.0
For bug reporting instructions, please see:
<https://www.gnu.org/software/libc/bugs.html>.

Having strace print stacktraces for each syscall might shed some light if different paths are taken.

They are indeed very different for the scope() function. But it doesn't seem to be related to libc version:

CI toolchain with abort:

futex(0x7569c4474a08, FUTEX_LOCK_PI, NULL) = -1 EPERM (Operation not permitted)

/usr/lib64/libc.so.6(__futex_lock_pi64+0x25) [0x929c5]
/usr/lib64/libc.so.6(__pthread_mutex_lock_full+0x267) [0x99207]
/usr/lib64/libc.so.6(__cxa_thread_atexit_impl+0x69) [0x42779]
mylib.so(std::sys_common::thread_info::current_thread+0x3b) [0x11d3eb]
mylib.so(std::thread::current+0x5) [0x118085]
mylib.so(std::thread::scoped::scope+0x83) [0x7af23]
mylib.so(do_something+0x205) [0x43425]

With the locally built toolchain it never reaches __futex_lock_pi64. The next syscall executed is from within the scope() closure.

/usr/lib64/libc.so.6(__write+0x4d) [0x10b86d]
mylib.so(std::sys::pal::unix::fd::FileDesc::write+0x26) [0x14a126]
mylib.so(<std::sys::pal::unix::stdio::Stdout as std::io::Write>::write+0x34) [0x136154]
mylib.so(<std::io::stdio::StdoutRaw as std::io::Write>::write+0x1a) [0x147dea]
mylib.so(std::io::buffered::bufwriter::BufWriter<W>::flush_buf+0x85) [0x137fd5]
mylib.so(<std::io::buffered::bufwriter::BufWriter<W> as std::io::Write>::flush+0x8) [0x1381a8]
mylib.so(<&std::io::stdio::Stdout as std::io::Write>::flush+0x3d) [0x14804d]
mylib.so(<std::io::stdio::Stdout as std::io::Write>::flush+0xd) [0x147fad]
mylib.so(helper::progress::ProgressBar::print+0xee6) [0x122106]
mylib.so(helper::progress::ProgressBar::update_print+0x69) [0x1222f9]
mylib.so(do_something::{{closure}}+0x61) [0x8b811]
mylib.so(std::thread::scoped::scope::{{closure}}+0x35) [0x89db5]
mylib.so(<core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once+0x20) [0xa1d70]
mylib.so(std::panicking::try::do_call+0x2b) [0x846cb]
mylib.so(__rust_try+0x1a) [0x84b1a]
mylib.so(std::panicking::try+0x51) [0x84551]
mylib.so(std::thread::scoped::scope+0x2e5) [0x89a95]
mylib.so(do_something+0x205) [0x522b5]
the8472

the8472 commented on May 9, 2024

@the8472
Member

Your from-source toolchain is also 1.78? There were some recent changes around thread locals and thread parking on master.

Rot127

Rot127 commented on May 9, 2024

@Rot127
Author

Yes:

> git show HEAD
commit 9b00956e56009bab2aa15d7bff10916599e3d6d6 (HEAD, tag: 1.78.0, origin/stable, stable)
...

But let me try with latest master and see if the stack trace changes again.

Rot127

Rot127 commented on May 10, 2024

@Rot127
Author

Same result as above. __futex_lock_pi64 is never called.

added
T-libsRelevant to the library team, which will review and decide on the PR/issue.
C-bugCategory: This is a bug.
A-threadArea: `std::thread`
and removed
needs-triageThis issue may need triage. Remove it if it has been sufficiently triaged.
on May 11, 2024
Rot127

Rot127 commented on Jul 5, 2024

@Rot127
Author

Building with the self build toolchain as described above now also gives me the error.
But only, if there are certain println!(). It is really really weird.

But building with the stable toolchain from rustup works in these cases.

Rot127

Rot127 commented on Jul 5, 2024

@Rot127
Author

After removing some println! and fields of some structs, I now get with the rustup toolchain:

Fatal glibc error: tpp.c:83 (__pthread_tpp_change_priority): assertion failed: new_prio == -1 || (new_prio >= fifo_min_prio && new_prio <= fifo_max_prio)
Aborted (core dumped)

The program runs with two threads (one main, one scoped spawned).
Self built works fine again.

Rot127

Rot127 commented on Jul 15, 2024

@Rot127
Author

With 1.78.0 I now have even the problem, that scoped() is not even entered.

println!("Before scoped");
thread::scope(|s| {
    println!("Into scoped");
    ...
});
println!("After scoped");

prints only Before scoped scoped and freezes.

Everything works fine with toolchain <=1.76.0. 1.77.0/1.78.0 freeze.

There are only three lines of actual code difference.

git diff 1.76.0..1.77.0 library/std/src/thread/
diff --git a/library/std/src/thread/local.rs b/library/std/src/thread/local.rs
index def94acd457..d1213e2f166 100644
--- a/library/std/src/thread/local.rs
+++ b/library/std/src/thread/local.rs
@@ -16,7 +16,8 @@
 ///
 /// This key uses the fastest possible implementation available to it for the
 /// target platform. It is instantiated with the [`thread_local!`] macro and the
-/// primary method is the [`with`] method.
+/// primary method is the [`with`] method, though there are helpers to make
+/// working with [`Cell`] types easier.
 ///
 /// The [`with`] method yields a reference to the contained value which cannot
 /// outlive the current thread or escape the given closure.
@@ -25,14 +26,30 @@
 ///
 /// # Initialization and Destruction
 ///
-/// Initialization is dynamically performed on the first call to [`with`]
-/// within a thread, and values that implement [`Drop`] get destructed when a
-/// thread exits. Some caveats apply, which are explained below.
+/// Initialization is dynamically performed on the first call to a setter (e.g.
+/// [`with`]) within a thread, and values that implement [`Drop`] get
+/// destructed when a thread exits. Some caveats apply, which are explained below.
 ///
 /// A `LocalKey`'s initializer cannot recursively depend on itself. Using a
 /// `LocalKey` in this way may cause panics, aborts or infinite recursion on
 /// the first call to `with`.
 ///
+/// # Single-thread Synchronization
+///
+/// Though there is no potential race with other threads, it is still possible to
+/// obtain multiple references to the thread-local data in different places on
+/// the call stack. For this reason, only shared (`&T`) references may be obtained.
+///
+/// To allow obtaining an exclusive mutable reference (`&mut T`), typically a
+/// [`Cell`] or [`RefCell`] is used (see the [`std::cell`] for more information
+/// on how exactly this works). To make this easier there are specialized
+/// implementations for [`LocalKey<Cell<T>>`] and [`LocalKey<RefCell<T>>`].
+///
+/// [`std::cell`]: `crate::cell`
+/// [`LocalKey<Cell<T>>`]: struct.LocalKey.html#impl-LocalKey<Cell<T>>
+/// [`LocalKey<RefCell<T>>`]: struct.LocalKey.html#impl-LocalKey<RefCell<T>>
+///
+///
 /// # Examples
 ///
 /// ```
@@ -41,26 +58,20 @@
 ///
 /// thread_local!(static FOO: RefCell<u32> = RefCell::new(1));
 ///
-/// FOO.with(|f| {
-///     assert_eq!(*f.borrow(), 1);
-///     *f.borrow_mut() = 2;
-/// });
+/// FOO.with_borrow(|v| assert_eq!(*v, 1));
+/// FOO.with_borrow_mut(|v| *v = 2);
 ///
 /// // each thread starts out with the initial value of 1
 /// let t = thread::spawn(move|| {
-///     FOO.with(|f| {
-///         assert_eq!(*f.borrow(), 1);
-///         *f.borrow_mut() = 3;
-///     });
+///     FOO.with_borrow(|v| assert_eq!(*v, 1));
+///     FOO.with_borrow_mut(|v| *v = 3);
 /// });
 ///
 /// // wait for the thread to complete and bail out on panic
 /// t.join().unwrap();
 ///
 /// // we retain our original value of 2 despite the child thread
-/// FOO.with(|f| {
-///     assert_eq!(*f.borrow(), 2);
-/// });
+/// FOO.with_borrow(|v| assert_eq!(*v, 2));
 /// ```
 ///
 /// # Platform-specific behavior
@@ -137,10 +148,13 @@ fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
 ///     static BAR: RefCell<f32> = RefCell::new(1.0);
 /// }
 ///
-/// FOO.with(|foo| assert_eq!(*foo.borrow(), 1));
-/// BAR.with(|bar| assert_eq!(*bar.borrow(), 1.0));
+/// FOO.with_borrow(|v| assert_eq!(*v, 1));
+/// BAR.with_borrow(|v| assert_eq!(*v, 1.0));
 /// ```
 ///
+/// Note that only shared references (`&T`) to the inner data may be obtained, so a
+/// type such as [`Cell`] or [`RefCell`] is typically used to allow mutating access.
+///
 /// This macro supports a special `const {}` syntax that can be used
 /// when the initialization expression can be evaluated as a constant.
 /// This can enable a more efficient thread local implementation that
@@ -155,7 +169,7 @@ fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
 ///     pub static FOO: Cell<u32> = const { Cell::new(1) };
 /// }
 ///
-/// FOO.with(|foo| assert_eq!(foo.get(), 1));
+/// assert_eq!(FOO.get(), 1);
 /// ```
 ///
 /// See [`LocalKey` documentation][`std::thread::LocalKey`] for more
@@ -166,16 +180,18 @@ fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
 #[stable(feature = "rust1", since = "1.0.0")]
 #[cfg_attr(not(test), rustc_diagnostic_item = "thread_local_macro")]
 #[allow_internal_unstable(thread_local_internals)]
+// FIXME: Use `SyncUnsafeCell` instead of allowing `static_mut_refs` lint
+#[cfg_attr(not(bootstrap), allow(static_mut_refs))]
 macro_rules! thread_local {
     // empty (base case for the recursion)
     () => {};
 
-    ($(#[$attr:meta])* $vis:vis static $name:ident: $t:ty = const { $init:expr }; $($rest:tt)*) => (
+    ($(#[$attr:meta])* $vis:vis static $name:ident: $t:ty = const $init:block; $($rest:tt)*) => (
         $crate::thread::local_impl::thread_local_inner!($(#[$attr])* $vis $name, $t, const $init);
         $crate::thread_local!($($rest)*);
     );
 
-    ($(#[$attr:meta])* $vis:vis static $name:ident: $t:ty = const { $init:expr }) => (
+    ($(#[$attr:meta])* $vis:vis static $name:ident: $t:ty = const $init:block) => (
         $crate::thread::local_impl::thread_local_inner!($(#[$attr])* $vis $name, $t, const $init);
     );
Rot127

Rot127 commented on Jul 15, 2024

@Rot127
Author

Ok, seems to work fine in 1.79.0. Will keep this open for a while and test. But it can be closed probably.

Rot127

Rot127 commented on Jul 18, 2024

@Rot127
Author

Still a problem in 1.79.0. It seems to be related to formatted strings?

assert in libc: Fatal glibc error: tpp.c:83 (__pthread_tpp_change_priority): assertion failed: new_prio == -1 || (new_prio >= fifo_min_prio && new_prio <= fifo_max_prio)

Triggered by having a formatted debug_assert!:

fn some_fcn() {
    for x in self
        .get_locked_values()
        .write()
        .unwrap() {
        // ...
          debug_assert!(
              <condition>,
              "Some formatted string: {} {} {}",
              arg0,
              arg1,
              arg2
          );
    }
}

some_fcn is called by another function, within the local thread.

Removing any of the arguments will make the code work. Two of the arguments are clones or references of members of the write-locked struct.
The last one from outside the loop.
I can also split up the debug in two print! and a debug_assert!(false) and it doesn't crash. Except if an argument in println! has a format specifier like {:#x}. So it seems to be connected to the number of arguments passed and their formatting.

Crashes:

print!("bla {}", arg0);
println!(
    "at {:#x} balbla {} blabla", // Without :#x it works fine.
    arg1, arg2
);

Using simply 0,1,2 for the arguments make the code work again.

I would debug it, but really don't have time, unfortunately.

Stack-trace:

Fatal glibc error: tpp.c:83 (__pthread_tpp_change_priority): assertion failed: new_prio == -1 || (new_prio >= fifo_min_prio && new_prio <= fifo_max_prio)
==272775== 
==272775== Process terminating with default action of signal 6 (SIGABRT): dumping core
==272775==    at 0x4B5B4A4: __pthread_kill_implementation (pthread_kill.c:44)
==272775==    by 0x4B02C4D: raise (raise.c:26)
==272775==    by 0x4AEA901: abort (abort.c:79)
==272775==    by 0x4AEB766: __libc_message_impl.cold (libc_fatal.c:132)
==272775==    by 0x4AFABB6: __libc_assert_fail (__libc_assert_fail.c:31)
==272775==    by 0x4B61C95: __pthread_tpp_change_priority (tpp.c:83)
==272775==    by 0x4B5C473: __pthread_mutex_lock_full (pthread_mutex_lock.c:567)
==272775==    by 0x4B04D99: __cxa_thread_atexit_impl (cxa_thread_atexit_impl.c:117)
==272775==    by 0x80DD408: register_dtor<core::cell::once::OnceCell<std::thread::Thread>> (fast_local.rs:161)
==272775==    by 0x80DD408: __getit (fast_local.rs:56)
==272775==    by 0x80DD408: try_with<core::cell::once::OnceCell<std::thread::Thread>, std::thread::try_current::{closure_env#0}, std::thread::Thread> (local.rs:285)
==272775==    by 0x80DD408: try_current (mod.rs:716)
==272775==    by 0x80DD408: std::thread::current (mod.rs:741)
==272775==    by 0x7FBDDF4: std::thread::scoped::scope (scoped.rs:144)
changed the title [-]`thread::scope` aborts with `futex()` `EPERM` unexpected error code.[/-] [+]Reachable `libc` assert by `thread::scope`, when printing `RwLock` protected values as formatted strings.[/+] on Jul 18, 2024
added
E-needs-mcveCall for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example
on Sep 3, 2024
Rot127

Rot127 commented on Sep 25, 2024

@Rot127
Author

So the assert can also be reached by printing std::thread::current().id().

Fatal glibc error: tpp.c:83 (__pthread_tpp_change_priority): assertion failed: new_prio == -1 || (new_prio >= fifo_min_prio && new_prio <= fifo_max_prio)
Fatal glibc error: tpp.c:83 (__pthread_tpp_change_priority): assertion failed: new_prio == -1 || (new_prio >= fifo_min_prio && new_prio <= fifo_max_prio)

Still can't come up with a minimal working example. But will continue to try.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-threadArea: `std::thread`C-bugCategory: This is a bug.E-needs-mcveCall for participation: This issue has a repro, but needs a Minimal Complete and Verifiable ExampleT-libsRelevant to the library team, which will review and decide on the PR/issue.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @the8472@saethlin@lolbinarycat@Rot127@rustbot

        Issue actions

          Reachable `libc` assert by `thread::scope`, when printing `RwLock` protected values as formatted strings. · Issue #124920 · rust-lang/rust