
capturing stack backtrace becomes slower and sometimes segfaults on Apple Silicon #104388

Open
@skyzh

Description

@skyzh
Contributor

Sorry for not having an MCVE for this issue; I'm still constructing one. For now, reproducing the issue requires a large codebase.

Basically, after we upgraded from nightly-2022-07-29 to nightly-2022-10-16, we noticed two things: capturing a stack backtrace became slower, and it sometimes segfaults.

The issue reproduces reliably on specific commits of our project when they are compiled in a specific way, so it is probably not related to incremental compilation. It seems more likely to be caused by the LLVM 15 upgrade in August or by the stabilization of std::backtrace::Backtrace.

Reproduce 1

On this commit: risingwavelabs/risingwave@227e9e5

RUST_BACKTRACE=1 cargo run --bin risingwave -- playground

In another terminal, use psql (the PostgreSQL client) to connect to the program:

psql -h localhost -p 4566 -d dev -U root
CREATE TABLE t(a int, b int);
CREATE VIEW v AS SELECT * FROM t;
DROP TABLE t;

The program will immediately segfault in Backtrace::capture.

Interestingly, building with cargo build -p risingwave_cmd_all and then running ./target/debug/risingwave playground works. The commit after the buggy one, risingwavelabs/risingwave@604a0a5, also makes the issue disappear, even though its code changes look unrelated.

Reproduce 2

On this commit: risingwavelabs/risingwave@484b9ab

cargo build -p risingwave_cmd
RUST_BACKTRACE=1 ./target/debug/meta-node # in terminal 1
RUST_BACKTRACE=1 ./target/debug/compute-node # in terminal 2
RUST_BACKTRACE=1 ./target/debug/frontend # in terminal 3

In another terminal:

psql -h localhost -p 4566 -d dev -U root
CREATE TABLE BOOLTBL2 (f1 bool); INSERT INTO BOOLTBL2 (f1) VALUES (bool 'XXX');

compute-node will also immediately segfault when capturing a backtrace.
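Both reproductions set RUST_BACKTRACE=1. That matters because std's Backtrace::capture() only walks the stack when RUST_BACKTRACE or RUST_LIB_BACKTRACE enables it; otherwise it returns a disabled backtrace without touching the unwinder. The sketch below (not part of the RisingWave code; the helper capture_statuses is hypothetical) contrasts it with Backtrace::force_capture(), which ignores those variables:

```rust
use std::backtrace::{Backtrace, BacktraceStatus};

/// Hypothetical helper: returns the status of an environment-gated capture
/// and of a forced capture taken at the same point.
fn capture_statuses() -> (BacktraceStatus, BacktraceStatus) {
    // Walks the stack only if RUST_LIB_BACKTRACE or RUST_BACKTRACE enables
    // backtraces (the decision is read once and cached); otherwise the
    // status is Disabled and no unwinding happens.
    let lazy = Backtrace::capture();
    // Always attempts to walk the stack, whatever the environment says.
    let forced = Backtrace::force_capture();
    (lazy.status(), forced.status())
}

fn main() {
    let (lazy, forced) = capture_statuses();
    println!("Backtrace::capture():       {lazy:?}");
    println!("Backtrace::force_capture(): {forced:?}");
    // force_capture() never reports Disabled: it yields Captured, or
    // Unsupported on platforms without unwinding support.
    assert_ne!(forced, BacktraceStatus::Disabled);
}
```

Running it once plainly and once with RUST_BACKTRACE=1 should typically show the first status switch from Disabled to Captured, while the second stays the same.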

Thanks for looking into this!

Meta

rustc --version --verbose:

rustc 1.66.0-nightly (b8c35ca26 2022-10-15)
binary: rustc
commit-hash: b8c35ca26b191bb9a9ac669a4b3f4d3d52d97fb1
commit-date: 2022-10-15
host: aarch64-apple-darwin
release: 1.66.0-nightly
LLVM version: 15.0.2
Backtrace

<backtrace>

Activity

skyzh (Contributor, Author) commented on Nov 14, 2022

@rustbot label +A-LLVM +T-compiler

added A-LLVM ("Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.") and T-compiler ("Relevant to the compiler team, which will review and decide on the PR/issue.") on Nov 14, 2022

skyzh (Contributor, Author) commented on Nov 14, 2022

We also observed that the DWARF frame region in the binary grew significantly across the toolchain update, which looks suspicious:

On 2022-07-29:

objdump target/debug/risingwave  --dwarf=frame | wc -l
     371

On 2022-10-16:

objdump target/debug/risingwave  --dwarf=frame | wc -l
 14671494

This may be related to the slower backtrace capture, but I still have no idea why it sometimes segfaults.
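To put a number on "slower", one option is to time the stack walk directly under each toolchain. A minimal sketch: average_capture_time is a hypothetical helper, the iteration count is arbitrary, and it measures only the stack walk, not symbol resolution, which std performs lazily when the backtrace is printed.

```rust
use std::backtrace::Backtrace;
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical helper: captures `n` backtraces and returns the average
/// time per capture. force_capture() is used so the result does not depend
/// on RUST_BACKTRACE; the backtraces are never printed, so symbolization
/// is not included in the measurement.
fn average_capture_time(n: u32) -> Duration {
    assert!(n > 0, "n must be positive");
    let start = Instant::now();
    for _ in 0..n {
        // black_box keeps the optimizer from discarding the capture.
        black_box(Backtrace::force_capture());
    }
    start.elapsed() / n
}

fn main() {
    let avg = average_capture_time(100);
    println!("average Backtrace::force_capture(): {avg:?}");
}
```

Building the same program with both nightlies and comparing the printed averages would isolate the unwinding cost from the rest of the application.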

skyzh (Contributor, Author) commented on Nov 14, 2022

I also suspected the split-debuginfo=unpacked stabilization, but it turns out that debuginfo appears to be unpacked on macOS with both toolchains. :(

skyzh (Contributor, Author) commented on Nov 14, 2022

Backtrace of the segfault (from lldb):

* thread #10, name = 'risingwave-main', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x00000001944f6770 libunwind.dylib`libunwind::CFI_Parser<libunwind::LocalAddressSpace>::parseFDEInstructions(libunwind::LocalAddressSpace&, libunwind::CFI_Parser<libunwind::LocalAddressSpace>::FDE_Info const&, libunwind::CFI_Parser<libunwind::LocalAddressSpace>::CIE_Info const&, unsigned long, int, libunwind::CFI_Parser<libunwind::LocalAddressSpace>::PrologInfo*) + 204
libunwind.dylib`libunwind::CFI_Parser<libunwind::LocalAddressSpace>::parseFDEInstructions:
->  0x1944f6770 <+204>: ldrb   w8, [x28], #0x1
    0x1944f6774 <+208>: stur   x28, [x29, #-0x98]
    0x1944f6778 <+212>: cmp    w8, #0x2f
    0x1944f677c <+216>: b.hi   0x1944f72c4               ; <+3104>
Target 0: (risingwave) stopped.
(lldb) bt
* thread #10, name = 'risingwave-main', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x00000001944f6770 libunwind.dylib`libunwind::CFI_Parser<libunwind::LocalAddressSpace>::parseFDEInstructions(libunwind::LocalAddressSpace&, libunwind::CFI_Parser<libunwind::LocalAddressSpace>::FDE_Info const&, libunwind::CFI_Parser<libunwind::LocalAddressSpace>::CIE_Info const&, unsigned long, int, libunwind::CFI_Parser<libunwind::LocalAddressSpace>::PrologInfo*) + 204
    frame #1: 0x00000001944f6624 libunwind.dylib`libunwind::UnwindCursor<libunwind::LocalAddressSpace, libunwind::Registers_arm64>::getInfoFromFdeCie(libunwind::CFI_Parser<libunwind::LocalAddressSpace>::FDE_Info const&, libunwind::CFI_Parser<libunwind::LocalAddressSpace>::CIE_Info const&, unsigned long, unsigned long) + 100
    frame #2: 0x00000001944f62fc libunwind.dylib`libunwind::UnwindCursor<libunwind::LocalAddressSpace, libunwind::Registers_arm64>::getInfoFromDwarfSection(unsigned long, libunwind::UnwindInfoSections const&, unsigned int) + 184
    frame #3: 0x00000001944f6220 libunwind.dylib`libunwind::UnwindCursor<libunwind::LocalAddressSpace, libunwind::Registers_arm64>::setInfoBasedOnIPRegister(bool) + 1228
    frame #4: 0x00000001944f86b0 libunwind.dylib`libunwind::UnwindCursor<libunwind::LocalAddressSpace, libunwind::Registers_arm64>::step() + 696
    frame #5: 0x00000001944fb0f0 libunwind.dylib`_Unwind_Backtrace + 348
    frame #6: 0x00000001090822cc risingwave`std::backtrace::Backtrace::create::h908375f7f84cb508 [inlined] std::backtrace_rs::backtrace::libunwind::trace::h471a59e08ff9e5dc at mod.rs:66:5 [opt]
    frame #7: 0x00000001090822bc risingwave`std::backtrace::Backtrace::create::h908375f7f84cb508 [inlined] std::backtrace_rs::backtrace::trace_unsynchronized::h4e694232d85e2708 at mod.rs:66:5 [opt]
    frame #8: 0x00000001090822b0 risingwave`std::backtrace::Backtrace::create::h908375f7f84cb508 at backtrace.rs:333:13 [opt]
    frame #9: 0x00000001056e7b5c risingwave`_$LT$risingwave_meta..error..MetaError$u20$as$u20$core..convert..From$LT$risingwave_meta..error..MetaErrorInner$GT$$GT$::from::hb4b62fbc8685e728(inner=<unavailable>) at error.rs:69:18
    frame #10: 0x0000000105e86adc risingwave`_$LT$T$u20$as$u20$core..convert..Into$LT$U$GT$$GT$::into::h7837bc8fb77e8181(self=<unavailable>) at mod.rs:726:9
    frame #11: 0x00000001056e7fe4 risingwave`risingwave_meta::error::MetaError::permission_denied::h6592a4f64415a283(s=<unavailable>) at error.rs:103:9
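Frames #9 to #11 show the backtrace being captured inside a From<MetaErrorInner> conversion, reached through Into::into from MetaError::permission_denied. The sketch below reproduces that shape as a possible starting point for an MCVE. The type and method names mirror the frames, but the enum variant, fields, and bodies are hypothetical, not RisingWave's actual code; since the crash comes and goes with unrelated code and linker changes, a program this small is not expected to crash on its own.

```rust
use std::backtrace::Backtrace;
use std::fmt;

/// Hypothetical inner error; the real MetaErrorInner is not shown here.
#[derive(Debug)]
enum MetaErrorInner {
    PermissionDenied(String),
}

/// Hypothetical wrapper that records where the error was created.
#[derive(Debug)]
struct MetaError {
    inner: MetaErrorInner,
    backtrace: Backtrace,
}

impl From<MetaErrorInner> for MetaError {
    fn from(inner: MetaErrorInner) -> Self {
        Self {
            inner,
            // Corresponds to frames #6 to #9: the stack is walked here,
            // through the platform unwinder, when RUST_BACKTRACE or
            // RUST_LIB_BACKTRACE enables backtraces.
            backtrace: Backtrace::capture(),
        }
    }
}

impl MetaError {
    fn permission_denied(s: impl Into<String>) -> Self {
        // Mirrors frames #10 and #11: build the inner error, then convert
        // it with .into(), which calls From::from above.
        MetaErrorInner::PermissionDenied(s.into()).into()
    }
}

impl fmt::Display for MetaError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match &self.inner {
            MetaErrorInner::PermissionDenied(s) => write!(f, "permission denied: {s}"),
        }
    }
}

fn main() {
    let err = MetaError::permission_denied("cannot drop table");
    println!("{err}");
    println!("backtrace status: {:?}", err.backtrace.status());
}
```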

skyzh (Contributor, Author) commented on Nov 14, 2022

The latest nightly (2022-11-14) still produces this segfault. It is very easy to reproduce in our codebase, yet small, unrelated changes (e.g., adding a println before capturing the backtrace) make it go away again.

Maybe related: #47551?

skyzh (Contributor, Author) commented on Nov 14, 2022

@rustbot label +A-LLVM +I-crash

added I-crash ("Issue: The compiler crashes (SIGSEGV, SIGABRT, etc). Use I-ICE instead when the compiler panics.") on Nov 14, 2022

skyzh (Contributor, Author) commented on Nov 14, 2022

skyzh (Contributor, Author) commented on Nov 15, 2022

This can be worked around by linking with rust-lld or with LLVM 15's lld (brew install llvm@15):

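# e.g. in .cargo/config.toml; the [target.aarch64-apple-darwin] table is assumed,
# since rustflags must sit under a [build] or [target.<triple>] table
[target.aarch64-apple-darwin]
rustflags = [
  "-Clink-arg=-fuse-ld=/opt/homebrew/opt/llvm@15/bin/ld64.lld",
]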

So this looks like a bug in macOS's bundled linker?

$ cc --version
Apple clang version 14.0.0 (clang-1400.0.29.202)
Target: arm64-apple-darwin22.1.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

changed the title from "capturing stack backtrace becomes slower and sometimes segfaults" to "capturing stack backtrace becomes slower and sometimes segfaults on Apple Silicon" on Nov 15, 2022


Metadata

    Assignees: no one assigned

    Labels
        A-LLVM: "Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues."
        A-runtime: "Area: std's runtime and "pre-main" init for handling backtraces, unwinds, stack overflows"
        C-bug: "Category: This is a bug."
        I-crash: "Issue: The compiler crashes (SIGSEGV, SIGABRT, etc). Use I-ICE instead when the compiler panics."
        O-AArch64: "Armv8-A or later processors in AArch64 mode"
        O-macos: "Operating system: macOS"
        T-compiler: "Relevant to the compiler team, which will review and decide on the PR/issue."

    Type: none
    Projects: none
    Milestone: none
    Relationships: none yet
    Development: no branches or pull requests

    Participants: @skyzh, @IvanGoncharov, @xxchan, @workingjubilee, @rustbot

Repository: rust-lang/rust (issue #104388)