Closed
Description
Consider the following minimal project:
Cargo.toml:

```toml
[package]
name = "tarpotest"
version = "0.1.0"
authors = ["Me <me@me.me>"]
edition = "2018"

[dependencies]
futures-executor = "0.2.1"
```
src/lib.rs:

```rust
#[test]
pub fn a() {
    futures_executor::ThreadPool::new();
}

#[test]
pub fn b() {
    futures_executor::ThreadPool::new();
}
```
Install tarpaulin from crates.io:

```sh
RUSTFLAGS="--cfg procmacro2_semver_exempt" cargo install cargo-tarpaulin
```

Then run it on this small project using either stable (rustc 1.32.0 (9fda7c223 2019-01-16)) or beta (rustc 1.33.0-beta.5 (1045131c1 2019-01-31)) rustc:

- `cargo +stable tarpaulin`: works as expected.
- `cargo +beta tarpaulin`: in roughly 15% of runs, tarpaulin reports that the test executable segfaulted:

```
Error: Failed to run tests! Error: A segfault occurred while executing tests
```
I have no idea what is at fault here, but apparently something changed in the latest beta of rustc that broke the way tarpaulin does its instrumentation. For what it's worth, the segfault appears to occur only when at least 2 tests in the same binary involve spawning threads, and it seems to become more likely the more threads are spawned.
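To illustrate that observation, here is a hypothetical extra test (not part of the original reproducer) that spawns OS threads directly with `std::thread`; if the thread-count hypothesis holds, adding tests like this should make the failure more frequent under `cargo +beta tarpaulin`:

```rust
// Hypothetical additional test, not from the original report: it spawns
// several OS threads directly, which, if the thread-count hypothesis holds,
// should make the intermittent segfault under tarpaulin more likely.
#[test]
pub fn c() {
    let handles: Vec<_> = (0..8)
        .map(|i| std::thread::spawn(move || i * 2))
        .collect();
    for handle in handles {
        // Join every worker so the test does not finish while threads are live.
        assert!(handle.join().is_ok());
    }
}
```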
See xd009642/tarpaulin#190 for full details.
Activity
pnkfelix commented on Feb 14, 2019
Visiting for triage. P-high for initial investigation; assigning that task to @nagisa.
nagisa commented on Feb 17, 2019
This will be hard for me to reliably bisect, because among the failures related to SIGSEGV I also get
Bisection says that the SIGSEGV began in 2fba17f (#56837), cc @nikomatsakis @arielb1
The callstack looks like this:
I won’t be able to look into it much more at the moment.
arielb1 commented on Feb 17, 2019
I suppose I'll have to investigate this then. Hope I can find enough time.
pnkfelix commented on Feb 21, 2019
Triage: speculatively assigning to @arielb1. They should hopefully unassign themselves if they are unable to look into this.
Mark-Simulacrum commented on Feb 21, 2019
Realistically I expect this regression to reach stable, because we have only ~4 days left before we need to produce the stable artifacts.
arielb1 commented on Feb 21, 2019
Had some busy time, will try to look at this quickly.
arielb1 commented on Feb 22, 2019
This appears to be triggered by the breakpoints tarpaulin places, which makes me suspect there is some race condition here that we somehow create in futures. This doesn't sound good.
arielb1 commented on Feb 22, 2019
Looking at the backtrace, something is smashing the stack of `ThreadPoolBuilder::create` (%rcx should contain the address of the return value, which is 0x7fffffffcff0, but it contains just 0xffffcff0, i.e. only the low 32 bits).
arielb1 commented on Feb 22, 2019
This is getting weird. The `call %rdi` is not pointing to the correct place???
Similar crashes: one crash where the stack is not 16-byte aligned, apparently starting from `tarpotest::b::{{closure}}`, leading to `movaps` crashes (how is this happening?).
arielb1 commented on Feb 22, 2019
I smell a bug in tarpaulin itself handling interrupts concurrently.
arielb1 commented on Feb 22, 2019
It appears that in bad traces, somehow the SIGTRAP signal is never received for the second breakpoint (at 0x403820) even though that byte is a `0xcc`, and that breakpoint comes only about 5 instructions after the first one (at 0x403860). That feels weird, and it is hard to explain as a miscompilation.
Bad trace:
Good trace:
arielb1 commented on Feb 22, 2019
Reproduction Status
I can reproduce the segfault on 244b05d, which is before 2fba17f (#56837), so the bisection was wrong. I used tarpaulin f95f717222e708f6bf0f7c5d2ee0b8f63a9f5c7e (from Jan 21).
This also doesn't quite feel like a miscompilation bug to me, but rather like some ptrace oddity in tarpaulin. It would be nice if someone who understands ptrace would look at this.
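For context, here is a minimal sketch of the breakpoint mechanism under discussion. It is not tarpaulin's actual code; it assumes x86_64 Linux plus the `libc` crate, and error handling is omitted. The point it illustrates is that once a byte is known to be `0xcc`, executing it should always stop the tracee with SIGTRAP and show up in the tracer's waitpid, which is exactly what the bad traces above fail to show.

```rust
// Minimal sketch of a ptrace breakpoint (NOT tarpaulin's code).
// Assumes x86_64 Linux and the `libc` crate; error handling omitted.
#[inline(never)]
fn target() {
    println!("target ran to completion");
}

fn main() {
    // The child is a fork of this process, so `target` has the same address there.
    let addr = target as usize;
    unsafe {
        let child = libc::fork();
        if child == 0 {
            // Tracee: opt in to tracing, stop so the tracer can patch the code,
            // then call the function that will start with an int3 byte.
            libc::ptrace(libc::PTRACE_TRACEME, 0, 0usize, 0usize);
            libc::raise(libc::SIGSTOP);
            target();
            libc::_exit(0);
        }

        // Tracer: wait for the child's initial SIGSTOP.
        let mut status = 0;
        libc::waitpid(child, &mut status, 0);

        // Insert the breakpoint: replace the low byte of the word at `addr`
        // with 0xcc (int3). ptrace writes even through the read-only text mapping.
        let orig = libc::ptrace(libc::PTRACE_PEEKTEXT, child, addr, 0usize);
        libc::ptrace(libc::PTRACE_POKETEXT, child, addr, (orig & !0xff) | 0xcc);
        libc::ptrace(libc::PTRACE_CONT, child, 0usize, 0usize);

        // Executing the 0xcc raises SIGTRAP, stopping the tracee; the stop is
        // reported to the tracer here. A byte known to be 0xcc that never
        // produces this SIGTRAP is the surprising symptom in the bad traces.
        libc::waitpid(child, &mut status, 0);
        assert!(libc::WIFSTOPPED(status) && libc::WSTOPSIG(status) == libc::SIGTRAP);

        // Restore the original byte, rewind RIP past the consumed int3,
        // and let the tracee re-execute the real first instruction.
        libc::ptrace(libc::PTRACE_POKETEXT, child, addr, orig);
        let mut regs: libc::user_regs_struct = std::mem::zeroed();
        libc::ptrace(libc::PTRACE_GETREGS, child, 0usize, &mut regs as *mut _);
        regs.rip -= 1;
        libc::ptrace(libc::PTRACE_SETREGS, child, 0usize, &mut regs as *mut _);
        libc::ptrace(libc::PTRACE_CONT, child, 0usize, 0usize);

        libc::waitpid(child, &mut status, 0);
        assert!(libc::WIFEXITED(status));
    }
}
```

In a real tool such as tarpaulin, many breakpoints are live at once and multiple tracee threads can hit them concurrently, which is where a race in the tracer's stop/continue handling could plausibly lose a SIGTRAP of this kind.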
nagisa commented on Feb 23, 2019
@arielb1 Weird, I even bisected twice, running the reproducer 32 times at each step on each bors merge, and it pointed to the same commit both times. And I was already thinking that the PR looked very unlikely to be the actual cause.
I bisected again, based on prebuilt nightlies, twice, running the reproduction 64 times at each step, and it points to the range 2018-12-08 to 2018-12-14, way before your PR landed. Sorry for the false alarm!
I will run bisection on merges between 2018-12-08 and 2018-12-14 now.
nagisa commented on Feb 23, 2019
Hit 3499575 (#56243). That change seems significantly more relevant; it is also something that has broken multiple things in the past, mostly by exposing invalid assumptions in users' code, which matches @arielb1's investigation results and feelings.
arielb1 commented on Feb 23, 2019
In that case, there is probably nothing actionable on our side.
pnkfelix commented on Feb 28, 2019
Nominating for discussion at T-compiler meeting. My predisposition is to close this bug as non-actionable.
pnkfelix commented on Apr 4, 2019
Closing as non-actionable.