Description
So, here's what I'd ultimately like to see, so that stuff like this doesn't happen again:
- There is a single, canonical name mangler in Rust. Anything that ever emits a binary symbol uses this mangler, without exception, and can't modify the bytes; it and it alone is responsible for the ABI-facing names.
- Even better, this is enforced in the types, and only at the last minute is the raw str pointer or bytes passed to the backend to emit ABI names.
- The Rust test suite verifies that every symbol in a generated binary is validly mangled.
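To make the first two points concrete, here is a minimal sketch of what "enforced in the types" could look like. Everything in it is hypothetical: `MangledName`, `mangle`, and `emit_symbol` are names made up for illustration, not rustc APIs, and the toy mangler only produces the `_ZN<len><ident>...E` shape seen in the symbols in this issue (it leaves out the trailing hash component).

```rust
mod mangle {
    /// An ABI-facing symbol name (hypothetical type, not a rustc API).
    /// The inner field is private, so code outside this module can
    /// neither construct one by hand nor modify its bytes.
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub struct MangledName(String);

    impl MangledName {
        /// Read-only view, handed to the backend at the last minute.
        pub fn as_str(&self) -> &str {
            &self.0
        }

        /// Read-only bytes, for backends that want raw bytes.
        pub fn as_bytes(&self) -> &[u8] {
            self.0.as_bytes()
        }
    }

    /// Toy mangler: `_ZN`, then each path segment prefixed with its
    /// length, then `E`. The hash component is deliberately omitted.
    pub fn mangle(path: &[&str]) -> MangledName {
        let mut out = String::from("_ZN");
        for segment in path {
            out.push_str(&segment.len().to_string());
            out.push_str(segment);
        }
        out.push('E');
        MangledName(out)
    }
}

/// Stand-in for a backend: it can only accept a `MangledName`,
/// so an unmangled or hand-edited string cannot reach it.
fn emit_symbol(name: &mangle::MangledName) -> Vec<u8> {
    name.as_bytes().to_vec()
}

fn main() {
    let name = mangle::mangle(&["std", "sys_common", "backtrace"]);
    let emitted = emit_symbol(&name);
    assert_eq!(emitted, name.as_bytes());
    println!("{}", name.as_str());
}
```

Because the only way to obtain a `MangledName` is through `mangle`, appending something like `.0.0` after the fact would not type-check without going through the mangler.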
You might be surprised to learn that the first point is not true in practice. So I've opened this issue to track the various cases I've found; what follows are the specimens:
Examples
As of rustc 1.22, on the following program compiled via rustc hello.rs:
fn main() {
    println!("hello");
}
I am seeing the following issues:
Spurious thread locals
symbol: .tdata._ZN3std10sys_common11thread_info11THREAD_INFO7__getit5__KEY17hc69464ef038d7e85E
40 LOCAL TLS .tdata._ZN3std10sys_common11thread_info11THREAD_INFO7__getit5__KEY17hc69464ef038d7e85E 0x0 .tdata(20) 0x0
Example source location: https://github.com//m4b/rust/blob/383e313d181eceb3155eb1089d448144f830ee23/src/libstd/sys_common/thread_info.rs#L21
Why incorrect
- The symbol does not start with _ZN.
- This looks like it's meant to be the section name. The exact same TLS variable does occur later on, with the same address:
40 LOCAL TLS std::sys_common::thread_info::THREAD_INFO::__getit::__KEY::hc69464ef038d7e85 0x30 .tdata(20) 0x0
but correctly mangled.
This could be an LLVM bug; I have verified that this only seems to occur for variables generated via the thread_local! macro, and furthermore that it is inside the crate's .rlib static archive. You can extract the object file with something like:
ar p libstd-fe0b1b991511fcaa.rlib std-fe0b1b991511fcaa.std0.rust-cgu.o > libstd.o
(Your libstd-<hash> will vary, of course.)
E.g. here is every symbol with either .tdata in its name or referencing that section in the Rust 1.22 libstd object file (inside the .rlib):
sections:
3473 .tdata._ZN3std11collections4hash3map11RandomState3new4KEYS7__getit5__KEY17h98644cd8ad1049dbE SHT_PROGBITS WRITE ALLOC TLS 0x42030 0x0 0x20 0x0 0x8
3543 .tdata._ZN3std2io5stdio12LOCAL_STDOUT7__getit5__KEY17h53b08df14c3cb33dE SHT_PROGBITS WRITE ALLOC TLS 0x42440 0x0 0x28 0x0 0x20
3627 .tdata._ZN3std10sys_common11thread_info11THREAD_INFO7__getit5__KEY17hc69464ef038d7e85E SHT_PROGBITS WRITE ALLOC TLS 0x42840 0x0 0x30 0x0 0x20
3647 .tdata._ZN3std9panicking12LOCAL_STDERR7__getit5__KEY17h715a8958c4cd11efE SHT_PROGBITS WRITE ALLOC TLS 0x42980 0x0 0x28 0x0 0x20
3659 .tdata._ZN3std9panicking18update_panic_count11PANIC_COUNT7__getit5__KEY17h01a9f669bb84595fE SHT_PROGBITS WRITE ALLOC TLS 0x42a18 0x0 0x18 0x0 0x8
3665 .tdata._ZN3std4rand10thread_rng14THREAD_RNG_KEY7__getit5__KEY17h8ec4cb227256fe90E SHT_PROGBITS WRITE ALLOC TLS 0x42a80 0x0 0x10 0x0 0x8
symbols:
0 LOCAL TLS .tdata._ZN3std10sys_common11thre… 0x0 .tdata._ZN3std10sys_common11thread_info11THREAD_INFO7__getit5__KEY17hc69464ef038d7e85E(3627) 0x0
0 LOCAL TLS .tdata._ZN3std11collections4hash… 0x0 .tdata._ZN3std11collections4hash3map11RandomState3new4KEYS7__getit5__KEY17h98644cd8ad1049dbE(3473) 0x0
0 LOCAL TLS .tdata._ZN3std2io5stdio12LOCAL_S… 0x0 .tdata._ZN3std2io5stdio12LOCAL_STDOUT7__getit5__KEY17h53b08df14c3cb33dE(3543) 0x0
0 LOCAL TLS .tdata._ZN3std4rand10thread_rng1… 0x0 .tdata._ZN3std4rand10thread_rng14THREAD_RNG_KEY7__getit5__KEY17h8ec4cb227256fe90E(3665) 0x0
0 LOCAL TLS .tdata._ZN3std9panicking12LOCAL_… 0x0 .tdata._ZN3std9panicking12LOCAL_STDERR7__getit5__KEY17h715a8958c4cd11efE(3647) 0x0
0 LOCAL TLS .tdata._ZN3std9panicking18update… 0x0 .tdata._ZN3std9panicking18update_panic_count11PANIC_COUNT7__getit5__KEY17h01a9f669bb84595fE(3659) 0x0
0 LOCAL TLS _ZN3std10sys_common11thread_info… 0x30 .tdata._ZN3std10sys_common11thread_info11THREAD_INFO7__getit5__KEY17hc69464ef038d7e85E(3627) 0x0
0 LOCAL TLS _ZN3std11collections4hash3map11R… 0x20 .tdata._ZN3std11collections4hash3map11RandomState3new4KEYS7__getit5__KEY17h98644cd8ad1049dbE(3473) 0x0
0 LOCAL TLS _ZN3std2io5stdio12LOCAL_STDOUT7_… 0x28 .tdata._ZN3std2io5stdio12LOCAL_STDOUT7__getit5__KEY17h53b08df14c3cb33dE(3543) 0x0
0 LOCAL TLS _ZN3std4rand10thread_rng14THREAD… 0x10 .tdata._ZN3std4rand10thread_rng14THREAD_RNG_KEY7__getit5__KEY17h8ec4cb227256fe90E(3665) 0x0
0 LOCAL TLS _ZN3std9panicking12LOCAL_STDERR7… 0x28 .tdata._ZN3std9panicking12LOCAL_STDERR7__getit5__KEY17h715a8958c4cd11efE(3647) 0x0
0 LOCAL TLS _ZN3std9panicking18update_panic_… 0x18 .tdata._ZN3std9panicking18update_panic_count11PANIC_COUNT7__getit5__KEY17h01a9f669bb84595fE(3659)
The spurious symbols will never be gc'd by the linker, and they will never get referenced; so they're just taking up space (albeit not much).
backtrace.rs nested static
symbol: _ZN3std10sys_common9backtrace11log_enabled7ENABLED17hc187c5b3618ccb2eE.0.0
Example source location: https://github.com//m4b/rust/blob/383e313d181eceb3155eb1089d448144f830ee23/src/libstd/sys_common/backtrace.rs#L148
Why Incorrect
No mangled symbol is allowed to have characters after the final E, but this one ends in .0.0:
2622e8 LOCAL OBJECT _ZN3std10sys_common9backtrace11log_enabled7ENABLED17hc187c5b3618ccb2eE.0.0 0x8 .bss(27) 0x0
I have definitely seen other examples of this, and with different numbers at the end; I think it has to do with nested statics somehow.
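Both kinds of breakage so far (a missing `_ZN` prefix, and trailing characters after the final `E`) are easy to check mechanically. Below is a rough sketch of such a check. The function name `is_legacy_mangled` is made up for illustration, and the check assumes only the `_ZN<len><ident>...E` shape visible in the symbols in this issue; it makes no attempt to validate the characters inside each component.

```rust
/// Returns true if `sym` has the shape `_ZN (<len><ident>)+ E` with
/// nothing before the prefix and nothing after the final `E`.
/// Sketch only: it does not validate the contents of each component.
fn is_legacy_mangled(sym: &str) -> bool {
    let mut rest = match sym.strip_prefix("_ZN") {
        Some(r) => r,
        None => return false,
    };
    let mut components = 0;
    loop {
        // Count the decimal digits of the next length prefix.
        let digits = rest.bytes().take_while(|b| b.is_ascii_digit()).count();
        if digits == 0 {
            // No further component: the remainder must be exactly "E".
            return components > 0 && rest == "E";
        }
        let len = match rest[..digits].parse::<usize>() {
            Ok(n) if n > 0 => n,
            _ => return false,
        };
        // Skip the identifier; `get` fails if it runs past the end.
        rest = match rest[digits..].get(len..) {
            Some(r) => r,
            None => return false,
        };
        components += 1;
    }
}

fn main() {
    // The correctly mangled TLS symbol from this issue.
    assert!(is_legacy_mangled(
        "_ZN3std10sys_common11thread_info11THREAD_INFO7__getit5__KEY17hc69464ef038d7e85E"
    ));
    // Section-name prefix in front of the mangled name.
    assert!(!is_legacy_mangled(
        ".tdata._ZN3std10sys_common11thread_info11THREAD_INFO7__getit5__KEY17hc69464ef038d7e85E"
    ));
    // Extra characters after the final E.
    assert!(!is_legacy_mangled(
        "_ZN3std10sys_common9backtrace11log_enabled7ENABLED17hc187c5b3618ccb2eE.0.0"
    ));
    println!("all checks passed");
}
```

A test that runs something like this over every symbol in a generated binary would have flagged both specimens.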
Nightly
On nightly it looks like there has been a pretty substantial regression w.r.t. valid symbol names being output:
bingrep -D -t 65 hello-nightly | grep -e "E...[[:digit:]] "
26ce98 GLOBAL OBJECT ref.7.llvm.D64EB761 0x18 .data.rel.ro(23) 0x2
26e160 GLOBAL OBJECT _ZN3std3sys4unix2os8ENV_LOCK17hbf5ac5d1fa9db31cE.llvm.D64EB761 0x28 .data(26) 0x2
5bf60 GLOBAL OBJECT str.4.llvm.6C0E7CF1 0x1a .rodata(16) 0x2
5bf7a GLOBAL OBJECT str.5.llvm.6C0E7CF1 0x0 .rodata(16) 0x2
548f0 GLOBAL FUNC _ZN4core3ptr13drop_in_place17hd0b6a86080ab42c4E.llvm.F74E5798 0x6 .text(14) 0x2
5bf80 GLOBAL OBJECT ref.7.llvm.6C0E7CF1 0x40 .rodata(16) 0x2
5d1d0 GLOBAL OBJECT str.a.llvm.F74E5798 0x1f .rodata(16) 0x2
26d920 GLOBAL OBJECT panic_bounds_check_loc.e.llvm.F74E5798 0x18 .data.rel.ro(23) 0x2
This runs the gamut of functions, global memory, and read-only strings, all of which apparently (sometimes) have extra characters appended.
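For tooling that wants to inspect these names, one option is to peel off the appended part and look at the base name separately. The sketch below assumes the suffix always has the shape `.llvm.<hex digits>`, which is only what the listing above shows; `split_llvm_suffix` is a made-up helper name.

```rust
/// Splits a trailing `.llvm.<hex>` suffix off a symbol name.
/// Returns the base name and, if present, the hex part.
/// Assumption: the suffix shape is taken from the listing in this
/// issue, not from any specification.
fn split_llvm_suffix(sym: &str) -> (&str, Option<&str>) {
    const MARKER: &str = ".llvm.";
    if let Some(idx) = sym.rfind(MARKER) {
        let hash = &sym[idx + MARKER.len()..];
        if !hash.is_empty() && hash.bytes().all(|b| b.is_ascii_hexdigit()) {
            return (&sym[..idx], Some(hash));
        }
    }
    (sym, None)
}

fn main() {
    let (base, hash) = split_llvm_suffix(
        "_ZN4core3ptr13drop_in_place17hd0b6a86080ab42c4E.llvm.F74E5798",
    );
    assert_eq!(base, "_ZN4core3ptr13drop_in_place17hd0b6a86080ab42c4E");
    assert_eq!(hash, Some("F74E5798"));

    // Non-mangled names in the listing carry the same kind of suffix.
    assert_eq!(split_llvm_suffix("str.4.llvm.6C0E7CF1"), ("str.4", Some("6C0E7CF1")));

    // Names without the suffix come back unchanged.
    assert_eq!(split_llvm_suffix("main"), ("main", None));
    println!("ok");
}
```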
Special Mentions
{{closure}} in symbol names is useless, and very hard to print in debuggers.
E.g.:
47d80 LOCAL FUNC core::fmt::Formatter::pad_integral::{{closure}}::h6acabc645f5ef2ad 0x10f .text(14) 0x0
which is from the use of this closure:
The compiler knows the line number (it will even emit this sometimes, like @[closure; mod.rs:1108] or whatever); why not just output that instead of {{closure}}?