Description
I was excited to see #[used] stabilized (yay!), as one of the issues we suffer from in wasm-bindgen is related to symbols being removed. Unfortunately though #[used] doesn't solve our use case!
First I'll try to explain our issue a bit. The #[wasm_bindgen] attribute allows you to import JS functionality into a Rust program. This doesn't work, however, when you import JS functions into a private Rust submodule (aka mod foo { ... }). When importing a function we also generate an internal exported function which the CLI wasm-bindgen tool uses (and then removes). The details aren't important here; it suffices to say that we're generating code that looks like:
mod private {
    #[no_mangle]
    pub extern fn foo() { /* ... */ }
}
Today the symbol foo is not considered alive by rustc itself as it's not reachable. As a result, it's not even translated into the object file. If we instead change it to this, though:
#![feature(used)]
mod private {
    #[no_mangle]
    pub extern fn foo() {}
    #[used]
    static F: extern fn() = foo;
}
This still doesn't work! Unfortunately for us, #[used] works as intended but doesn't affect the symbol visibility. The above program generates this IR:
; ModuleID = 'playground.7pbp0xok-cgu.0'
source_filename = "playground.7pbp0xok-cgu.0"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
@_ZN10playground7private1F17hb0dc3802d85fadd7E = internal constant <{ i8*, [0 x i8] }> <{ i8* bitcast (void ()* @foo to i8*), [0 x i8] zeroinitializer }>, align 8
@llvm.used = appending global [1 x i8*] [i8* bitcast (<{ i8*, [0 x i8] }>* @_ZN10playground7private1F17hb0dc3802d85fadd7E to i8*)], section "llvm.metadata"
; Function Attrs: norecurse nounwind readnone uwtable
define internal void @foo() unnamed_addr #0 {
start:
ret void
}
attributes #0 = { norecurse nounwind readnone uwtable "probe-stack"="__rust_probestack" }
The problem here is that the symbol foo, while not mangled, is still marked as internal. This in turn means that it does indeed reach the linker, but for our purposes in wasm-bindgen we need it to survive the linker, not just reach the linker.
Ok so that's the problem statement for wasm-bindgen, but you can generalize it today for rustc by asking: what does #[used] do to symbol visibility? The overall story for symbol visibility in rustc is a little muddied and not always great (especially on ABI-particulars like #[no_mangle] things).
What should the symbol visibility of foo be here?
mod private {
    #[no_mangle]
    pub extern fn foo() {}
    #[used]
    static F: extern fn() = foo;
}
We've always had a basic rule of thumb in Rust that "reachable symbols" have non-internal visibility, but it's not clear what to do here. foo is indeed a reachable symbol because of #[used], but it's in a private module. Does that mean because of pub and #[no_mangle] it shouldn't have internal visibility? Should only #[no_mangle] imply that? It's unclear to me!
I'd naively like to send a patch that makes foo not-internal because it has #[no_mangle] and pub (not that it's "publicly reachable"). I think though that this may be deeper in the compiler. I just took a look at how #[used] works, and it's actually a little surprising!
In src/librustc_mir/monomorphize/collector.rs we attempt to not translate anything not reachable in a crate as a form of DCE. I didn't find any handling of #[used], though, and it turns out we unconditionally translate all statics all the time! Then because we put it in llvm.used it ends up not getting gc'd by LLVM.
I think that we may want to future-proof this by updating the src/librustc/middle/reachable.rs collection step to basically push #[used] statics onto the worklist to process. The initial worklist is seeded with all items that are public by visibility, and I think we could change it to also be seeded with any #[used] statics. This means that anything referenced by a #[used] static will be pulled in as a result.
Do others think this is a reasonable strategy for having #[used] affect symbol visibility?
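To make the worklist idea concrete, here is a minimal toy sketch of it. This is not the code in reachable.rs; the Item struct, its fields, the reachable function, and the api item are all hypothetical and only model the seeding change described above: seed with publicly visible items, optionally also seed with #[used] statics, then follow references.

```rust
use std::collections::{BTreeSet, HashMap};

/// Toy stand-in for a crate item. Every field here is hypothetical and only
/// models what the discussion needs; it is not rustc's data structure.
#[derive(Debug, Clone)]
pub struct Item {
    pub name: &'static str,
    /// True if the item is public and reachable from the crate root by visibility.
    pub publicly_visible: bool,
    /// True if the item is a static carrying `#[used]`.
    pub used_static: bool,
    /// Names of the items this item refers to.
    pub references: Vec<&'static str>,
}

/// Worklist-based reachability. The worklist is seeded with publicly visible
/// items and, when `seed_used` is true, with `#[used]` statics as well.
pub fn reachable(items: &[Item], seed_used: bool) -> BTreeSet<&'static str> {
    let by_name: HashMap<&'static str, &Item> =
        items.iter().map(|i| (i.name, i)).collect();

    let mut worklist: Vec<&'static str> = items
        .iter()
        .filter(|i| i.publicly_visible || (seed_used && i.used_static))
        .map(|i| i.name)
        .collect();

    let mut seen = BTreeSet::new();
    while let Some(name) = worklist.pop() {
        // Skip items that were already processed.
        if !seen.insert(name) {
            continue;
        }
        // Anything referenced by a reachable item becomes reachable too.
        if let Some(item) = by_name.get(name) {
            worklist.extend(item.references.iter().copied());
        }
    }
    seen
}

/// The example from this issue: `foo` and `F` live in a private module.
/// `api` is an extra, hypothetical public item added as a baseline.
pub fn example_items() -> Vec<Item> {
    vec![
        Item { name: "api", publicly_visible: true, used_static: false, references: vec![] },
        Item { name: "foo", publicly_visible: false, used_static: false, references: vec![] },
        Item { name: "F", publicly_visible: false, used_static: true, references: vec!["foo"] },
    ]
}
```

In this model, calling reachable(&example_items(), false) yields only api, while passing true also pulls in F and, through it, foo. Whether reachability alone would then give foo non-internal visibility in the real compiler is exactly the open question here.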
cc @michaelwoerister
cc @fitzgen
cc @japaric
Activity
alexcrichton commented on Sep 11, 2018
I should also note, I'm not even sure that such a tweak here would actually fix our use case in wasm-bindgen. I would want to actually test a compiler first to make sure it works before actually submitting a change like this.
japaric commented on Sep 17, 2018
In my experience, which pretty much only involves ELF objects, whenever I use #[no_mangle] or #[export_name] I always want global / external visibility.
Some info about my use case:
I always pair EXTERN with #[export_name] / #[no_mangle]. I use EXTERN to force the linker to exhaustively search for a symbol in the input object files. EXTERN also forces the linker to keep the symbol in the output binary.
I sometimes pair #[link_section] with KEEP to place symbols in specific memory locations, but when I do I always need EXTERN to work around the linker laziness. Without EXTERN the linker would stop looking through the input object files once it has resolved all pending undefined references and would miss the link_section-ed symbols that appear later in the list.
EXTERN skips internal symbols (a) which means that I have to be careful to make my #[no_mangle] / #[export_name] functions and static variables both public and reachable (i.e. no private module between them and the root of the crate).
I don't have much of a problem with this issue myself, but we have embedded crates that provide attributes that expand into #[export_name]. This means that embedded end users need to become aware of this "reachability" property, which is not ideal. There's no way to enforce at compile time that an item marked with a custom attribute is reachable so the end users can run into linker errors or, worse, end up with a program that ignores their attribute.
With that background info out of the way,
My proposal would be to instead have #[no_mangle] / #[export_name] imply global visibility, regardless of whether the symbol is pub or reachable.
Would that solve your use case? I think that if #[no_mangle] implies global visibility then you won't need the #[used] static because the compiler always codegens global symbols (?).
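As a rough illustration of the difference between the two rules being discussed, here is a toy model. It is a simplification, not the compiler's logic: the Attrs struct, the Linkage enum, and both functions are hypothetical names made up for this sketch, and the "reachability" rule is only the rule of thumb mentioned earlier in the thread.

```rust
/// Hypothetical two-level view of symbol visibility for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Linkage {
    Internal,
    External,
}

/// Hypothetical summary of the properties of an item that matter here.
#[derive(Debug, Clone, Copy, Default)]
pub struct Attrs {
    /// The item carries `#[no_mangle]`.
    pub no_mangle: bool,
    /// The item carries `#[export_name = "..."]`.
    pub export_name: bool,
    /// The item is `pub` with no private module between it and the crate root.
    pub reachable: bool,
}

/// Rule of thumb from the issue description: only reachable symbols get
/// non-internal visibility.
pub fn linkage_by_reachability(attrs: Attrs) -> Linkage {
    if attrs.reachable {
        Linkage::External
    } else {
        Linkage::Internal
    }
}

/// Proposed rule: `#[no_mangle]` / `#[export_name]` imply global visibility
/// regardless of whether the symbol is `pub` or reachable.
pub fn linkage_proposed(attrs: Attrs) -> Linkage {
    if attrs.no_mangle || attrs.export_name || attrs.reachable {
        Linkage::External
    } else {
        Linkage::Internal
    }
}
```

For the foo from the issue description (a #[no_mangle] function inside a private module), the first function returns Linkage::Internal and the second returns Linkage::External, which is the behavior change being proposed.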
Is there a scenario where someone would want both #[no_mangle] / #[export_name] and internal visibility? I can't think of any such case (b).
There's also the third option of adding a new attribute to control visibility: #[visibility(internal)], #[visibility(external)], etc.
Some people (myself included) would find that surprising as #[used] was designed to have the same semantics as __attribute__((used)) and that one doesn't affect symbol visibility.
(a) LD's -u flag has the same semantics as EXTERN. Here you can see that it doesn't work on internal symbols.
(b) Internal symbols do not work for intra-Rust FFI which is the only case I
can think of.
This fails to link:
If you tweak bar like this
Then the program links.
parched commented on Sep 17, 2018
One reason would be when using global_asm!. However, I think the better way to solve that use case is a way to pass rust mangled names to global_asm!.
japaric commented on Sep 17, 2018
Due to parallel codegen I don't think it's guaranteed that the #[no_mangle]-ed symbol and the global_asm! symbol will end up in the same object file even if they are defined in the same module / crate. If that's not the case the linker could pick up the global_asm! symbol from its object file but miss the #[no_mangle]-ed symbol, leading to an "undefined reference" error.
I have had plenty of trouble with multiple codegen units and I have found that the only thing that reliably works is EXTERN and external visibility.
alexcrichton commented on Sep 17, 2018
Thanks for writing that up @japaric! I think I agree with you that #[used] shouldn't be used to affect symbol visibility; drawing parallels with C is a compelling argument.
I hadn't really considered it before but forcing all #[no_mangle] items to an extern linkage visibility doesn't seem like it's such a bad idea! The only other use case I can think of to add to @parched's use case is the old rust_begin_unwind symbol (or maybe rust_panic nowadays, I forget) which was intended for adding breakpoints to panics, but IIRC this doesn't really work reliably anyway.
We do have an unstable #[linkage] attribute which can explicitly control linkage (although it would likely require updates to the MIR item collector to ensure it always visits items with #[linkage]), and I think that we may just want to go that route. I like the idea of always making #[no_mangle] items have external linkage unless there's some attribute saying it should be internal (like #[visibility(private)] or #[linkage = "internal"]).
japaric commented on Sep 17, 2018
👍. Just want to add: let's apply the same logic to #[export_name]; #[no_mangle] is just sugar for #[export_name = "<item-name>"], after all.
alexcrichton commented on Sep 17, 2018
Ah sorry yeah, #[export_name] I consider to be the same as #[no_mangle] as well.
retep998 commented on Sep 18, 2018
Coming from the perspective of the Windows linker model, I am absolutely in agreement that #[no_mangle] and #[export_name] should always imply external linkage.
japaric commented on Sep 21, 2018
I don't know if this kind of change needs a RFC or FCP but I have put up a PR that implements this in #54414.