Skip to content

#[used] and symbol visibility is unclear #54135

Closed
@alexcrichton

Description

@alexcrichton
Member

I was excited to see #[used] stabilized (yay!) as one of the issues we suffer from in wasm-bindgen is related to symbols being removed. Unfortunately though #[used] doesn't solve our use case!

First I'll try to explain our issue a bit. The #[wasm_bindgen] attribute allows you to import JS functionality into a Rust program. This doesn't work, however, when you import JS functions into a private Rust submodule. (aka mod foo { ... }). When importing a function we also generate an internal exported function which the CLI wasm-bindgen tool uses (and then removes), but it suffices to say that we're generating code that looks like:

mod private {
    #[no_mangle]
    pub extern fn foo() { /* ... */ }
}

Today the symbol foo is not considered alive by rustc itself as it's not reachable. As a result, it's not even translated into the object file. If we instead change this though:

#![feature(used)]

mod private {
    #[no_mangle]
    pub extern fn foo() {}
    
    #[used]
    static F: extern fn() = foo;
}

This still doesn't work! Unfortunately for us the #[used] works as intended but doesn't affect the symbol visibility. The above program generates this IR:

; ModuleID = 'playground.7pbp0xok-cgu.0'
source_filename = "playground.7pbp0xok-cgu.0"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

@_ZN10playground7private1F17hb0dc3802d85fadd7E = internal constant <{ i8*, [0 x i8] }> <{ i8* bitcast (void ()* @foo to i8*), [0 x i8] zeroinitializer }>, align 8
@llvm.used = appending global [1 x i8*] [i8* bitcast (<{ i8*, [0 x i8] }>* @_ZN10playground7private1F17hb0dc3802d85fadd7E to i8*)], section "llvm.metadata"

; Function Attrs: norecurse nounwind readnone uwtable
define internal void @foo() unnamed_addr #0 {
start:
  ret void
}

attributes #0 = { norecurse nounwind readnone uwtable "probe-stack"="__rust_probestack" }

the problem here is that the symbol foo, while not mangled, is still marked as internal. This in turns means that it does indeed reach the linker, but for our purposes in wasm-bindgen we need it to survive the linker, not just reach the linker.


Ok so that's the problem statement for wasm-bindgen, but you can generalize it today for rustc by asking: what does #[used] do to symbol visibility? The overall story for symbol visibility in rustc is a little muddied and not always great (especially on ABI-particulars like #[no_mangle] things).

What should the symbol visibility of foo be here?

mod private {
    #[no_mangle]
    pub extern fn foo() {}
    
    #[used]
    static F: extern fn() = foo;
}

We've always had a basic rule of thumb in Rust that "reachable symbols" have non-internal visibility, but it's not clear what to do here. foo is indeed a reachable symbol because of #[used], but it's in a private module. Does that mean because of pub and #[no_mangle] it shouldn't have internal visibility? Should only #[no_mangle] imply that? It's unclear to me!

I'd naively like to send a patch that makes foo not-internal because it has #[no_mangle] and pub (not that it's "publicly reachable"). I think though that this may be deeper in the compiler. I just took a look at how #[used] works, and it's actually a little suprising!

In src/librustc_mir/monomorphize/collector.rs we attempt to not translate anything not reachable in a crate as a form of DCE. I didn't find any handling of #[used], though, and it turns out we unconditionally translate all statics all the time! Then becuase we put it in llvm.used it ends up not getting gc'd by LLVM.

I think that we may want to future-proof this by updating the src/librustc/middle/reachable.rs collection step to basically push #[used] statics onto the worklist to process. The initial worklist is seeded with all items that are public by visibility, and I think we could change it to also be seeded with any #[used] statics. This means that anything referenced by a #[used] static will be pulled in as a result.

Do others think this is a reasonable strategy for having #[used] affect symbol visibility?

cc @michaelwoerister
cc @fitzgen
cc @japaric

Activity

alexcrichton

alexcrichton commented on Sep 11, 2018

@alexcrichton
MemberAuthor

I should also note, I'm not even sure that such a tweak here would actually fix our use case in wasm-bindgen. I would want to actually test a compiler first to make sure it works before actually submitting a change like this.

japaric

japaric commented on Sep 17, 2018

@japaric
Member

In my experience, which pretty much only involves ELF objects, whenever I use
#[no_mangle] or #[export_name] I always want global / external
visibility.


Some info about my use case:

I always pair EXTERN with #[export_name] / #[no_mangle]. I use EXTERN to
force the linker to exhaustively search for a symbol in the input object files.
EXTERN also forces the linker to keep the symbol in the output binary.

#[no_mangle]
pub fn main() { .. }
/* exhaustively search for `main` in the input object files */
EXTERN(main);

I sometimes pair #[link_section] with KEEP to place symbols in specific
memory locations, but when I do I always need EXTERN to work around the linker
laziness. Without EXTERN the linker would stop looking through the input
object files once it has resolved all pending undefined references and would
miss the link_section-ed symbols that appear later in the list.

#[link_section = ".vector_table.reset_vector"]
#[no_mangle]
pub static RESET_VECTOR: fn() = reset;

#[link_section = ".vector_table.exceptions"]
#[no_mangle]
pub static EXCEPTIONS: [fn(); 14] = [..];
EXTERN(RESET_VECTOR);
EXTERN(EXCEPTIONS);

SECTIONS
{
  .vector_table :
  {
    KEEP(*(.vector_table.reset_vector));
    KEEP(*(.vector_table.exceptions));
  }
}

EXTERN skips internal symbols (a) which means that I have to be careful to
make my #[no_mangle] / #[export_name] functions and static variables both
public and reachable (i.e. no private module between them and the root of the
crate).

I don't have much problem with this issue but we have embedded crates that
provide attributes that expand into #[export_name]. This means that embedded
end users need to become aware of this "reachability" property, which is not
ideal. There's no way to enforce at compile time that an item marked with a
custom attribute is reachable so the end users can run into linker errors or,
worst, end up with a program that ignores their attribute.


With that background info out of the way,

My proposal would be to instead have #[no_mangle] / #[export_name] imply
global visibility, regardless of whether the symbol is pub or reachable.

  • Would that solve your use case? I think that if #[no_mangle] implies
    global visibility then you won't need the #[used] static because the
    compiler always codegens global symbols (?).

  • Is there an scenario where someone would want both #[no_mangle] /
    #[export_name] and internal visibility? I can't think of any such case
    (b).

There's also the third option of adding a new attribute to control visibility:
#[visibility(internal)], #[visibility(external)], etc.

Do others think this is a reasonable strategy for having #[used] affect symbol
visibility?

Some people (myself included) would find that surprising as #[used] was
designed to have the same semantics as __attribute__((used)) and that one
doesn't affect symbol visibility.


(a) LD's -u flag has the same semantics as EXTERN. Here you can see that it
doesn't work on internal symbols.

// src/main.rs
fn main() {}

// internal visibility
#[no_mangle]
#[used]
static FOO: u32 = 1;
$ cargo rustc --release -- -C link-arg=-Wl,-uFOO
$ nm target/release/hello | grep FOO
                 U FOO
// src/main.rs
fn main() {}

// external visibility
#[no_mangle]
pub static FOO: u32 = 1;
$ cargo rustc --release -- -C link-arg=-Wl,-uFOO
$ nm target/release/hello | grep FOO
000000000004e000 R FOO

(b) Internal symbols do not work for intra-Rust FFI which is the only case I
can think of.

// bar/src/lib.rs
#[no_mangle]
fn foo() {}

#[used]
static KEEP: fn() = foo;
// src/main.rs
extern crate bar;

fn main() {
    extern "Rust" {
        fn foo();
    }

    unsafe {
        foo();
    }
}

This fails to link:

$ cargo run
(..)
          /home/japaric/tmp/hello/src/main.rs:9: undefined reference to `foo'

If you tweak bar like this

// bar/src/lib.rs
#[no_mangle]
pub fn foo() {}

Then the program links.

parched

parched commented on Sep 17, 2018

@parched
Contributor

Is there an scenario where someone would want both #[no_mangle] /
#[export_name] and internal visibility?

One reason would be when using global_asm!. However, I think the better way to solve that use case is a way to pass rust mangled names to global_asm!.

japaric

japaric commented on Sep 17, 2018

@japaric
Member

One reason would be when using global_asm!

Due to parallel codegen I don't think it's guaranteed that the #[no_mangle]-ed symbol and the global_asm! symbol will end in the same object file even if they are defined in the same module / crate. If that's not the case the linker could pick up the global_asm! symbol from object file but miss the #[no_mangled] symbol leading to an "undefined reference" error.

I have had plenty of trouble with multiple codegen units and I have found that the only thing that reliably works is EXTERN and external visibility.

alexcrichton

alexcrichton commented on Sep 17, 2018

@alexcrichton
MemberAuthor

Thank for writing that up @japaric! I think I agree with you that #[used] shouldn't be used to affect symbol visibility, drawing parallels with C is a compelling use case.

I hadn't really considered it before but forcing all #[no_mangle] items to an extern linkage visibility doesn't seem like it's such a bad idea! The only other use case I can think of to add to @parched's use case is the old rust_begin_unwind symbol (or maybe rust_panic nowadays, I forget) which was intended for adding breakpoints to panics, but IIRC this doesn't really work reliably anyway.

We do have an unstable #[linkage] attribute which can explicitly control linkage (although it would likely require updates to the MIR item collector to ensure it always visits items with #[linkage]), and I think that we may just want to go that route. I like the idea of always making #[no_mangle] items have external linkage unless there's some attribute saying it should be internal (like #[visibility(private)] or #[linkage = "internal"].

japaric

japaric commented on Sep 17, 2018

@japaric
Member

I like the idea of always making #[no_mangle] items have external linkage unless there's some attribute saying it should be internal (like #[visibility(private)] or #[linkage = "internal"].

👍. Just want to add: let's apply the same logic to #[export_name]; #[no_mangle] is just sugar for #[export_name = "<item-name>"], after all.

alexcrichton

alexcrichton commented on Sep 17, 2018

@alexcrichton
MemberAuthor

Ah sorry yeah, #[export_name] I consider to be the same as #[no_mangle] as well.

retep998

retep998 commented on Sep 18, 2018

@retep998
Member

Coming from the perspective of the Windows linker model, I am absolutely in agreement that #[no_mangle] and #[export_name] should always imply external linkage.

added
A-linkageArea: linking into static, shared libraries and binaries
T-compilerRelevant to the compiler team, which will review and decide on the PR/issue.
on Sep 20, 2018
japaric

japaric commented on Sep 21, 2018

@japaric
Member

I don't know if this kind of change needs a RFC or FCP but I have put up a PR that implements this in #54414.

33 remaining items

Loading
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-linkageArea: linking into static, shared libraries and binariesT-compilerRelevant to the compiler team, which will review and decide on the PR/issue.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      Participants

      @alexcrichton@pnkfelix@hsivonen@RReverser@retep998

      Issue actions

        #[used] and symbol visibility is unclear · Issue #54135 · rust-lang/rust