Semantics of MIR function calls #71117

Open
@RalfJung

Description

@RalfJung
Member

Some discussion at #71005 (comment) revealed that we are not entirely sure what exactly the semantics of passing arguments and return values for function calls should be.

The simplest possible semantics is to say that when a stack frame is created, we allocate fresh memory for all arguments and return values (according to the known layout, determined by the callee). We copy the function arguments into the argument slots. Then we evaluate the function, and when it returns, we copy the return value back.
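The copying semantics described above can be sketched as a tiny toy model. This is only an illustration: `Frame`, `call`, and `increment` are hypothetical names and do not correspond to anything in rustc or Miri, and values are simplified to `i64`.

```rust
// Toy model of "copy in, copy out" call semantics.
// All names here are hypothetical; this is not how rustc or Miri is structured.

/// A callee stack frame with freshly allocated slots.
struct Frame {
    args: Vec<i64>, // argument slots, private to the callee
    ret: i64,       // return slot (the callee's `_0`), private to the callee
}

/// Performs a call: allocate fresh slots, copy the arguments in, run the body,
/// then copy the return value out into the caller's destination.
fn call(body: fn(&mut Frame), caller_args: &[i64], dest: &mut i64) {
    let mut frame = Frame {
        args: caller_args.to_vec(), // copy arguments into fresh memory
        ret: 0,
    };
    body(&mut frame);
    *dest = frame.ret; // copy the return value back
}

/// Example callee: mutates its own argument slot, then returns it.
fn increment(frame: &mut Frame) {
    frame.args[0] += 1;
    frame.ret = frame.args[0];
}

fn main() {
    let x = 41;
    let mut result = 0;
    call(increment, &[x], &mut result);
    // The caller's `x` is untouched because the callee only saw a copy.
    assert_eq!(x, 41);
    assert_eq!(result, 42);
    println!("x = {}, result = {}", x, result);
}
```

In this model the callee can never observe or modify caller memory through its argument or return slots, which is what makes it simple and also what makes it differ from destination-passing style.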

However, such a model is hard to compile down to destination-passing style, where the callee writes its return value directly into caller-provided memory. If that memory aliases with other things the function can access, behavior could differ with and without destination-passing style. This is complicated by the fact that in MIR right now a Call does not always provide a return place, yet even diverging functions (which are called without a return place) may access their return local _0. Moreover, @eddyb says that for some function arguments we might also want to elide the copy during codegen; it is unclear whether that is behaviorally equivalent to the above copying semantics.
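To make the aliasing concern concrete, here is a hand-written sketch in surface Rust rather than MIR. It is an assumption-laden illustration: `swap_by_value` and `swap_dps` are made-up functions, and raw pointers are used only to emulate a destination that aliases an argument passed by reference.

```rust
// Hypothetical illustration of how destination-passing style can change behavior
// when the destination aliases memory the callee also reads.

/// Copying semantics: the argument and the result live in the callee's own slots.
fn swap_by_value(p: (i32, i32)) -> (i32, i32) {
    let mut r = (0, 0);
    r.0 = p.1;
    r.1 = p.0;
    r
}

/// Emulated destination-passing style: the result is written straight into `out`
/// while the argument is still being read through `p`.
///
/// Safety: both pointers must be valid for reads/writes; they are allowed to alias.
unsafe fn swap_dps(out: *mut (i32, i32), p: *const (i32, i32)) {
    unsafe {
        (*out).0 = (*p).1;
        (*out).1 = (*p).0; // if `out` aliases `p`, this reads the value just written
    }
}

/// `x = swap_by_value(x)` under copying semantics.
fn by_value() -> (i32, i32) {
    let mut x = (1, 2);
    x = swap_by_value(x);
    x
}

/// The same call where destination and argument are the same memory.
fn aliased_dps() -> (i32, i32) {
    let mut x = (1, 2);
    let ptr: *mut (i32, i32) = &mut x;
    unsafe { swap_dps(ptr, ptr as *const (i32, i32)) };
    x
}

fn main() {
    assert_eq!(by_value(), (2, 1));
    assert_eq!(aliased_dps(), (2, 2));
}
```

The two results differ, so a semantics that allows the compiler to pick either strategy has to rule out (or declare undefined) the aliasing case.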

This is something of a sibling to #68364. We should have a good way to collect all these "MIR semantics" issues...

Activity

RalfJung

RalfJung commented on Apr 14, 2020

@RalfJung
MemberAuthor

I think a first step we should take is to make the Call terminator always provide a return place. Right now, every backend has to replicate the same hack where some scratch memory still needs to be allocated for the return place in case the caller did not provide one (except for Miri, which makes it illegal to access the return place in this situation, but that is likely just wrong).

Beyond that, I am not sure. Miri right now implements copying as described above for arguments. For return values, it directly uses the caller-provided place, which means RETURN_PLACE needs to be special-cased in a bunch of places. We can probably get rid of this special treatment if we are okay with losing the "immediate value" optimization for return places; then we could force_allocate the caller-provided return place and make the callee _0 an Indirect local. (This would entirely remove return_place from Frame, which is good I think.)

To ensure that the return place does not alias with anything, we could try using Stacked Borrows: rust-lang/miri#1330. However, that is quite the hack -- usually retags are explicit in the code; this would make the return place the only implicit retag in our semantics. Also, we should at least go over a bunch of tricky examples to ensure that this indeed makes all bad cases UB. Unfortunately, without a solution to rust-lang/miri#196, it is hard to test these things.

For passing arguments without a copy, I don't know if what Miri does is a problem and I don't know what a solution could look like.

eddyb

eddyb commented on Apr 14, 2020

@eddyb
Member

cc @rust-lang/wg-mir-opt @nikomatsakis

added labels on Apr 14, 2020:
A-MIR (Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html)
A-miri (Area: The miri tool)
T-lang (Relevant to the language team)
nikomatsakis

nikomatsakis commented on Apr 14, 2020

@nikomatsakis
Contributor

@RalfJung is the optional place only employed for functions that return uninhabited values? The type ! has size 0, but I suppose in some cases the type might be non-zero in size..? I'm a bit confused about that part of what you wrote.

bjorn3

bjorn3 commented on Apr 14, 2020

@bjorn3
Member

(!, u8) has a size of 1.
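The size claim can be checked on stable Rust with a stand-in. This sketch assumes `std::convert::Infallible` (an empty enum) behaves like `!` for layout purposes, since `!` itself is not usable in this position on stable; tuple layout is not formally guaranteed, so the printed value is what current rustc happens to choose.

```rust
use std::convert::Infallible;
use std::mem::size_of;

/// Returns (size of the uninhabited type alone, size of a tuple containing it).
/// `Infallible` is used here as a stable stand-in for `!`.
fn sizes() -> (usize, usize) {
    (size_of::<Infallible>(), size_of::<(Infallible, u8)>())
}

fn main() {
    let (never, pair) = sizes();
    // The uninhabited type itself is zero-sized, but the tuple still needs
    // room for its `u8` field, so a return place for it is not zero-sized.
    println!("size_of::<Infallible>() = {}", never);
    println!("size_of::<(Infallible, u8)>() = {}", pair);
}
```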

eddyb

eddyb commented on Apr 14, 2020

@eddyb
Member

Because of partial initialization, you could have fields of e.g. (A, B, C, !) written to.

nikomatsakis

nikomatsakis commented on Apr 14, 2020

@nikomatsakis
Contributor

Right. I just wanted to be sure that this is the kind of "dummy place" that @RalfJung was referring to, or if this was also a problem for the ! type.

nikomatsakis

nikomatsakis commented on Apr 14, 2020

@nikomatsakis
Contributor

I think that in general when we are assigning to a place, that place should not alias any of the values being read during the instruction. In other words, I think we should avoid the need for backends to introduce "temporaries".

I'm not 100% sure what this implies for arguments. I forget whether we permit arguments in MIR to be mutable, or whether a function with a mut parameter copies those values into a local first.

This is all related to #68304, since in there we are talking about cases where the size of the parameter is not known at compile time, and we would like to be able to pass it as an argument by reference without having to create a temporary (which would require an alloca). Presumably this is at least partly what @eddyb was referring to.

eddyb

eddyb commented on Apr 14, 2020

@eddyb
Member

@nikomatsakis For optimization reasons we want calls to not do copies of anything we pass by reference in the ABI. Otherwise MIR optimizations could never remove those copies, even when it would be correct to do so.

nikomatsakis

nikomatsakis commented on Apr 14, 2020

@nikomatsakis
Contributor

Right, that'd be the other case. Still, it looks like if I compile

fn foo(mut x: u32) {
    x += 1;
}

I get this:

fn  foo(_1: u32) -> () {
    debug x => _1;                       // in scope 0 at src/main.rs:1:8: 1:13
    let mut _0: ();                      // return place in scope 0 at src/main.rs:1:20: 1:20
    let mut _2: (u32, bool);             // in scope 0 at src/main.rs:2:5: 2:11

    bb0: {
        _2 = CheckedAdd(_1, const 1u32); // bb0[0]: scope 0 at src/main.rs:2:5: 2:11
        assert(!move (_2.1: bool), "attempt to add with overflow") -> bb1; // bb0[1]: scope 0 at src/main.rs:2:5: 2:11
    }

    bb1: {
        _1 = move (_2.0: u32);           // bb1[0]: scope 0 at src/main.rs:2:5: 2:11
        return;                          // bb1[1]: scope 0 at src/main.rs:3:2: 3:2
    }
}

Note in particular the _1 = move (_2.0: u32) at the end. I think that if parameters were (at least sometimes) references into the caller's stack frame, that could be problematic, right? (In other words, we don't want the callee to be mutating the caller's variables.)
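The source-level expectation behind this concern can be sketched as follows. This variant of `foo` returns the updated value purely so that the effect is observable; that return value is an addition for illustration and is not part of the snippet above.

```rust
/// Same shape as the `foo` above, but returning `x` so the write is observable.
fn foo(mut x: u32) -> u32 {
    x += 1; // writes to the parameter local (`_1` in the MIR dump)
    x
}

fn main() {
    let a = 5u32;
    let b = foo(a);
    // `u32` is `Copy`, so `a` remains usable after the call, and the callee's
    // write to its parameter must not be visible through `a`.
    assert_eq!(a, 5);
    assert_eq!(b, 6);
}
```

Whatever the MIR call semantics end up being, they have to preserve this behavior for arguments that the caller can still access after the call.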

eddyb

eddyb commented on Apr 14, 2020

@eddyb
Member

> (In other words, we don't want the callee to be mutating the caller's variables.)

We do, again, for optimization reasons. This only happens with Operand::Move arguments; Operand::Copy arguments will do a copy in the caller before the call, IIRC.

hanna-kruppe

hanna-kruppe commented on Apr 14, 2020

@hanna-kruppe
Contributor

To clarify, do you mean that e.g. in

fn foo(s: String) {
    bar(s);
}

the call to bar should pass on the same address foo received as argument? This is not currently the case, but IIUC it falls out of the aspiration / codegen strategy that you describe.

Edit: to be clear, the reason it currently copies the String in foo is an explicit temporary in the MIR, whose use as Operand::Move in the call to bar indeed happens without a further temporary that would have been implicit in the MIR. But presumably you'd want that explicit temporary to be removed too?
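One way to observe this from surface Rust is to record addresses on both sides of the call. This is a rough sketch: `Addrs` and `observe` are hypothetical helpers, the functions return their observations only for illustration, and whether the two header addresses match depends on the compiler version and optimization level, so no particular output is claimed for them.

```rust
/// Addresses observed for one `String`: the stack slot holding its
/// (pointer, length, capacity) header, and the heap buffer it points to.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Addrs {
    header: usize,
    buffer: usize,
}

fn observe(s: &String) -> Addrs {
    Addrs {
        header: s as *const String as usize,
        buffer: s.as_ptr() as usize,
    }
}

#[inline(never)]
fn bar(s: String) -> Addrs {
    observe(&s)
}

#[inline(never)]
fn foo(s: String) -> (Addrs, Addrs) {
    let in_foo = observe(&s);
    let in_bar = bar(s); // `s` is moved into the call
    (in_foo, in_bar)
}

fn main() {
    let (in_foo, in_bar) = foo(String::from("hello"));
    // Moving a `String` never reallocates, so the heap buffer is the same.
    assert_eq!(in_foo.buffer, in_bar.buffer);
    // The header address may or may not be the same, depending on whether
    // the move into `bar` was compiled as a copy to a new stack slot.
    println!("header in foo: {:#x}", in_foo.header);
    println!("header in bar: {:#x}", in_bar.header);
}
```

If the header addresses differ, the move was lowered as a copy into a fresh slot; if they match, the callee received the caller's slot directly, which is the behavior under discussion.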



Participants: @eddyb, @nikomatsakis, @RalfJung, @Diggsey, @jonas-schievink
