Skip to content

Rust should pass vectors by vector register #93490

Closed
@Miksel12

Description

@Miksel12

Rust currently doesn't pass vectors of floats by vector register.

This should be able to be passed by vector registers:

pub struct Stats { x: f32, y: f32, z: f32, q: f32  }


pub fn sum_rust(a: &Stats, b: &Stats) -> Stats {
    return Stats {x: a.x + b.x, y: a.y + b.y, z: a.z + b.z, q: a.q + b.q };
}

But in Rust 1.47 it uses the stack:

example::sum_rust:
        mov     rax, rdi
        movups  xmm0, xmmword ptr [rsi]
        movups  xmm1, xmmword ptr [rdx]
        addps   xmm1, xmm0
        movups  xmmword ptr [rdi], xmm1
        ret

Post 1.47, it is packed into integer registers (see this issue: #85265):

example::sum_rust:
        movss   xmm0, dword ptr [rdi]
        movss   xmm1, dword ptr [rdi + 4]
        addss   xmm0, dword ptr [rsi]
        addss   xmm1, dword ptr [rsi + 4]
        movsd   xmm2, qword ptr [rdi + 8]
        movsd   xmm3, qword ptr [rsi + 8]
        addps   xmm3, xmm2
        movd    eax, xmm0
        movd    ecx, xmm1
        movd    esi, xmm3
        shufps  xmm3, xmm3, 229
        movd    edx, xmm3
        shl     rdx, 32
        or      rdx, rsi
        shl     rcx, 32
        or      rax, rcx
        ret

This issue should be fixed by #93405 and should bring it back to pre 1.48.
But ideally it should be optimized to:

example::sum_rust:
        addps   xmm1, xmm0
        ret

@dotdash mentions in #85265 that this is due to Rust not using the proper types on the LLVM IR level: #85265 (comment)

EDIT:
Clang is able to use this optimization in a similar case:

struct Foo
{
    float bar1;
    float bar2;
    float bar3;
    float bar4;
};

Foo sum_cpp(Foo foo1, Foo foo2)
{
    Foo foo3;
    foo3.bar1 = foo1.bar1 + foo2.bar1;
    foo3.bar2 = foo1.bar2 + foo2.bar2;
    foo3.bar3 = foo1.bar3 + foo2.bar3;
    foo3.bar4 = foo1.bar4 + foo2.bar4;
    return foo3;
}

Gets turned into:

sum_cpp(Foo, Foo):                       # @sum_cpp(Foo, Foo)
        addps   xmm0, xmm2
        addps   xmm1, xmm3
        ret

Activity

nikic

nikic commented on Jan 30, 2022

@nikic
Contributor

This doesn't make sense for integers. Yes, you've picked out a very lucky example where this allows vectorization, but generally this just means that data will have to be moved back and forth between vector and GPR registers, for the majority case where no vectorization is possible. Passing these by pointer (as in 1.47) is the right thing to do.

Urgau

Urgau commented on Jan 30, 2022

@Urgau
Member

I actually tried this but this is complicated because this register are influenced by #[target_feature], which could lead to unsound code when called from a code that doesn't use this register.
https://github.com/rust-lang/rust/blob/803e19c6ebc647d4e600967c255fccea838bce9f/compiler/rustc_middle/src/ty/layout.rs#L3197-L3221

Miksel12

Miksel12 commented on Jan 30, 2022

@Miksel12
Author

This doesn't make sense for integers. Yes, you've picked out a very lucky example where this allows vectorization, but generally this just means that data will have to be moved back and forth between vector and GPR registers, for the majority case where no vectorization is possible. Passing these by pointer (as in 1.47) is the right thing to do.

That makes sense, but the exact same happens with floats. It is beneficial for floats to use vector registers, isn't it?
In that case I should probably use floats for my example.
I checked Clang to see how C++ is optimized and integers are indeed not passed in vector registers. Floats are.

struct Foo
{
    float bar1;
    float bar2;
    float bar3;
    float bar4;
};

Foo sum_cpp(Foo foo1, Foo foo2)
{
    Foo foo3;
    foo3.bar1 = foo1.bar1 + foo2.bar1;
    foo3.bar2 = foo1.bar2 + foo2.bar2;
    foo3.bar3 = foo1.bar3 + foo2.bar3;
    foo3.bar4 = foo1.bar4 + foo2.bar4;
    return foo3;
}

Gets turned into:

sum_cpp(Foo, Foo):                       # @sum_cpp(Foo, Foo)
        addps   xmm0, xmm2
        addps   xmm1, xmm3
        ret
Miksel12

Miksel12 commented on Jan 30, 2022

@Miksel12
Author

I actually tried this but this is complicated because this register are influenced by #[target_feature], which could lead to unsound code when called from a code that doesn't use this register.

https://github.com/rust-lang/rust/blob/803e19c6ebc647d4e600967c255fccea838bce9f/compiler/rustc_middle/src/ty/layout.rs#L3197-L3221

It seems like that issue is discussed in: #79865. It also seems like that issue will be fixed by upgrading to LLVM 14: #79865 (comment)
Though I'm not entirely sure it is the same issue.

Urgau

Urgau commented on Jan 30, 2022

@Urgau
Member

I checked Clang to see how C++ is optimized and integers are indeed not passed in vector registers. Floats are.

This is because clang generate a more optimized layout dso_local { <2 x float>, <2 x float> } instead of %Foo = type { float, float, float, float }. This is something I also tried to do is my PR but this way more complicated (it also required the abi compatibility to be fixed) so I decided to not include it.

Urgau

Urgau commented on Feb 1, 2022

@Urgau
Member

@Miksel12 I've open #93564 to fix the general issue related to the aggregation of types and I manage to also fix this issue.

With my PR your example code would now be compiled to:

sum_rust:
        addps	xmm0, xmm1
	ret

Even better than clang !

EDIT: It's no longer the case, due to the abi + target_features unsoundness.

scottmcm

scottmcm commented on Feb 6, 2022

@scottmcm
Member

Rust should pass vectors by vector register

Note that if you add repr(simd) then it does pass deal in vector types: https://play.rust-lang.org/?version=nightly&mode=release&edition=2021&gist=1645fca36d467d49bc9bd4f299f8243f

So arguably this more about "repr(rust) should use vector representations sometimes". (And of course when we eventually get stdsimd then people could do this as a repr(transparent) wrapper around Simd<f32, 4>.)

bjorn3

bjorn3 commented on Feb 6, 2022

@bjorn3
Member

Note that if you add repr(simd) then it does pass deal in vector types: https://play.rust-lang.org/?version=nightly&mode=release&edition=2021&gist=1645fca36d467d49bc9bd4f299f8243f

It doesn't. It passes them by reference both when using &Stats as arguments and when using Stats as arguments (llvm ir is identical). There would be no load or store if it was passed in vector registers.

define void @_ZN10playground8sum_rust17hf4e374897bed05a9E(<4 x float>* noalias nocapture sret(<4 x float>) dereferenceable(16) %0, <4 x float>* noalias nocapture readonly align 16 dereferenceable(16) %a, <4 x float>* noalias nocapture readonly align 16 dereferenceable(16) %b) unnamed_addr #0 {
start:
  %1 = load <4 x float>, <4 x float>* %a, align 16
  %2 = load <4 x float>, <4 x float>* %b, align 16
  %3 = fadd <4 x float> %1, %2
  store <4 x float> %3, <4 x float>* %0, align 16
  ret void
}
workingjubilee

workingjubilee commented on Feb 13, 2022

@workingjubilee
Member

Having read this issue, I believe this is functionally a duplicate of #64609, #85265, and #91447, so I am closing this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      Participants

      @nikic@Urgau@bjorn3@scottmcm@workingjubilee

      Issue actions

        Rust should pass vectors by vector register · Issue #93490 · rust-lang/rust