Closed
Description
Rust currently doesn't pass vectors of floats by vector register.
A struct like this should be passable in vector registers:
pub struct Stats { x: f32, y: f32, z: f32, q: f32 }
pub fn sum_rust(a: &Stats, b: &Stats) -> Stats {
    return Stats { x: a.x + b.x, y: a.y + b.y, z: a.z + b.z, q: a.q + b.q };
}
But in Rust 1.47 it uses the stack:
example::sum_rust:
mov rax, rdi
movups xmm0, xmmword ptr [rsi]
movups xmm1, xmmword ptr [rdx]
addps xmm1, xmm0
movups xmmword ptr [rdi], xmm1
ret
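Read back as Rust, the 1.47 assembly behaves roughly like a function that receives a hidden out-pointer (`rdi`) plus the two references (`rsi`, `rdx`), writes the sum through the out-pointer, and returns that pointer in `rax`. A minimal sketch of that shape is below. The name `sum_into` and the explicit `out` parameter are hypothetical and only illustrate the convention; rustc does not expose the out-pointer like this.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Stats {
    pub x: f32,
    pub y: f32,
    pub z: f32,
    pub q: f32,
}

/// Hypothetical model of the 1.47 convention: the caller provides the
/// storage for the result, and both inputs are read from memory.
pub fn sum_into(out: &mut Stats, a: &Stats, b: &Stats) {
    *out = Stats {
        x: a.x + b.x,
        y: a.y + b.y,
        z: a.z + b.z,
        q: a.q + b.q,
    };
}
```

The single `addps` in the listing corresponds to the four field additions; the two `movups` loads and the `movups` store are the memory traffic this issue would like to avoid.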
Post 1.47, it is packed into integer registers (see this issue: #85265):
example::sum_rust:
movss xmm0, dword ptr [rdi]
movss xmm1, dword ptr [rdi + 4]
addss xmm0, dword ptr [rsi]
addss xmm1, dword ptr [rsi + 4]
movsd xmm2, qword ptr [rdi + 8]
movsd xmm3, qword ptr [rsi + 8]
addps xmm3, xmm2
movd eax, xmm0
movd ecx, xmm1
movd esi, xmm3
shufps xmm3, xmm3, 229
movd edx, xmm3
shl rdx, 32
or rdx, rsi
shl rcx, 32
or rax, rcx
ret
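The `movd` / `shl` / `or` tail of that listing packs the four `f32` results into the two integer return registers: `rax` holds `x` in its low 32 bits and `y` in its high 32 bits, and `rdx` holds `z` and `q` the same way. The sketch below shows the same bit manipulation in Rust. The helper names `pack_pair` and `unpack_pair` are made up for illustration and do not appear in rustc.

```rust
/// Pack two f32 values into one u64, the way the assembly above builds
/// `rax` (x, y) and `rdx` (z, q): `lo` in bits 0..32, `hi` in bits 32..64.
pub fn pack_pair(lo: f32, hi: f32) -> u64 {
    u64::from(lo.to_bits()) | (u64::from(hi.to_bits()) << 32)
}

/// Reverse of `pack_pair`: what a caller has to do to get the floats back.
pub fn unpack_pair(packed: u64) -> (f32, f32) {
    (
        f32::from_bits(packed as u32),
        f32::from_bits((packed >> 32) as u32),
    )
}
```

Every call therefore pays for moving the values from vector registers into general-purpose registers, and the caller pays again to move them back before doing any further float arithmetic.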
This regression should be fixed by #93405, which should bring the codegen back to the pre-1.48 behavior.
But ideally it should be optimized to:
example::sum_rust:
addps xmm1, xmm0
ret
@dotdash mentions in #85265 that this is due to Rust not using the proper types on the LLVM IR level: #85265 (comment)
EDIT:
Clang is able to use this optimization in a similar case:
struct Foo
{
    float bar1;
    float bar2;
    float bar3;
    float bar4;
};
Foo sum_cpp(Foo foo1, Foo foo2)
{
    Foo foo3;
    foo3.bar1 = foo1.bar1 + foo2.bar1;
    foo3.bar2 = foo1.bar2 + foo2.bar2;
    foo3.bar3 = foo1.bar3 + foo2.bar3;
    foo3.bar4 = foo1.bar4 + foo2.bar4;
    return foo3;
}
Gets turned into:
sum_cpp(Foo, Foo): # @sum_cpp(Foo, Foo)
addps xmm0, xmm2
addps xmm1, xmm3
ret
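For a closer like-for-like comparison with the Clang output, the Rust side could use the C layout and the C calling convention. This is only a sketch, under the assumption that `#[repr(C)]` together with `extern "C"` makes rustc classify the struct according to the platform C ABI. The names `Foo` and `sum_c_abi` simply mirror the C++ example, and the resulting assembly has not been checked here.

```rust
/// Same field layout as the C++ `Foo` above.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Foo {
    pub bar1: f32,
    pub bar2: f32,
    pub bar3: f32,
    pub bar4: f32,
}

/// By-value arguments and return value, using the C calling convention
/// instead of the unspecified Rust one.
pub extern "C" fn sum_c_abi(foo1: Foo, foo2: Foo) -> Foo {
    Foo {
        bar1: foo1.bar1 + foo2.bar1,
        bar2: foo1.bar2 + foo2.bar2,
        bar3: foo1.bar3 + foo2.bar3,
        bar4: foo1.bar4 + foo2.bar4,
    }
}
```

Note that this changes the ABI of the function, so it is a way to compare code generation rather than a fix for the default Rust ABI discussed in this issue.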
Activity
nikic commented on Jan 30, 2022
This doesn't make sense for integers. Yes, you've picked out a very lucky example where this allows vectorization, but generally this just means that data will have to be moved back and forth between vector and GPR registers, for the majority case where no vectorization is possible. Passing these by pointer (as in 1.47) is the right thing to do.
Urgau commented on Jan 30, 2022
I actually tried this, but it is complicated because these registers are influenced by `#[target_feature]`, which could lead to unsound code when the function is called from code that doesn't use these registers. https://github.com/rust-lang/rust/blob/803e19c6ebc647d4e600967c255fccea838bce9f/compiler/rustc_middle/src/ty/layout.rs#L3197-L3221
Miksel12 commented on Jan 30, 2022
That makes sense, but the exact same happens with floats. It is beneficial for floats to use vector registers, isn't it?
In that case I should probably use floats for my example.
I checked Clang to see how C++ is optimized and integers are indeed not passed in vector registers. Floats are.
Gets turned into:
Miksel12 commented on Jan 30, 2022
It seems like that issue is discussed in: #79865. It also seems like that issue will be fixed by upgrading to LLVM 14: #79865 (comment)
Though I'm not entirely sure it is the same issue.
Urgau commented on Jan 30, 2022
This is because clang generates a more optimized layout, `dso_local { <2 x float>, <2 x float> }`, instead of `%Foo = type { float, float, float, float }`. This is something I also tried to do in my PR, but it is way more complicated (it also requires the ABI compatibility to be fixed), so I decided not to include it.
Urgau commented on Feb 1, 2022
@Miksel12 I've opened #93564 to fix the general issue related to the aggregation of types, and I managed to also fix this issue.
With my PR your example code would now be compiled to:
Even better than clang!
EDIT: It's no longer the case, due to the abi + target_features unsoundness.
scottmcm commented on Feb 6, 2022
Note that if you add `repr(simd)` then it does deal in vector types: https://play.rust-lang.org/?version=nightly&mode=release&edition=2021&gist=1645fca36d467d49bc9bd4f299f8243f
So arguably this is more about "`repr(rust)` should use vector representations sometimes". (And of course when we eventually get `stdsimd` then people could do this as a `repr(transparent)` wrapper around `Simd<f32, 4>`.)
bjorn3 commented on Feb 6, 2022
It doesn't. It passes them by reference both when using `&Stats` as arguments and when using `Stats` as arguments (the LLVM IR is identical). There would be no load or store if they were passed in vector registers.
workingjubilee commented on Feb 13, 2022
Having read this issue, I believe this is functionally a duplicate of #64609, #85265, and #91447, so I am closing this.