Open
target_feature doesn't trickle down to closures and internal fns, which leaves poorly optimized assembly in its wake.
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

// Creates non-inlined calls to intrinsics
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
pub unsafe fn foo(input: &[__m256]) -> f32 {
    let accum = |val: __m256| {
        let roll = _mm256_setr_epi32(1, 2, 3, 4, 5, 6, 7, 0);
        let mut sum = val;
        let mut tmp = _mm256_permutevar8x32_ps(val, roll);
        for _ in 0..7 {
            sum = _mm256_add_ps(tmp, sum);
            tmp = _mm256_permutevar8x32_ps(tmp, roll);
        }
        sum
    };
    // Once we call a complex internal closure or fn multiple
    // times, we find that the compiler hasn't told them that
    // they can inline or use avx2 intrinsics. Not the sharpest.
    let sum1 = accum(input[0]);
    let sum2 = accum(input[1]);
    _mm256_cvtss_f32(sum1) + _mm256_cvtss_f32(sum2)
}

// Works as expected
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
pub unsafe fn bar(input: &[__m256]) -> f32 {
    // When we pull this tool out of the shed everything works
    #[target_feature(enable = "avx2")]
    unsafe fn accum(val: __m256) -> __m256 {
        let roll = _mm256_setr_epi32(1, 2, 3, 4, 5, 6, 7, 0);
        let mut sum = val;
        let mut tmp = _mm256_permutevar8x32_ps(val, roll);
        for _ in 0..7 {
            sum = _mm256_add_ps(tmp, sum);
            tmp = _mm256_permutevar8x32_ps(tmp, roll);
        }
        sum
    }
    let sum1 = accum(input[0]);
    let sum2 = accum(input[1]);
    _mm256_cvtss_f32(sum1) + _mm256_cvtss_f32(sum2)
}
https://rust.godbolt.org/z/cIr7qS
I found this bug while triggering #50154 with closures. I wasn't able to trigger that one from godbolt. I'm using the latest stable, so if I copied the code in it would work (as in, not work).
Making a separate issue since this one is a performance bug.
Title changed from "target_feature doesn't trickle down to closures internal fns" to "target_feature doesn't trickle down to closures and internal fns"
scottmcm commented on Feb 8, 2019
I think this is expected for internal fns, as items in general don't get anything from their enclosures (most well known is that they don't get access to generic parameters).
I think I agree for closures, though.
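To illustrate the point about generic parameters, here is a minimal sketch of an inner fn failing to see anything from the function around it. The names `outer` and `inner` are hypothetical and not taken from the issue.

```rust
use std::mem::size_of;

// Hypothetical example: `outer` and `inner` are illustrative names only.
pub fn outer<T>() -> usize {
    // An inner fn is a separate item, so it cannot name `T` from `outer`.
    // Uncommenting the next line fails to compile with error E0401:
    // fn inner() -> usize { size_of::<T>() }

    // Instead it has to declare its own generic parameter.
    fn inner<U>() -> usize {
        size_of::<U>()
    }
    inner::<T>()
}
```

By the same reasoning, an inner fn would not be expected to pick up target_feature from its parent either, which is why `bar` above has to repeat the attribute on `accum`.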
hanna-kruppe commented on Feb 9, 2019
One problem with closures is that functions with target_feature have to be unsafe, but there's no such thing as unsafe closures.
est31 commented on Mar 9, 2023
I think this has been implemented in the meantime for the target_feature 1.1 feature, as otherwise #108655 wouldn't have been filed.
LeSeulArtichaut commented on Mar 13, 2023
(See #73631)
est31 commented on Mar 13, 2023
@LeSeulArtichaut maybe we can close this in favour of #73631?
LeSeulArtichaut commented on Mar 14, 2023
#73631 doesn't discuss inner functions at all, but if it seems uncontroversial to not allow them on inner functions then I guess this issue can be closed.
est31 commented on Mar 15, 2023
Yeah, applying target_feature to inner functions is not good. There are ways to call inner functions from outside the function. It's different for closures.
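One way an inner function can be called from outside, sketched under the assumption that this is the kind of escape meant here: the outer function can hand out a plain fn pointer to it. The names `outer` and `inner` are hypothetical.

```rust
// Hypothetical example: `outer` and `inner` are illustrative names only.
pub fn outer() -> fn(i32) -> i32 {
    // `inner` is an ordinary item that merely happens to be declared here.
    fn inner(x: i32) -> i32 {
        x + 1
    }
    // Returning it as a fn pointer lets any caller invoke it later,
    // without ever running the rest of `outer`'s body.
    inner
}
```

If `inner` silently inherited target_feature from `outer`, a pointer like this could reach a caller that never checked for the feature. A closure that captures nothing can also coerce to a fn pointer, so this sketch shows only the inner fn half of the argument.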
RalfJung commented on Nov 23, 2024
So, can this be closed then? Closures have been handled by #58279, and inner functions generally don't inherit things from the outer function so it'd be surprising if they inherited target_feature.
EDIT: But the inheriting behavior only gets enabled with target_feature 1.1.
KyleSiefring commented on Nov 23, 2024
The current implementation is still a bit of a blemish. I wonder if a subset of potential "target_feature" options are reasonable to pass down to inner functions. I would say no, since this could make things even more confusing.
Maybe a new issue should be created?
RalfJung commented on Nov 24, 2024
AFAIK basically no attributes are inherited by inner functions. Why should target_feature be special?