Description
Based on this discussion: https://groups.google.com/g/x86-64-abi/c/FMhl2vDl1D8
Currently, llvm passes __m256
and __m512
parameters/return values when it cannot use ymm/zmm registers as follows:
- Parameters are passed on the stack
- Return values are spanned accross 2-4
xmm
registers.
Further, when the avx/avx512f features are enabled at the function level (not globally, using __attribute__((target))
), it passes parameters/return values:
- Paramaters are passed on the stack
- Return values are placed in a single
ymm
/zmm
register.
In contrast the behaviour of gcc (which is apparantly the correct behaviour in both cases) is:
When ymm/zmm registers are unavailable:
- Parameters are passed on the stack
- Return values in memory (return pointer in rdi)
When ymm/zmm registers are available at the function level (using __attribute__((target))
), it passes and returns values as it does when the feature is available globally via a -m
flag.
The difference in behaviour can be demonstrated by https://godbolt.org/z/8sYcn6654.
Based on a short discussion on the x86-64 psABI mailing list, this appears to be entirely incorrect on behalf of llvm: When returning w/o the registers available, it must return in memory as the ABI requires it to place the 2nd SSEUP eightbyte in the 3rd eightbyte of xmm0
, which fails, and sends the entire value to memory. In the locally-enabled case, the registers are available, so it should be passing fully in ymm1
and returning fully in ymm0
(llvm seems to think that it is available given that it does return in ymm0
).
Activity
llvmbot commentedon Aug 15, 2023
@llvm/issue-subscribers-backend-x86
phoebewang commentedon Aug 15, 2023
It's unfortunate LLVM has gap with GCC. But I'd argue passing
__m256
/__m512
on targets don't support YMM/ZMM register is problematic according to current ABI. I proposed to improve the ABI to always generate YMM/ZMM registers. https://groups.google.com/g/x86-64-abi/c/vQcfj--osKsextern "C"
ABI of SIMD vector types depends on target features (tracking issue forabi_unsupported_vector_types
future-incompatibility lint) rust-lang/rust#116558RalfJung commentedon Nov 13, 2023
So the ABI is different depending on whether the target feature is enabled globally vs in the current function? Ugh, that's a pretty critical bug, isn't it?
The case where the target features are missing is somewhat odd and maybe shouldn't even be accepted, but the case where the target feature is present for the current function should obviously work.
brianosman commentedon Nov 13, 2023
Yes, the varying ABI is really troublesome. I dug into this with some coworkers earlier this year (for 256-bit vectors), and just recently shared our findings/thoughts here: https://www.reddit.com/r/cpp/comments/17qowl2/comment/k8j2odi/?utm_source=share&utm_medium=web2x&context=3
TL;DR: llvm and gcc behave differently, and llvm's handling of the target attribute is only useful today if you write all of your vector code inside a giant loop with locals. The ABI issue means that it's dramatically better to continue compiling multiple versions with separate TUs (and separate flags).
Split out skcms_sources into multiple GN targets.
Meta: Globally disable -Wpsabi
Meta: Globally disable -Wpsabi
Meta: Globally disable -Wpsabi
Meta: Globally disable -Wpsabi