
ABI for __m256 and __m512 is wrong when avx/avx512 is disabled globally, or enabled per-function #64706

Open
@chorman0773


Based on this discussion: https://groups.google.com/g/x86-64-abi/c/FMhl2vDl1D8

Currently, when llvm cannot use ymm/zmm registers, it handles __m256 and __m512 parameters and return values as follows:

  • Parameters are passed on the stack
  • Return values are split across 2 or 4 xmm registers.

Further, when the avx/avx512f features are enabled at the function level (using __attribute__((target)), not globally), it handles them as follows:

  • Parameters are passed on the stack
  • Return values are placed in a single ymm/zmm register.

In contrast, the behaviour of gcc (which is apparently the correct behaviour in both cases) is:

When ymm/zmm registers are unavailable:

  • Parameters are passed on the stack
  • Return values are returned in memory (return pointer in rdi)

When ymm/zmm registers are available at the function level (using __attribute__((target))), it passes and returns values as it does when the feature is available globally via a -m flag.

The difference in behaviour can be demonstrated by https://godbolt.org/z/8sYcn6654.

Based on a short discussion on the x86-64 psABI mailing list, llvm's behaviour appears to be incorrect in both cases. When the registers are unavailable, the value must be returned in memory: the ABI requires the 2nd SSEUP eightbyte to be placed in the 3rd eightbyte of xmm0, which does not exist, so register assignment fails and the entire value goes to memory. In the locally-enabled case, the registers are available, so llvm should be passing fully in ymm1 and returning fully in ymm0 (llvm seems to consider the registers available, given that it does return in ymm0).
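The reasoning above can be sketched as a toy model in C. This is only an illustration of the argument, not the psABI text or any compiler's implementation; the names classify_vector_return, ret_loc and reg_bytes are made up for the sketch. It assumes a vector is one SSE eightbyte followed by SSEUP eightbytes, and that every SSEUP eightbyte must land in the next eightbyte chunk of the same vector register.

```c
#include <assert.h>
#include <stddef.h>

/* Where a vector return value ends up in this toy model. */
enum ret_loc { RET_IN_REGISTER, RET_IN_MEMORY };

/*
 * Toy model (an assumption for illustration, not the psABI wording):
 * vec_bytes is the size of the vector type (32 for __m256, 64 for __m512),
 * reg_bytes is the widest vector register the function may use
 * (16 for xmm, 32 for ymm, 64 for zmm).
 */
enum ret_loc classify_vector_return(size_t vec_bytes, size_t reg_bytes)
{
    assert(vec_bytes % 8 == 0 && reg_bytes % 8 == 0);

    size_t eightbytes = vec_bytes / 8;     /* 1 SSE + (eightbytes - 1) SSEUP */
    size_t chunks_in_reg = reg_bytes / 8;  /* eightbyte slots in one register */

    /* If any SSEUP eightbyte has no slot left in the register,
     * the whole value is returned in memory. */
    return eightbytes <= chunks_in_reg ? RET_IN_REGISTER : RET_IN_MEMORY;
}
```

Under this model, classify_vector_return(32, 16) gives RET_IN_MEMORY (an __m256 with only xmm available) and classify_vector_return(32, 32) gives RET_IN_REGISTER (ymm available), which matches the gcc behaviour described above.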

Activity

llvmbot

llvmbot commented on Aug 15, 2023

@llvmbot
Member

@llvm/issue-subscribers-backend-x86

phoebewang

phoebewang commented on Aug 15, 2023

@phoebewang
Contributor

It's unfortunate that LLVM has a gap with GCC. But I'd argue that passing __m256 / __m512 on targets that don't support YMM/ZMM registers is problematic under the current ABI. I proposed improving the ABI to always generate YMM/ZMM registers: https://groups.google.com/g/x86-64-abi/c/vQcfj--osKs

RalfJung

RalfJung commented on Nov 13, 2023

@RalfJung
Contributor

Further, when the avx/avx512f features are enabled at the function level (not globally, using __attribute__((target))), it passes parameters/return values:

So the ABI is different depending on whether the target feature is enabled globally vs in the current function? Ugh, that's a pretty critical bug, isn't it?

The case where the target features are missing is somewhat odd and maybe shouldn't even be accepted, but the case where the target feature is present for the current function should obviously work.

brianosman

brianosman commented on Nov 13, 2023

@brianosman

Yes, the varying ABI is really troublesome. I dug into this with some coworkers earlier this year (for 256-bit vectors), and just recently shared our findings/thoughts here: https://www.reddit.com/r/cpp/comments/17qowl2/comment/k8j2odi/?utm_source=share&utm_medium=web2x&context=3

TL;DR: llvm and gcc behave differently, and llvm's handling of the target attribute is only useful today if you write all of your vector code inside a giant loop with locals. The ABI issue means that it's dramatically better to continue compiling multiple versions with separate TUs (and separate flags).
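One reading of the "giant loop with locals" point, sketched in C under some assumptions: GCC or Clang, and hypothetical function names (add_arrays, add_arrays_avx, add_arrays_scalar) that do not come from this thread. The AVX-enabled function takes only pointers, so no __m256 value is passed or returned across a call boundary and the disputed ABI rules never apply.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* AVX is enabled for this function only. Its signature uses plain
 * pointers, so no __m256 value crosses a call boundary. */
__attribute__((target("avx")))
static void add_arrays_avx(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);  /* vector values stay in locals */
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(dst + i, _mm256_add_ps(va, vb));
    }
    for (; i < n; ++i)  /* scalar tail for the last n % 8 elements */
        dst[i] = a[i] + b[i];
}
#endif

/* Portable fallback used when AVX is not available. */
static void add_arrays_scalar(float *dst, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}

/* Hypothetical dispatcher: takes the AVX path only if the CPU reports AVX. */
void add_arrays(float *dst, const float *a, const float *b, size_t n)
{
    assert(n == 0 || (dst && a && b));
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx")) {
        add_arrays_avx(dst, a, b, n);
        return;
    }
#endif
    add_arrays_scalar(dst, a, b, n);
}
```

The cost of this pattern is that helper functions taking or returning __m256 cannot be factored out of the loop without running into the ABI mismatch again, which is presumably why separate TUs with separate flags are described as the better option.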

added a commit that references this issue on Jul 12, 2024
added a commit that references this issue on Jul 15, 2024

Labels: ABI (Application Binary Interface), backend:X86

Participants: @RalfJung, @chorman0773, @phoebewang, @EugeneZelenko, @brianosman