ABI for __m256 and __m512 is wrong when avx/avx512 is disabled globally, or enabled per-function #64706

Open
@chorman0773

Based on this discussion: https://groups.google.com/g/x86-64-abi/c/FMhl2vDl1D8

Currently, when llvm cannot use ymm/zmm registers, it passes and returns __m256 and __m512 values as follows:

  • Parameters are passed on the stack
  • Return values are split across 2-4 xmm registers

Further, when the avx/avx512f features are enabled only at the function level (using __attribute__((target)) rather than globally), it handles parameters and return values as follows:

  • Parameters are passed on the stack
  • Return values are placed in a single ymm/zmm register

In contrast, the behaviour of gcc (which is apparently the correct behaviour in both cases) is:

When ymm/zmm registers are unavailable:

  • Parameters are passed on the stack
  • Return values are returned in memory (return pointer in rdi)

When ymm/zmm registers are available at the function level (using __attribute__((target))), gcc passes and returns values exactly as it does when the feature is enabled globally via a -m flag.

The difference in behaviour can be demonstrated by https://godbolt.org/z/8sYcn6654.
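The linked example is not reproduced here. As a rough sketch, a source file of the following shape should be enough to compare the two compilers. It assumes the GCC/Clang vector extension, and it assumes that the hypothetical `v8sf` typedef is classified the same way as `__m256` from `<immintrin.h>`. The names `v8sf`, `LOCAL_AVX`, `ret_no_avx`, `ret_local_avx` and `check_roundtrip` are made up for illustration.

```c
#include <assert.h>

/* 256-bit vector of eight floats, written with the GCC/Clang vector
   extension. Assumption: it is classified like __m256. */
typedef float v8sf __attribute__((vector_size(32)));

/* target("avx") only exists on x86; elsewhere the macro expands to nothing
   so the file still compiles (but then shows nothing interesting). */
#if defined(__x86_64__)
#define LOCAL_AVX __attribute__((target("avx")))
#else
#define LOCAL_AVX
#endif

/* Case 1: built without -mavx, so ymm registers are unavailable here. */
v8sf ret_no_avx(v8sf x) { return x; }

/* Case 2: AVX is enabled for this one function only. */
LOCAL_AVX v8sf ret_local_avx(v8sf x) { return x; }

/* Sanity check that only exercises the non-AVX function, so caller and
   callee agree on the enabled features. */
static void check_roundtrip(void)
{
    v8sf in = {1, 2, 3, 4, 5, 6, 7, 8};
    v8sf out = ret_no_avx(in);
    for (int i = 0; i < 8; i++)
        assert(out[i] == in[i]);
}
```

Compiling this without `-mavx` (for example with `-O2 -S`) and reading the assembly for the two functions shows where each compiler places the parameter and the return value. Compilers may warn that the ABI changes for `ret_no_avx` (for example under `-Wpsabi`).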

Based on a short discussion on the x86-64 psABI mailing list, llvm's behaviour appears to be incorrect in both cases. When the registers are unavailable, the value must be returned in memory: the ABI would place the 2nd SSEUP eightbyte in the 3rd eightbyte of xmm0, which does not exist, so the assignment fails and the entire value goes to memory. In the locally-enabled case the registers are available, so the value should be passed fully in ymm1 and returned fully in ymm0 (llvm seems to consider the registers available, given that it does return in ymm0).
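The eightbyte argument above can be sketched as a small model. This is a simplification under stated assumptions, not the psABI's full algorithm and not code from either compiler. It only handles a bare vector value (first eightbyte SSE, the rest SSEUP) and ignores register exhaustion. The names `eb_class`, `location`, `classify_vector` and `place_vector` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum eb_class { EB_SSE, EB_SSEUP };
enum location { LOC_REGISTER, LOC_MEMORY };

/* Split a vector of size_bytes into eightbytes: the lowest one is SSE,
   every following one is SSEUP. Returns the number of eightbytes. */
static size_t classify_vector(size_t size_bytes, enum eb_class *out, size_t cap)
{
    size_t n = size_bytes / 8;
    assert(size_bytes % 8 == 0 && n >= 1 && n <= cap);
    for (size_t i = 0; i < n; i++)
        out[i] = (i == 0) ? EB_SSE : EB_SSEUP;
    return n;
}

/* Try to place the value in one vector register that is reg_width_bytes
   wide (16 for xmm, 32 for ymm, 64 for zmm). An SSE eightbyte takes the
   first chunk of a register; each SSEUP eightbyte needs the next chunk of
   that same register. If no such chunk exists, the value goes to memory. */
static enum location place_vector(size_t size_bytes, size_t reg_width_bytes)
{
    enum eb_class classes[8];
    size_t n = classify_vector(size_bytes, classes, 8);
    size_t chunks_per_reg = reg_width_bytes / 8;
    size_t next_chunk = 0;

    for (size_t i = 0; i < n; i++) {
        if (classes[i] == EB_SSE) {
            next_chunk = 1;            /* first chunk of the register is used */
        } else {
            if (next_chunk >= chunks_per_reg)
                return LOC_MEMORY;     /* e.g. no 3rd eightbyte in xmm0 */
            next_chunk++;
        }
    }
    return LOC_REGISTER;
}
```

Under this model `place_vector(32, 16)` yields `LOC_MEMORY`, which corresponds to returning an `__m256` with only xmm registers available. `place_vector(32, 32)` yields `LOC_REGISTER`, which corresponds to the case where ymm registers are usable.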

Labels: ABI (Application Binary Interface), backend:X86