blake2module.c needs to be compiled with libhacl SIMD flags #130213

New issue

Closed

blake2module.c needs to be compiled with libhacl SIMD flags#130213

Labels

buildextension-modulestype-bug

jmroot

Contributor

Bug report

Bug description:

blake2module.c may include _hacl/Hacl_Hash_Blake2b_Simd256.h and/or _hacl/Hacl_Hash_Blake2s_Simd128.h, and thus needs to compile with LIBHACL_SIMD128_FLAGS and/or LIBHACL_SIMD256_FLAGS, or you get an error like this if the compiler doesn't enable those SIMD features by default:

In file included from ./Modules/blake2module.c:139:
In file included from ./Modules/_hacl/Hacl_Hash_Blake2b_Simd256.h:42:
In file included from ./Modules/_hacl/libintvector.h:28:
/Library/Developer/CommandLineTools/usr/bin/../lib/clang/7.0.2/include/smmintrin.h:28:2: error: "SSE4.1 instruction set not enabled"

I would create a PR but unfortunately I can't figure out where this needs to be added.

CPython versions tested on:

3.14

Operating systems tested on:

macOS

Linked PRs

added

added

Member

Aren't we setting them in configure? In addition we have:

// Small mismatch between the variable names Python defines as part of configure
// at the ones HACL* expects to be set in order to enable those headers.
#define HACL_CAN_COMPILE_VEC128 HACL_CAN_COMPILE_SIMD128
#define HACL_CAN_COMPILE_VEC256 HACL_CAN_COMPILE_SIMD256

#include "_hacl/Hacl_Hash_Blake2b.h"
#include "_hacl/Hacl_Hash_Blake2s.h"
#if HACL_CAN_COMPILE_SIMD256
#include "_hacl/Hacl_Hash_Blake2b_Simd256.h"
#endif
#if HACL_CAN_COMPILE_SIMD128
#include "_hacl/Hacl_Hash_Blake2s_Simd128.h"
#endif

So, the headers shouldn't even be included if HACL_* are not defined

picnixz

Member

In addition, -mavx2 should be sufficient to enable SSE 4.1, at least on gcc. I don't know whether this is the case on clang because I don't know how to check if -mavx2 implies -msse4.1:

if test "$ac_sys_system" != "Linux-android" -a "$ac_sys_system" != "WASI" || test "$ANDROID_API_LEVEL" -ge 28; then
  AX_CHECK_COMPILE_FLAG([-mavx2],[
    [LIBHACL_SIMD256_FLAGS="-mavx2"]
    AC_DEFINE([HACL_CAN_COMPILE_SIMD256], [1], [HACL* library can compile SIMD256 implementations])

picnixz

Member

Oh, I think I see what happens:

Modules/blake2module.o: $(srcdir)/Modules/blake2module.c $(MODULE__BLAKE2_DEPS) $(MODULE_DEPS_SHARED) $(PYTHON_HEADERS); $(CC)  -I$(srcdir)/Modules/_hacl/include $(PY_STDMODULE_CFLAGS) $(CCSHARED) -c $(srcdir)/Modules/blake2module.c -o Modules/blake2module.o

We might be missing some flag here indeed. OTOH, we have

_blake2 blake2module.c -I$(srcdir)/Modules/_hacl/include Modules/_hacl/libHacl_Hash_Blake2.a

in Setup.stdlib, but maybe we're missing a -D_BSD_SOURCE -D_DEFAULT_SOURCE here as SHA and MD5 modules are compiled with this.

picnixz

Member

I won't have time to investigate more so I leave it to @msprotz

msprotz

Contributor

@jmroot I'm slightly confused by your report. It looks like merely including smmintrin.h triggers an error if you do not pass -msse4.1, as opposed to actually using definitions from smmintrin.h

@jmroot I dimly recall something like that happening on old clang versions. Can you check with a very recent clang please?

@picnixz blake2module.c should not be compiled with any -m flags, otherwise, the compiler might start introducing, say, SSE4.1 instructions in a section of code that is not guarded behind the run-time check for e.g. has_simd256(), which will generate illegal instruction errors if Python is executed on a machine that does not have these instructions.

The idea is that one should be able to include various *mmintrin.h headers without issues, but must guard their usage behind suitable runtime checks.

jmroot

ContributorAuthor

@jmroot I'm slightly confused by your report. It looks like merely including smmintrin.h triggers an error if you do not pass -msse4.1, as opposed to actually using definitions from smmintrin.h

Yes, that seems to be the case.

@jmroot I dimly recall something like that happening on old clang versions. Can you check with a very recent clang please?

It works fine with current clang of course, because -msse4.1 is on by default.

jmroot

ContributorAuthor

The trigger for the error in smmintrin.h is #ifndef __SSE4_1__, so guarding the include with #ifdef __SSE4_1__ should be a partial solution.

msprotz

Contributor

On my system (which definitely does not have avx512):

jonathan@absinthe:/tmp $ cat test.c
#include <immintrin.h>
jonathan@absinthe:/tmp $ cc -c test.c
jonathan@absinthe:/tmp $ echo $?
0

the header inclusion works just fine. But I try to use the header without the right -m flags:

jonathan@absinthe:/tmp $ cat -p test.c
#include <immintrin.h>

int main() {
  __m512 res, src, a, b;
  __mmask16  k = 0x5555;

  res = _mm512_mask_add_ps(src, k, a, b);
  return 0;
}
jonathan@absinthe:/tmp $ cc test.c
test.c:7:9: error: always_inline function '_mm512_mask_add_ps' requires target feature 'avx512f', but would be inlined into function 'main' that is compiled without support for 'avx512f'
    7 |   res = _mm512_mask_add_ps(src, k, a, b);
      |         ^
test.c:7:9: error: AVX vector argument of type '__m512' (vector of 16 'float' values) without 'avx512f' enabled changes the ABI
2 errors generated.

Then I do get an actual error. So it appears on my system, you're allowed to include headers for intrinsics that are not currently enabled via suitable -m flags, but you do get an error if you try to use such intrinsics.

On your system, it appears that the behavior of the headers is much stricter.

What you suggest would work, but I would like to make sure I 100% understand why it's needed before I add it.

Do you agree that there is a change of behavior for intrinsic headers across clang versions?
Is compiling with clang7 a supported scenario for Python?

Thanks!

jmroot

ContributorAuthor

Do you agree that there is a change of behavior for intrinsic headers across clang versions?

Slightly hard to tell given that with current clang from Xcode 16.2

% /usr/bin/clang -E -dM -x c-header /dev/null | grep -F SSE4
#define __SSE4_1__ 1

But it's certainly consistent with the evidence.

Is compiling with clang7 a supported scenario for Python?

I don't know what the policy is. The only requirement I'm aware of is C11. Disabling SIMD for older clang would also be an acceptable though less preferable resolution.

jmroot

ContributorAuthor

The other part of the problem is that one of the types defined in the intrinsics headers is always used (__m256i, __m128i). Even when the includes are not guarded in libintvector.h, immintrin.h will internally not include avxintrin.h if __AVX__ is not defined. Simply adding the typedef results in a link-time error: LLVM ERROR: Do not know how to split this operator's operand!

Disabling SIMD may well be the only easy fix here. What do you think would be the best way to do that? Perhaps a configure check for whether smmintrin.h can be included without -msse4.1?

msprotz

Contributor

This an omission on my end and on #130157 I mention that in the absence of intrinsics headers these types ought to be defined as void*.

I'm open to guarding the include for compatibility with old versions of clang. @picnixz thoughts?

msprotz

Contributor

@jmroot I re-read your message above. Just to recap (correct me if I missed something):

if the toolchain can compile a "VEC128" implementation, then blake2module.c includes the header for Blake2s_128
this does not mean that the code from blake2module.c will run on a machine that supports VEC128 -- just that blake2module.c might elect, if support is found at runtime, to dispatch to the VEC128 implementation
corollary: blake2module.c cannot be compiled with e.g. -mavx2 as it would insert avx2 instructions in code that is not guaranteed to execute on a machine that supports AVX2
the vec128 implementations have function signatures that talk about the __m128i type
on old toolchains, the headers that contain __m128i and corresponding intrinsics give a hard error if cc is not running with the right -m flags
on old toolchains, the headers might not even want to define the right types, which causes an issue with typedef __m128i LibIntVector_vec128 for instance

Fixes include:

your suggestion: disabling SIMD if these headers cannot be included without passing the right -m flags -- with a test at configure-time; @jmroot can you tell which version of clang started behaving "the right way"? if it's only super old clang versions, then I guess that's ok?
big hack: because we know that blake2module.c only ever manipulates __m128i * (and not __m128), tweak libintvector.h to use a macro instead of a typedef, and locally define __m128 as void so that all usages of the vector behind a pointer translate to void * in the context of blake2module.c -- this seems like a terrible idea that might break in undebuggable ways
big expensive refactoring: make sure all the modules in blake2 do not #include <libintvector.h>in their public headers, relying on forward struct definitions to hide the usage of vector types from the perspective of client files

Option 3. will take considerable time and effort, and option 2 is a big hack. I lean towards your proposed solution (option 1).

@jmroot do you know which version of clang it is that those headers started behaving "the right way"?

msprotz

mentioned this

on Feb 19, 2025

Fix long-standing issue with C Abstract Structs FStarLang/karamel#542

25 remaining items

to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Assignees

No one assigned

Labels

buildextension-modulestype-bug

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

blake2module.c needs to be compiled with libhacl SIMD flags #130213

Bug report

Bug description:

CPython versions tested on:

Operating systems tested on:

Linked PRs

25 remaining items

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Participants

blake2module.c needs to be compiled with libhacl SIMD flags #130213

Description

Bug report

Bug description:

CPython versions tested on:

Operating systems tested on:

Linked PRs

Activity

picnixz commented on Feb 17, 2025

picnixz commented on Feb 17, 2025

picnixz commented on Feb 17, 2025

picnixz commented on Feb 17, 2025

msprotz commented on Feb 17, 2025

jmroot commented on Feb 17, 2025

jmroot commented on Feb 17, 2025

msprotz commented on Feb 17, 2025

jmroot commented on Feb 17, 2025

jmroot commented on Feb 18, 2025

msprotz commented on Feb 18, 2025

msprotz commented on Feb 19, 2025

25 remaining items

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Participants

Issue actions