Skip to content

Potentially unsound uses of simd_select_bitmask in stdarch #137942

Closed
@RalfJung

Description

@RalfJung
Member

Looking for potential violations of simd_* intrinsic preconditions, I found this in stdarch:

/// Compute dot-product of BF16 (16-bit) floating-point pairs in a and b,
/// accumulating the intermediate single-precision (32-bit) floating-point elements
/// with elements in src, and store the results in dst using zeromask k
/// (elements are zeroed out when the corresponding mask bit is not set).
/// [Intel's documentation](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#expand=1769,1651,1654,1657,1660&avx512techs=AVX512_BF16&text=_mm_maskz_dpbf16_ps)
#[inline]
#[target_feature(enable = "avx512bf16,avx512vl")]
#[unstable(feature = "stdarch_x86_avx512", issue = "111137")]
#[cfg_attr(test, assert_instr("vdpbf16ps"))]
pub fn _mm_maskz_dpbf16_ps(k: __mmask8, src: __m128, a: __m128bh, b: __m128bh) -> __m128 {
    unsafe {
        let rst = _mm_dpbf16_ps(src, a, b).as_f32x4();
        let zero = _mm_set1_ps(0.0_f32).as_f32x4();
        transmute(simd_select_bitmask(k, rst, zero))
    }
}

simd_select_bitmask is documented to require that all the "extra"/"padding" bits in the mask (not corresponding to a vector element) must be 0. Here, rst and zero are vectors of length 4, and the mask k is a u8, meaning there are 4 bits in k that must be 0. However, nothing in the function actually ensures that.

I don't know the intended behavior of the intrinsic for that case (probably intel promises to just ignore the extra bits?), but this function recently got marked as safe (in rust-lang/stdarch#1714) and that is clearly in contradiction with our intrinsic docs. I assume the safety is correct as probably the intrinsic should have no precondition; in that case we have to

  • either explicitly mask out the higher bits
  • or figure out if we can remove the UB from simd_select_bitmask

Cc @usamoi @Amanieu @workingjubilee

Activity

added
needs-triageThis issue may need triage. Remove it if it has been sufficiently triaged.
on Mar 3, 2025
RalfJung

RalfJung commented on Mar 3, 2025

@RalfJung
MemberAuthor

This is a similar case:

/// For each packed 32-bit integer maps the value to the number of logical 1 bits.
///
/// Uses the writemask in k - elements are zeroed in the result if the corresponding mask bit is not set.
/// Otherwise the computation result is written into the result.
///
/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_maskz_popcnt_epi32)
#[inline]
#[target_feature(enable = "avx512vpopcntdq,avx512vl")]
#[unstable(feature = "stdarch_x86_avx512", issue = "111137")]
#[cfg_attr(test, assert_instr(vpopcntd))]
pub fn _mm_maskz_popcnt_epi32(k: __mmask8, a: __m128i) -> __m128i {
    unsafe {
        transmute(simd_select_bitmask(
            k,
            simd_ctpop(a.as_i32x4()),
            i32x4::ZERO,
        ))
    }
}

I did not do an exhaustive search.

Amanieu

Amanieu commented on Mar 3, 2025

@Amanieu
Member

__mmask8 is type alias for u8 where each bit represents one vector element.

(misread your comment)

Amanieu

Amanieu commented on Mar 3, 2025

@Amanieu
Member

Right, I believe the intent is to ignore the unused bits here.

RalfJung

RalfJung commented on Mar 3, 2025

@RalfJung
MemberAuthor

Right, I believe the intent is to ignore the unused bits here.

A quick look at Intel's docs confirms this.

So, the question is, can we change the implementation to use k & 0xF to mask out the higher bits (assuming I got the bit order right)? Will LLVM know that it can contract simd_select_bitmask(k & 0xF, ...) into a single instruction on x86 based on how that architecture behaves?

Or do we have to dig into the simd_select_bitmask implementation and see if we can remove the UB? If I understand correctly what our LLVM backend does, it truncs the i8 to an i4 and then bitcasts that to <4 x i1>, so it does indeed ignore the other bits. But that also means it is likely the bitwise-and followed by trunc would get optimized to the Right Thing by LLVM.

jhorstmann

jhorstmann commented on Mar 3, 2025

@jhorstmann
Contributor

That is also my understanding of the llvm implementation. It will truncate to an integer with the number of bits corresponding to the number of lanes, then bitcast that to a vector:

let m_im = bx.trunc(mask, im);

So any higher bits will be ignored.

RalfJung

RalfJung commented on Mar 3, 2025

@RalfJung
MemberAuthor

I wonder if our other backends work the same; Cc @bjorn3 @GuillaumeGomez

bjorn3

bjorn3 commented on Mar 3, 2025

@bjorn3
Member
RalfJung

RalfJung commented on Mar 3, 2025

@RalfJung
MemberAuthor

Okay so that is also fine with arbitrary data in the "extra" bits.

bjorn3

bjorn3 commented on Mar 3, 2025

@bjorn3
Member

Yeah, extra bits are ignored.

added
I-unsoundIssue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness
T-libsRelevant to the library team, which will review and decide on the PR/issue.
E-needs-investigationCall for partcipation: This issues needs some investigation to determine current status
on Mar 4, 2025
added
I-prioritizeIssue: Indicates that prioritization has been requested for this issue.
on Mar 4, 2025

17 remaining items

Loading
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    C-bugCategory: This is a bug.E-needs-investigationCall for partcipation: This issues needs some investigation to determine current statusI-unsoundIssue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/SoundnessP-highHigh priorityT-libsRelevant to the library team, which will review and decide on the PR/issue.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Participants

      @Amanieu@RalfJung@jhorstmann@bjorn3@lolbinarycat

      Issue actions

        Potentially unsound uses of simd_select_bitmask in stdarch · Issue #137942 · rust-lang/rust