Implement all ARM NEON intrinsics #148

@gnzlbg
Contributor

Steps for implementing an intrinsic:

  • Select an intrinsic below
  • Review coresimd/arm/neon.rs and coresimd/aarch64/neon.rs
  • Consult ARM official documentation about your intrinsic
  • Check godbolt to see how the intrinsic should be codegen'd, using clang as the reference. Use the links below and replace the name of the intrinsic in the code with your intrinsic. Note that if the ARM build produces an error, your intrinsic may be AArch64-only.
  • If the codegen is the same on ARM/AArch64, place the intrinsic in coresimd/arm/neon.rs. If it's different, place it in both with the appropriate #[cfg] in coresimd/arm/neon.rs. If it's AArch64-only, place it in coresimd/aarch64/neon.rs.
  • Write a test for your intrinsic at the bottom of the file as well
  • Test! Probably use rustup run nightly sh ci/run-docker.sh aarch64-unknown-linux-gnu.
  • When ready, send a PR!
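
As a rough illustration of the kind of check a test for an intrinsic performs, here is a minimal standalone sketch. It is not code from this repository: it assumes the `vld1_u8`, `vadd_u8`, and `vst1_u8` intrinsics as exposed by `std::arch::aarch64`, and the helper names `add_u8x8` and `add_u8x8_reference` are hypothetical. On non-AArch64 targets it falls back to the scalar reference so the snippet builds everywhere.

```rust
/// Scalar reference: lane-wise wrapping add over eight `u8` lanes.
/// (Hypothetical helper, used only to state the expected result.)
pub fn add_u8x8_reference(a: [u8; 8], b: [u8; 8]) -> [u8; 8] {
    let mut out = [0u8; 8];
    for i in 0..8 {
        out[i] = a[i].wrapping_add(b[i]);
    }
    out
}

/// AArch64 path: load both inputs, add them with the NEON intrinsic,
/// and store the result back into an array.
#[cfg(target_arch = "aarch64")]
pub fn add_u8x8(a: [u8; 8], b: [u8; 8]) -> [u8; 8] {
    use std::arch::aarch64::{vadd_u8, vld1_u8, vst1_u8};
    let mut out = [0u8; 8];
    // SAFETY: assumes NEON is available (it is a baseline feature of the
    // standard AArch64 Rust targets); every pointer covers 8 valid bytes.
    unsafe {
        let sum = vadd_u8(vld1_u8(a.as_ptr()), vld1_u8(b.as_ptr()));
        vst1_u8(out.as_mut_ptr(), sum);
    }
    out
}

/// Fallback for every other architecture: reuse the scalar reference.
#[cfg(not(target_arch = "aarch64"))]
pub fn add_u8x8(a: [u8; 8], b: [u8; 8]) -> [u8; 8] {
    add_u8x8_reference(a, b)
}
```

A test would then compare `add_u8x8` against hand-computed lanes, including a lane that wraps around.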

All unimplemented NEON intrinsics

Activity

gnzlbg

gnzlbg commented on Oct 24, 2017

@gnzlbg
Author
oconnor663

oconnor663 commented on Nov 15, 2018

@oconnor663
Contributor

Is there a blocker for these, or is it just finding time to do it? I'd like to help, but I'd need a more experienced compiler/SIMD person to point me in the right direction.

gnzlbg

gnzlbg commented on Nov 15, 2018

@gnzlbg
Contributor · Author

I can mentor. Start by taking a look at some of the intrinsics in the coresimd/aarch64/neon.rs module :)

oconnor663

oconnor663 commented on Nov 16, 2018

@oconnor663
Contributor

Is there some upstream source that these all get copied from, or are they actually written by hand?

gnzlbg

gnzlbg commented on Nov 16, 2018

@gnzlbg
Contributor · Author

I am not sure I understand the question. The neon modules in this repository are written by hand, although @Amanieu has expressed interest in generating some parts of them automatically.

oconnor663

oconnor663 commented on Nov 16, 2018

@oconnor663
Contributor
gnzlbg

gnzlbg commented on Nov 16, 2018

@gnzlbg
Contributor · Author

Ah, I see, that would be the ARM NEON spec: https://developer.arm.com/technologies/neon/intrinsics

alexcrichton

alexcrichton commented on Dec 20, 2018

@alexcrichton
Member

Now might be a great time to help make some more progress on this! We've got tons of intrinsics already implemented (thanks @gnzlbg!), and I've just implemented automatic verification of all added intrinsics, so we know that any intrinsic that gets added has the correct signature at least!

I've updated the OP of this issue with more detailed instructions about how to bind NEON intrinsics. Hopefully it's not too bad any more!

We'll probably want to reorganize modules so they're a bit smaller and more manageable over time, but for now if anyone's interested in adding more intrinsics and needs some help, let me know!

25 remaining items

SparrowLii

SparrowLii commented on Oct 21, 2021

@SparrowLii
Member
CryZe

CryZe commented on Oct 21, 2021

@CryZe
Contributor

Welp, I'll mark them again then. Somehow the GitHub Pull Request UI doesn't show them as diffs at all: https://i.imgur.com/BsHR5in.gif

SparrowLii

SparrowLii commented on Oct 21, 2021

@SparrowLii
Member

GitHub's comparison tool always has problems when a large amount of code changes XD

SparrowLii

SparrowLii commented on Oct 21, 2021

@SparrowLii
Member

As in #1230, all instructions have been implemented except for the following and those that use 16-bit floating point:

  1. The following instructions are currently only available on aarch64, because the corresponding target_feature cannot be found among the available features for arm:
    vcadd_rot, vcmla, vdot

  2. The feature i8mm is not valid:
    vmmla, vusmmla: https://rust.godbolt.org/z/8GbKW5ef4

  3. LLVM ERROR (can be reproduced in godbolt):
    vsm4e: https://rust.godbolt.org/z/xhT1xvGTP

  4. LLVM ERROR (fine in godbolt, but "LLVM ERROR: Cannot select: intrinsic" is raised at runtime):
    vsudot, vusdot: https://rust.godbolt.org/z/aMnEvab3n
    vqshlu: https://rust.godbolt.org/z/hvGhrhdMT

  5. Not implemented in LLVM and cannot be implemented manually:
    vmull_p64 (for arm), vsm3, vrax1q_u64, vxarq_u64, vrnd32, vrnd64, vsha512

Amanieu

Amanieu commented on Oct 21, 2021

@Amanieu
Member

> As in #1230, all instructions have been implemented except for the following and those that use 16-bit floating point:
>
> 1. The following instructions are currently only available on aarch64, because the corresponding `target_feature` cannot be found among the available features for arm:
>    `vcadd_rot`, `vcmla`, `vdot`

On LLVM's ARM backend, vcadd_rot and vcmla are under the v8.3a feature. vdot is under the dotprod feature. I got this information from llvm-project/llvm/lib/Target/ARM/ARMInstrNEON.td.
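
To make the role of a target feature concrete, here is a minimal sketch of gating a function on `dotprod` and dispatching to it at runtime. It assumes that `dotprod` is accepted as a Rust target feature name on AArch64 and by `is_aarch64_feature_detected!`. The function names are hypothetical, and the body is a plain scalar model of an 8-bit dot product rather than a call to a `vdot` intrinsic.

```rust
/// Scalar model: each 32-bit lane accumulates the dot product of four
/// `u8` pairs. (Hypothetical helper, not an stdarch function.)
pub fn dot_reference(acc: [u32; 2], b: [u8; 8], c: [u8; 8]) -> [u32; 2] {
    let mut out = acc;
    for lane in 0..2 {
        for j in 0..4 {
            let k = 4 * lane + j;
            out[lane] = out[lane].wrapping_add(b[k] as u32 * c[k] as u32);
        }
    }
    out
}

/// Same computation, but compiled with `dotprod` enabled. The attribute only
/// permits the backend to use dotprod instructions; it does not force them.
#[cfg(target_arch = "aarch64")]
#[target_feature(enable = "dotprod")]
unsafe fn dot_with_dotprod(acc: [u32; 2], b: [u8; 8], c: [u8; 8]) -> [u32; 2] {
    dot_reference(acc, b, c)
}

/// Dispatch: use the feature-gated version only when the CPU reports support.
pub fn dot(acc: [u32; 2], b: [u8; 8], c: [u8; 8]) -> [u32; 2] {
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("dotprod") {
            // SAFETY: the feature was detected at runtime just above.
            return unsafe { dot_with_dotprod(acc, b, c) };
        }
    }
    dot_reference(acc, b, c)
}
```

If the backend does not know a feature name for a given architecture, a function like `dot_with_dotprod` cannot be declared there at all, which is the situation the quoted item describes for arm.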

> 2. The feature `i8mm` is not valid:
>    `vmmla`, `vusmmla`: [rust.godbolt.org/z/8GbKW5ef4](https://rust.godbolt.org/z/8GbKW5ef4)

Already discussed in rust-lang/rust#90079.

> 3. LLVM ERROR (can be reproduced in godbolt):
>    `vsm4e`: [rust.godbolt.org/z/xhT1xvGTP](https://rust.godbolt.org/z/xhT1xvGTP)

Use llvm.aarch64.crypto.sm4ekey instead of llvm.aarch64.sve.sm4ekey.

> 4. LLVM ERROR (fine in godbolt, but `LLVM ERROR: Cannot select: intrinsic` is raised at runtime):
>    `vsudot`, `vusdot`: [rust.godbolt.org/z/aMnEvab3n](https://rust.godbolt.org/z/aMnEvab3n)
>    `vqshlu`: [rust.godbolt.org/z/hvGhrhdMT](https://rust.godbolt.org/z/hvGhrhdMT)

You need to make your test function pub in godbolt, otherwise it will be optimized away as unreachable by rustc before it reaches LLVM.

vsudot/vusdot require the i8mm target feature. vqshlu seems to work fine in godbolt after making the function pub.
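
A minimal sketch of the visibility point, assuming the snippet is compiled as a library crate (the function names are hypothetical): the private, uncalled function may never reach LLVM, so neither its assembly nor any LLVM error for it shows up, while the `pub` one is always code-generated.

```rust
// Private and never called: rustc may discard it before LLVM runs,
// so a selection error inside it would go unnoticed.
#[allow(dead_code)]
fn private_probe(x: u32) -> u32 {
    x.rotate_left(3)
}

// Public: part of the crate's exported API, so code is generated for it
// and any backend error surfaces at compile time.
pub fn public_probe(x: u32) -> u32 {
    x.rotate_left(3)
}
```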

> 5. Not implemented in LLVM and cannot be implemented manually:
>    `vmull_p64` (for arm), `vsm3`, `vrax1q_u64`, `vxarq_u64`, `vrnd32`, `vrnd64`, `vsha512`

These all seem to exist in LLVM at least for AArch64. For ARM we can just leave these out for now.

SparrowLii

SparrowLii commented on Oct 25, 2021

@SparrowLii
Member

Hope someone can help implement the remaining instructions.

SparrowLii

SparrowLii commented on Nov 9, 2021

@SparrowLii
Member

@Amanieu The v8.5a feature cannot be detected at runtime, so we can't use #[simd_test(enable = "neon,v8.5a")]. How do we add tests for instructions that use v8.5a, like vrnd32x and vrnd64x?

hkratz

hkratz commented on Nov 9, 2021

@hkratz
Contributor

@SparrowLii Shouldn't that work with the frintts feature?

SparrowLii

SparrowLii commented on Nov 9, 2021

@SparrowLii
Member

> @SparrowLii Shouldn't that work with the frintts feature?

Looks useful: https://rust.godbolt.org/z/894W8cndG

Amanieu

Amanieu commented on Nov 9, 2021

@Amanieu
Member

LLVM only supports frintts on AArch64, so it's fine to not support this intrinsic on ARM.
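
The constraint discussed in the last few comments is that a test needs a feature name that can be checked at runtime. Below is a minimal sketch of that check-then-skip pattern. It is not the `simd_test` implementation: it assumes `frintts` is accepted by `is_aarch64_feature_detected!`, and the helper names are hypothetical.

```rust
/// True when the running CPU reports the `frintts` feature.
#[cfg(target_arch = "aarch64")]
pub fn frintts_available() -> bool {
    std::arch::is_aarch64_feature_detected!("frintts")
}

/// The feature is AArch64-only, so every other architecture reports false.
#[cfg(not(target_arch = "aarch64"))]
pub fn frintts_available() -> bool {
    false
}

/// Runs `body` only when the feature is present; `None` means "skipped".
pub fn run_if_frintts<T>(body: impl FnOnce() -> T) -> Option<T> {
    if frintts_available() {
        Some(body())
    } else {
        None
    }
}
```

A test body that calls a feature-gated intrinsic would go inside the closure, so machines without the feature skip it instead of faulting on an unsupported instruction.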

          Implement all ARM NEON intrinsics · Issue #148 · rust-lang/stdarch