
[AArch64] treat @llvm.ssub.sat the same as @llvm.aarch64.neon.sqsub #140454


Merged

Conversation

folkertdev
Contributor

fixes #94463

This is probably not ready to merge, but I figured it'd be easier if we could look at and comment on the code. I know all commits should eventually be squashed, but for now, having multiple commits makes it easier to revert changes.

cc @davemgreen @SpencerAbson
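
For illustration, a minimal sketch of the two forms this patch makes equivalent (the function names are hypothetical, not from the patch). With this change, both are expected to select the same sqsub instruction:

declare <4 x i32> @llvm.ssub.sat.v4i32(<4 x i32>, <4 x i32>)
declare <4 x i32> @llvm.aarch64.neon.sqsub.v4i32(<4 x i32>, <4 x i32>)

; generic target-independent saturating subtract
define <4 x i32> @generic_ssub_sat(<4 x i32> %a, <4 x i32> %b) {
  %r = call <4 x i32> @llvm.ssub.sat.v4i32(<4 x i32> %a, <4 x i32> %b)
  ret <4 x i32> %r
}

; AArch64-specific form (typically what clang emits for vqsubq_s32)
define <4 x i32> @neon_sqsub(<4 x i32> %a, <4 x i32> %b) {
  %r = call <4 x i32> @llvm.aarch64.neon.sqsub.v4i32(<4 x i32> %a, <4 x i32> %b)
  ret <4 x i32> %r
}

; both are expected to select: sqsub v0.4s, v0.4s, v1.4s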


Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository, in which case you can instead tag reviewers by name in a comment using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@folkertdev folkertdev force-pushed the neon-optimize-generic-saturating-intrinsics branch from d95389d to ef9b896 on May 18, 2025 14:01
@folkertdev
Contributor Author

Yeah, two tests fail in llvm/test/CodeGen/AArch64/arm64-neon-2velem.ll; I believe this is a regression:

define <4 x float> @optimize_dup(<4 x float> %a, <4 x float> %b, <4 x float> %c, <4 x float> %v) {
; CHECK-LABEL: optimize_dup:
; CHECK:       // %bb.0: // %entry
; CHECK-NEXT:    fmla v0.4s, v1.4s, v3.s[3]
; CHECK-NEXT:    fmls v0.4s, v2.4s, v3.s[3]
; CHECK-NEXT:    ret
entry:
  %lane1 = shufflevector <4 x float> %v, <4 x float> undef, <4 x i32> <i32 3, i32 3, i32 3, i32 3>
  %0 = tail call <4 x float> @llvm.fma.v4f32(<4 x float> %lane1, <4 x float> %b, <4 x float> %a)
  %lane2 = shufflevector <4 x float> %v, <4 x float> undef, <4 x i32> <i32 3, i32 3, i32 3, i32 3>
  %1 = fmul <4 x float> %lane2, %c
  %s = fsub <4 x float> %0, %1
  ret <4 x float> %s
}

define <4 x float> @no_optimize_dup(<4 x float> %a, <4 x float> %b, <4 x float> %c, <4 x float> %v) {
; CHECK-LABEL: no_optimize_dup:
; CHECK:       // %bb.0: // %entry
; CHECK-NEXT:    fmla v0.4s, v1.4s, v3.s[3]
; CHECK-NEXT:    fmls v0.4s, v2.4s, v3.s[1]
; CHECK-NEXT:    ret
entry:
  %lane1 = shufflevector <4 x float> %v, <4 x float> undef, <4 x i32> <i32 3, i32 3, i32 3, i32 3>
  %0 = tail call <4 x float> @llvm.fma.v4f32(<4 x float> %lane1, <4 x float> %b, <4 x float> %a)
  %lane2 = shufflevector <4 x float> %v, <4 x float> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
  %1 = fmul <4 x float> %lane2, %c
  %s = fsub <4 x float> %0, %1
  ret <4 x float> %s
}
        3738:  .type optimize_dup,@function 
        3739: optimize_dup: // @optimize_dup 
        3740:  .cfi_startproc 
        3741: // %bb.0: // %entry 
        3742:  fmul v2.4s, v2.4s, v3.s[3] 
        3743:  fmla v0.4s, v1.4s, v3.s[3] 
next:4507      !~~~~~~~~~~~~~~~~~~~~~~~~~  error: match on wrong line
        3744:  fsub v0.4s, v0.4s, v2.4s 
        3745:  ret 
        3746: .Lfunc_end298: 
        3747:  .size optimize_dup, .Lfunc_end298-optimize_dup 
        3748:  .cfi_endproc 
        3749:  // -- End function 
        3750:  .globl no_optimize_dup // -- Begin function no_optimize_dup 
        3751:  .p2align 2 
        3752:  .type no_optimize_dup,@function 
        3753: no_optimize_dup: // @no_optimize_dup 
        3754:  .cfi_startproc 
        3755: // %bb.0: // %entry 
        3756:  fmul v2.4s, v2.4s, v3.s[1] 
        3757:  fmla v0.4s, v1.4s, v3.s[3] 
next:4522      !~~~~~~~~~~~~~~~~~~~~~~~~~  error: match on wrong line
        3758:  fsub v0.4s, v0.4s, v2.4s 
        3759:  ret 
        3760: .Lfunc_end299: 
        3761:  .size no_optimize_dup, .Lfunc_end299-no_optimize_dup 
        3762:  .cfi_endproc 

It looks like the test assumes that the operations are fused, and with these changes they no longer are.

Does that sound right? I guess there is some other rule with fma that we've missed?

@folkertdev
Contributor Author

Hmm, apparently I messed up some flags. On CI, and with the command from the top of the file, it looks like some movs get inserted in an unexpected place. I have no idea why that would be the case.

Also, a bunch of the ; CHECK-GI-NEXT: warning: lines added by the first command have disappeared when I run the tests now?!


github-actions bot commented May 18, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@davemgreen
Collaborator

Also, a bunch of the ; CHECK-GI-NEXT: warning: lines added by the first command have disappeared when I run the tests now?!

That sounds good - it means that GISel is falling back to SDAG less often. If the tests show less efficient code (on the CHECK-GI lines), that's OK as long as it is falling back less. There are a number of issues in GISel that we are still working through.

@SpencerAbson
Contributor

Thank you for doing this, I'll take a look as soon as I can.

@nikic
Contributor

nikic commented May 24, 2025

Would it be possible to entirely remove these neon intrinsics and auto-upgrade them to the generic ones?

@nikic nikic changed the title [LLVM] treat @llvm.ssub.sat the same as @llvm.aarch64.neon.sqsub [AArch64] treat @llvm.ssub.sat the same as @llvm.aarch64.neon.sqsub May 24, 2025
@davemgreen
Collaborator

Would it be possible to entirely remove these neon intrinsics and auto-upgrade them to the generic ones?

For other intrinsics (abs, min/max) that is a good way to go. For these, though, the instructions technically set a Q flag on saturation that users might try to use. That is not something we support at the moment (we don't model the side effects), but having the intrinsics present allows us to pivot if we need to in the future.

@davemgreen
Collaborator

This is still a draft, but probably deserves to be a proper review at this point.

@folkertdev folkertdev marked this pull request as ready for review May 27, 2025 16:07
@llvmbot
Member

llvmbot commented May 27, 2025

@llvm/pr-subscribers-backend-aarch64

Author: Folkert de Vries (folkertdev)

Changes

fixes #94463

This is probably not ready to merge, but I figured it'd be easier if we could look at and comment on the code. I know all commits should eventually be squashed, but for now, having multiple commits makes it easier to revert changes.

cc @davemgreen @SpencerAbson


Patch is 26.54 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/140454.diff

6 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+20)
  • (modified) llvm/lib/Target/AArch64/AArch64InstrFormats.td (+16-33)
  • (modified) llvm/lib/Target/AArch64/AArch64InstrInfo.td (+8-16)
  • (modified) llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp (+24-13)
  • (modified) llvm/test/CodeGen/AArch64/arm64-neon-3vdiff.ll (+6-6)
  • (modified) llvm/test/CodeGen/AArch64/arm64-vmul.ll (+184)
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 293292d47dd48..0905901c5f69b 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -6235,6 +6235,26 @@ SDValue AArch64TargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op,
           DAG.getNode(
               AArch64ISD::URSHR_I, dl, Op.getOperand(1).getValueType(), Op.getOperand(1), Op.getOperand(2)));
     return SDValue();
+  case Intrinsic::aarch64_neon_sqadd:
+    if (Op.getValueType().isVector())
+      return DAG.getNode(ISD::SADDSAT, dl, Op.getValueType(), Op.getOperand(1),
+                         Op.getOperand(2));
+    return SDValue();
+  case Intrinsic::aarch64_neon_sqsub:
+    if (Op.getValueType().isVector())
+      return DAG.getNode(ISD::SSUBSAT, dl, Op.getValueType(), Op.getOperand(1),
+                         Op.getOperand(2));
+    return SDValue();
+  case Intrinsic::aarch64_neon_uqadd:
+    if (Op.getValueType().isVector())
+      return DAG.getNode(ISD::UADDSAT, dl, Op.getValueType(), Op.getOperand(1),
+                         Op.getOperand(2));
+    return SDValue();
+  case Intrinsic::aarch64_neon_uqsub:
+    if (Op.getValueType().isVector())
+      return DAG.getNode(ISD::USUBSAT, dl, Op.getValueType(), Op.getOperand(1),
+                         Op.getOperand(2));
+    return SDValue();
   case Intrinsic::aarch64_sve_whilelt:
     return optimizeIncrementingWhile(Op.getNode(), DAG, /*IsSigned=*/true,
                                      /*IsEqual=*/false);
diff --git a/llvm/lib/Target/AArch64/AArch64InstrFormats.td b/llvm/lib/Target/AArch64/AArch64InstrFormats.td
index 5489541fcb318..6adf84879052f 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrFormats.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrFormats.td
@@ -6256,24 +6256,6 @@ multiclass SIMDThreeSameVector<bit U, bits<5> opc, string asm,
          [(set (v2i64 V128:$Rd), (OpNode (v2i64 V128:$Rn), (v2i64 V128:$Rm)))]>;
 }
 
-multiclass SIMDThreeSameVectorExtraPatterns<string inst, SDPatternOperator OpNode> {
-  def : Pat<(v8i8 (OpNode V64:$LHS, V64:$RHS)),
-          (!cast<Instruction>(inst#"v8i8") V64:$LHS, V64:$RHS)>;
-  def : Pat<(v4i16 (OpNode V64:$LHS, V64:$RHS)),
-          (!cast<Instruction>(inst#"v4i16") V64:$LHS, V64:$RHS)>;
-  def : Pat<(v2i32 (OpNode V64:$LHS, V64:$RHS)),
-          (!cast<Instruction>(inst#"v2i32") V64:$LHS, V64:$RHS)>;
-
-  def : Pat<(v16i8 (OpNode V128:$LHS, V128:$RHS)),
-          (!cast<Instruction>(inst#"v16i8") V128:$LHS, V128:$RHS)>;
-  def : Pat<(v8i16 (OpNode V128:$LHS, V128:$RHS)),
-          (!cast<Instruction>(inst#"v8i16") V128:$LHS, V128:$RHS)>;
-  def : Pat<(v4i32 (OpNode V128:$LHS, V128:$RHS)),
-          (!cast<Instruction>(inst#"v4i32") V128:$LHS, V128:$RHS)>;
-  def : Pat<(v2i64 (OpNode V128:$LHS, V128:$RHS)),
-          (!cast<Instruction>(inst#"v2i64") V128:$LHS, V128:$RHS)>;
-}
-
 // As above, but D sized elements unsupported.
 multiclass SIMDThreeSameVectorBHS<bit U, bits<5> opc, string asm,
                                   SDPatternOperator OpNode> {
@@ -9861,14 +9843,15 @@ multiclass SIMDIndexedLongSD<bit U, bits<4> opc, string asm,
 }
 
 multiclass SIMDIndexedLongSQDMLXSDTied<bit U, bits<4> opc, string asm,
-                                       SDPatternOperator Accum> {
+                                       SDPatternOperator VecAcc,
+                                       SDPatternOperator ScalAcc> {
   def v4i16_indexed : BaseSIMDIndexedTied<0, U, 0, 0b01, opc,
                                       V128, V64,
                                       V128_lo, VectorIndexH,
                                       asm, ".4s", ".4s", ".4h", ".h",
     [(set (v4i32 V128:$dst),
-          (Accum (v4i32 V128:$Rd),
-                 (v4i32 (int_aarch64_neon_sqdmull
+          (VecAcc (v4i32 V128:$Rd),
+                  (v4i32 (int_aarch64_neon_sqdmull
                              (v4i16 V64:$Rn),
                              (dup_v8i16 (v8i16 V128_lo:$Rm),
                                          VectorIndexH:$idx)))))]> {
@@ -9883,8 +9866,8 @@ multiclass SIMDIndexedLongSQDMLXSDTied<bit U, bits<4> opc, string asm,
                                       V128_lo, VectorIndexH,
                                       asm#"2", ".4s", ".4s", ".8h", ".h",
     [(set (v4i32 V128:$dst),
-          (Accum (v4i32 V128:$Rd),
-                 (v4i32 (int_aarch64_neon_sqdmull
+          (VecAcc (v4i32 V128:$Rd),
+                  (v4i32 (int_aarch64_neon_sqdmull
                             (extract_high_v8i16 (v8i16 V128:$Rn)),
                             (extract_high_dup_v8i16 (v8i16 V128_lo:$Rm), VectorIndexH:$idx)))))]> {
     bits<3> idx;
@@ -9898,8 +9881,8 @@ multiclass SIMDIndexedLongSQDMLXSDTied<bit U, bits<4> opc, string asm,
                                       V128, VectorIndexS,
                                       asm, ".2d", ".2d", ".2s", ".s",
     [(set (v2i64 V128:$dst),
-        (Accum (v2i64 V128:$Rd),
-               (v2i64 (int_aarch64_neon_sqdmull
+        (VecAcc (v2i64 V128:$Rd),
+                (v2i64 (int_aarch64_neon_sqdmull
                           (v2i32 V64:$Rn),
                           (dup_v4i32 (v4i32 V128:$Rm), VectorIndexS:$idx)))))]> {
     bits<2> idx;
@@ -9912,8 +9895,8 @@ multiclass SIMDIndexedLongSQDMLXSDTied<bit U, bits<4> opc, string asm,
                                       V128, VectorIndexS,
                                       asm#"2", ".2d", ".2d", ".4s", ".s",
     [(set (v2i64 V128:$dst),
-          (Accum (v2i64 V128:$Rd),
-                 (v2i64 (int_aarch64_neon_sqdmull
+          (VecAcc (v2i64 V128:$Rd),
+                  (v2i64 (int_aarch64_neon_sqdmull
                             (extract_high_v4i32 (v4i32 V128:$Rn)),
                             (extract_high_dup_v4i32 (v4i32 V128:$Rm), VectorIndexS:$idx)))))]> {
     bits<2> idx;
@@ -9930,8 +9913,8 @@ multiclass SIMDIndexedLongSQDMLXSDTied<bit U, bits<4> opc, string asm,
     let Inst{20} = idx{0};
   }
 
-  def : Pat<(i32 (Accum (i32 FPR32Op:$Rd),
-                        (i32 (vector_extract
+  def : Pat<(i32 (ScalAcc (i32 FPR32Op:$Rd),
+                          (i32 (vector_extract
                                     (v4i32 (int_aarch64_neon_sqdmull
                                                 (v4i16 V64:$Rn),
                                                 (v4i16 V64:$Rm))),
@@ -9942,8 +9925,8 @@ multiclass SIMDIndexedLongSQDMLXSDTied<bit U, bits<4> opc, string asm,
                         (INSERT_SUBREG (IMPLICIT_DEF), V64:$Rm, dsub),
                         (i64 0))>;
 
-  def : Pat<(i32 (Accum (i32 FPR32Op:$Rd),
-                        (i32 (vector_extract
+  def : Pat<(i32 (ScalAcc (i32 FPR32Op:$Rd),
+                          (i32 (vector_extract
                                     (v4i32 (int_aarch64_neon_sqdmull
                                                 (v4i16 V64:$Rn),
                                                 (dup_v8i16 (v8i16 V128_lo:$Rm),
@@ -9959,8 +9942,8 @@ multiclass SIMDIndexedLongSQDMLXSDTied<bit U, bits<4> opc, string asm,
                                       FPR64Op, FPR32Op, V128, VectorIndexS,
                                       asm, ".s", "", "", ".s",
     [(set (i64 FPR64Op:$dst),
-          (Accum (i64 FPR64Op:$Rd),
-                 (i64 (int_aarch64_neon_sqdmulls_scalar
+          (ScalAcc (i64 FPR64Op:$Rd),
+                   (i64 (int_aarch64_neon_sqdmulls_scalar
                             (i32 FPR32Op:$Rn),
                             (i32 (vector_extract (v4i32 V128:$Rm),
                                                  VectorIndexS:$idx))))))]> {
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.td b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
index 010c7c391527f..9b256b2a7a878 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
@@ -5811,12 +5811,12 @@ defm SMAXP    : SIMDThreeSameVectorBHS<0,0b10100,"smaxp", int_aarch64_neon_smaxp
 defm SMAX     : SIMDThreeSameVectorBHS<0,0b01100,"smax", smax>;
 defm SMINP    : SIMDThreeSameVectorBHS<0,0b10101,"sminp", int_aarch64_neon_sminp>;
 defm SMIN     : SIMDThreeSameVectorBHS<0,0b01101,"smin", smin>;
-defm SQADD    : SIMDThreeSameVector<0,0b00001,"sqadd", int_aarch64_neon_sqadd>;
+defm SQADD    : SIMDThreeSameVector<0,0b00001,"sqadd", saddsat>;
 defm SQDMULH  : SIMDThreeSameVectorHS<0,0b10110,"sqdmulh",int_aarch64_neon_sqdmulh>;
 defm SQRDMULH : SIMDThreeSameVectorHS<1,0b10110,"sqrdmulh",int_aarch64_neon_sqrdmulh>;
 defm SQRSHL   : SIMDThreeSameVector<0,0b01011,"sqrshl", int_aarch64_neon_sqrshl>;
 defm SQSHL    : SIMDThreeSameVector<0,0b01001,"sqshl", int_aarch64_neon_sqshl>;
-defm SQSUB    : SIMDThreeSameVector<0,0b00101,"sqsub", int_aarch64_neon_sqsub>;
+defm SQSUB    : SIMDThreeSameVector<0,0b00101,"sqsub", ssubsat>;
 defm SRHADD   : SIMDThreeSameVectorBHS<0,0b00010,"srhadd", avgceils>;
 defm SRSHL    : SIMDThreeSameVector<0,0b01010,"srshl", int_aarch64_neon_srshl>;
 defm SSHL     : SIMDThreeSameVector<0,0b01000,"sshl", int_aarch64_neon_sshl>;
@@ -5830,10 +5830,10 @@ defm UMAXP    : SIMDThreeSameVectorBHS<1,0b10100,"umaxp", int_aarch64_neon_umaxp
 defm UMAX     : SIMDThreeSameVectorBHS<1,0b01100,"umax", umax>;
 defm UMINP    : SIMDThreeSameVectorBHS<1,0b10101,"uminp", int_aarch64_neon_uminp>;
 defm UMIN     : SIMDThreeSameVectorBHS<1,0b01101,"umin", umin>;
-defm UQADD    : SIMDThreeSameVector<1,0b00001,"uqadd", int_aarch64_neon_uqadd>;
+defm UQADD    : SIMDThreeSameVector<1,0b00001,"uqadd", uaddsat>;
 defm UQRSHL   : SIMDThreeSameVector<1,0b01011,"uqrshl", int_aarch64_neon_uqrshl>;
 defm UQSHL    : SIMDThreeSameVector<1,0b01001,"uqshl", int_aarch64_neon_uqshl>;
-defm UQSUB    : SIMDThreeSameVector<1,0b00101,"uqsub", int_aarch64_neon_uqsub>;
+defm UQSUB    : SIMDThreeSameVector<1,0b00101,"uqsub", usubsat>;
 defm URHADD   : SIMDThreeSameVectorBHS<1,0b00010,"urhadd", avgceilu>;
 defm URSHL    : SIMDThreeSameVector<1,0b01010,"urshl", int_aarch64_neon_urshl>;
 defm USHL     : SIMDThreeSameVector<1,0b01000,"ushl", int_aarch64_neon_ushl>;
@@ -5842,12 +5842,6 @@ defm SQRDMLAH : SIMDThreeSameVectorSQRDMLxHTiedHS<1,0b10000,"sqrdmlah",
 defm SQRDMLSH : SIMDThreeSameVectorSQRDMLxHTiedHS<1,0b10001,"sqrdmlsh",
                                                     int_aarch64_neon_sqrdmlsh>;
 
-// Extra saturate patterns, other than the intrinsics matches above
-defm : SIMDThreeSameVectorExtraPatterns<"SQADD", saddsat>;
-defm : SIMDThreeSameVectorExtraPatterns<"UQADD", uaddsat>;
-defm : SIMDThreeSameVectorExtraPatterns<"SQSUB", ssubsat>;
-defm : SIMDThreeSameVectorExtraPatterns<"UQSUB", usubsat>;
-
 defm AND : SIMDLogicalThreeVector<0, 0b00, "and", and>;
 defm BIC : SIMDLogicalThreeVector<0, 0b01, "bic",
                                   BinOpFrag<(and node:$LHS, (vnot node:$RHS))> >;
@@ -6563,10 +6557,8 @@ defm SMLAL   : SIMDLongThreeVectorTiedBHS<0, 0b1000, "smlal",
 defm SMLSL   : SIMDLongThreeVectorTiedBHS<0, 0b1010, "smlsl",
     TriOpFrag<(sub node:$LHS, (AArch64smull node:$MHS, node:$RHS))>>;
 defm SMULL   : SIMDLongThreeVectorBHS<0, 0b1100, "smull", AArch64smull>;
-defm SQDMLAL : SIMDLongThreeVectorSQDMLXTiedHS<0, 0b1001, "sqdmlal",
-                                               int_aarch64_neon_sqadd>;
-defm SQDMLSL : SIMDLongThreeVectorSQDMLXTiedHS<0, 0b1011, "sqdmlsl",
-                                               int_aarch64_neon_sqsub>;
+defm SQDMLAL : SIMDLongThreeVectorSQDMLXTiedHS<0, 0b1001, "sqdmlal", saddsat>;
+defm SQDMLSL : SIMDLongThreeVectorSQDMLXTiedHS<0, 0b1011, "sqdmlsl", ssubsat>;
 defm SQDMULL : SIMDLongThreeVectorHS<0, 0b1101, "sqdmull",
                                      int_aarch64_neon_sqdmull>;
 defm SSUBL   : SIMDLongThreeVectorBHS<0, 0b0010, "ssubl",
@@ -8125,9 +8117,9 @@ defm SMLAL : SIMDVectorIndexedLongSDTied<0, 0b0010, "smlal",
 defm SMLSL : SIMDVectorIndexedLongSDTied<0, 0b0110, "smlsl",
     TriOpFrag<(sub node:$LHS, (AArch64smull node:$MHS, node:$RHS))>>;
 defm SMULL : SIMDVectorIndexedLongSD<0, 0b1010, "smull", AArch64smull>;
-defm SQDMLAL : SIMDIndexedLongSQDMLXSDTied<0, 0b0011, "sqdmlal",
+defm SQDMLAL : SIMDIndexedLongSQDMLXSDTied<0, 0b0011, "sqdmlal", saddsat,
                                            int_aarch64_neon_sqadd>;
-defm SQDMLSL : SIMDIndexedLongSQDMLXSDTied<0, 0b0111, "sqdmlsl",
+defm SQDMLSL : SIMDIndexedLongSQDMLXSDTied<0, 0b0111, "sqdmlsl", ssubsat,
                                            int_aarch64_neon_sqsub>;
 defm SQRDMLAH : SIMDIndexedSQRDMLxHSDTied<1, 0b1101, "sqrdmlah",
                                           int_aarch64_neon_sqrdmlah>;
diff --git a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
index 7ff2e55e802c5..fdaad067fc69d 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
@@ -1622,8 +1622,10 @@ bool AArch64LegalizerInfo::legalizeSmallCMGlobalValue(
 
 bool AArch64LegalizerInfo::legalizeIntrinsic(LegalizerHelper &Helper,
                                              MachineInstr &MI) const {
-  auto LowerBinOp = [&MI](unsigned Opcode) {
-    MachineIRBuilder MIB(MI);
+  MachineIRBuilder MIB(MI);
+  MachineRegisterInfo &MRI = *MIB.getMRI();
+
+  auto LowerBinOp = [&MI, &MIB](unsigned Opcode) {
     MIB.buildInstr(Opcode, {MI.getOperand(0)},
                    {MI.getOperand(2), MI.getOperand(3)});
     MI.eraseFromParent();
@@ -1642,7 +1644,6 @@ bool AArch64LegalizerInfo::legalizeIntrinsic(LegalizerHelper &Helper,
     MachineFunction &MF = *MI.getMF();
     auto Val = MF.getRegInfo().createGenericVirtualRegister(
         LLT::scalar(VaListSize * 8));
-    MachineIRBuilder MIB(MI);
     MIB.buildLoad(Val, MI.getOperand(2),
                   *MF.getMachineMemOperand(MachinePointerInfo(),
                                            MachineMemOperand::MOLoad,
@@ -1664,14 +1665,12 @@ bool AArch64LegalizerInfo::legalizeIntrinsic(LegalizerHelper &Helper,
     assert(MI.getOpcode() == TargetOpcode::G_INTRINSIC_W_SIDE_EFFECTS);
     // Anyext the value being set to 64 bit (only the bottom 8 bits are read by
     // the instruction).
-    MachineIRBuilder MIB(MI);
     auto &Value = MI.getOperand(3);
     Register ExtValueReg = MIB.buildAnyExt(LLT::scalar(64), Value).getReg(0);
     Value.setReg(ExtValueReg);
     return true;
   }
   case Intrinsic::aarch64_prefetch: {
-    MachineIRBuilder MIB(MI);
     auto &AddrVal = MI.getOperand(1);
 
     int64_t IsWrite = MI.getOperand(2).getImm();
@@ -1694,8 +1693,6 @@ bool AArch64LegalizerInfo::legalizeIntrinsic(LegalizerHelper &Helper,
   case Intrinsic::aarch64_neon_smaxv:
   case Intrinsic::aarch64_neon_uminv:
   case Intrinsic::aarch64_neon_sminv: {
-    MachineIRBuilder MIB(MI);
-    MachineRegisterInfo &MRI = *MIB.getMRI();
     bool IsSigned = IntrinsicID == Intrinsic::aarch64_neon_saddv ||
                     IntrinsicID == Intrinsic::aarch64_neon_smaxv ||
                     IntrinsicID == Intrinsic::aarch64_neon_sminv;
@@ -1720,8 +1717,6 @@ bool AArch64LegalizerInfo::legalizeIntrinsic(LegalizerHelper &Helper,
   }
   case Intrinsic::aarch64_neon_uaddlp:
   case Intrinsic::aarch64_neon_saddlp: {
-    MachineIRBuilder MIB(MI);
-
     unsigned Opc = IntrinsicID == Intrinsic::aarch64_neon_uaddlp
                        ? AArch64::G_UADDLP
                        : AArch64::G_SADDLP;
@@ -1732,9 +1727,6 @@ bool AArch64LegalizerInfo::legalizeIntrinsic(LegalizerHelper &Helper,
   }
   case Intrinsic::aarch64_neon_uaddlv:
   case Intrinsic::aarch64_neon_saddlv: {
-    MachineIRBuilder MIB(MI);
-    MachineRegisterInfo &MRI = *MIB.getMRI();
-
     unsigned Opc = IntrinsicID == Intrinsic::aarch64_neon_uaddlv
                        ? AArch64::G_UADDLV
                        : AArch64::G_SADDLV;
@@ -1790,11 +1782,30 @@ bool AArch64LegalizerInfo::legalizeIntrinsic(LegalizerHelper &Helper,
     return LowerBinOp(AArch64::G_UMULL);
   case Intrinsic::aarch64_neon_abs: {
     // Lower the intrinsic to G_ABS.
-    MachineIRBuilder MIB(MI);
     MIB.buildInstr(TargetOpcode::G_ABS, {MI.getOperand(0)}, {MI.getOperand(2)});
     MI.eraseFromParent();
     return true;
   }
+  case Intrinsic::aarch64_neon_sqadd: {
+    if (MRI.getType(MI.getOperand(0).getReg()).isVector())
+      return LowerBinOp(TargetOpcode::G_SADDSAT);
+    break;
+  }
+  case Intrinsic::aarch64_neon_sqsub: {
+    if (MRI.getType(MI.getOperand(0).getReg()).isVector())
+      return LowerBinOp(TargetOpcode::G_SSUBSAT);
+    break;
+  }
+  case Intrinsic::aarch64_neon_uqadd: {
+    if (MRI.getType(MI.getOperand(0).getReg()).isVector())
+      return LowerBinOp(TargetOpcode::G_UADDSAT);
+    break;
+  }
+  case Intrinsic::aarch64_neon_uqsub: {
+    if (MRI.getType(MI.getOperand(0).getReg()).isVector())
+      return LowerBinOp(TargetOpcode::G_USUBSAT);
+    break;
+  }
 
   case Intrinsic::vector_reverse:
     // TODO: Add support for vector_reverse
diff --git a/llvm/test/CodeGen/AArch64/arm64-neon-3vdiff.ll b/llvm/test/CodeGen/AArch64/arm64-neon-3vdiff.ll
index 9fb8e4c8fe031..bd28d13973f9c 100644
--- a/llvm/test/CodeGen/AArch64/arm64-neon-3vdiff.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-neon-3vdiff.ll
@@ -2539,16 +2539,16 @@ define <8 x i16> @cmplx_mul_combined_re_im(<8 x i16> noundef %a, i64 %scale.coer
 ; CHECK:       // %bb.0: // %entry
 ; CHECK-NEXT:    lsr x8, x0, #16
 ; CHECK-NEXT:    movi v1.2d, #0xffff0000ffff0000
-; CHECK-NEXT:    fmov d5, x0
 ; CHECK-NEXT:    rev32 v4.8h, v0.8h
 ; CHECK-NEXT:    dup v2.8h, w8
 ; CHECK-NEXT:    sqneg v3.8h, v2.8h
 ; CHECK-NEXT:    bsl v1.16b, v2.16b, v3.16b
-; CHECK-NEXT:    sqdmull v2.4s, v0.4h, v5.h[0]
-; CHECK-NEXT:    sqdmull2 v0.4s, v0.8h, v5.h[0]
-; CHECK-NEXT:    sqdmlal v2.4s, v4.4h, v1.4h
-; CHECK-NEXT:    sqdmlal2 v0.4s, v4.8h, v1.8h
-; CHECK-NEXT:    uzp2 v0.8h, v2.8h, v0.8h
+; CHECK-NEXT:    fmov d3, x0
+; CHECK-NEXT:    sqdmull v2.4s, v4.4h, v1.4h
+; CHECK-NEXT:    sqdmull2 v1.4s, v4.8h, v1.8h
+; CHECK-NEXT:    sqdmlal v2.4s, v0.4h, v3.h[0]
+; CHECK-NEXT:    sqdmlal2 v1.4s, v0.8h, v3.h[0]
+; CHECK-NEXT:    uzp2 v0.8h, v2.8h, v1.8h
 ; CHECK-NEXT:    ret
 entry:
   %scale.sroa.2.0.extract.shift23 = lshr i64 %scale.coerce, 16
diff --git a/llvm/test/CodeGen/AArch64/arm64-vmul.ll b/llvm/test/CodeGen/AArch64/arm64-vmul.ll
index 499786470d4ac..937a17ca6c1e0 100644
--- a/llvm/test/CodeGen/AArch64/arm64-vmul.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-vmul.ll
@@ -2863,3 +2863,187 @@ define <1 x i64> @test_mul_v1i64(<1 x i64> %lhs, <1 x i64> %rhs) nounwind {
   %prod = mul <1 x i64> %lhs, %rhs
   ret <1 x i64> %prod
 }
+
+define <4 x i32> @sqdmlal4s_lib(<4 x i32> %dst, <4 x i16> %v1, <4 x i16> %v2) {
+; CHECK-LABEL: sqdmlal4s_lib:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    sqdmlal.4s v0, v1, v2
+; CHECK-NEXT:    ret
+  %tmp  = call <4 x i32> @llvm.aarch64.neon.sqdmull.v4i32(<4 x i16> %v1, <4 x i16> %v2)
+  %sum = call <4 x i32> @llvm.sadd.sat.v4i32(<4 x i32> %dst, <4 x i32> %tmp)
+  ret <4 x i32> %sum
+}
+
+define <2 x i64> @sqdmlal2d_lib(<2 x i64> %dst, <2 x i32> %v1, <2 x i32> %v2) {
+; CHECK-LABEL: sqdmlal2d_lib:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    sqdmlal.2d v0, v1, v2
+; CHECK-NEXT:    ret
+  %tmp  = call <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32> %v1, <2 x i32> %v2)
+  %sum = call <2 x i64> @llvm.sadd.sat.v2i64(<2 x i64> %dst, <2 x i64> %tmp)
+  ret <2 x i64> %sum
+}
+
+define <4 x i32> @sqdmlal2_4s_lib(<4 x i32> %dst, <8 x i16> %v1, <8 x i16> %v2) {
+; CHECK-LABEL: sqdmlal2_4s_lib:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    sqdmlal2.4s v0, v1, v2
+; CHECK-NEXT:    ret
+  %tmp0 = shufflevector <8 x i16> %v1, <8 x i16> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
+  %tmp1 = shufflevector <8 x i16> %v2, <8 x i16> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
+  %tmp2  = call <4 x i32> @llvm.aarch64.neon.sqdmull.v4i32(<4 x i16> %tmp0, <4 x i16> %tmp1)
+  %sum = call <4 x i32> @llvm.sadd.sat.v4i32(<4 x i32> %dst, <4 x i32> %tmp2)
+  ret <4 x i32> %sum
+}
+
+define <2 x i64> @sqdmlal2_2d_lib(<2 x i64> %dst, <4 x i32> %v1, <4 x i32> %v2) {
+; CHECK-LABEL: sqdmlal2_2d_lib:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    sqdmlal2.2d v0, v1, v2
+; CHECK-NEXT:    ret
+  %tmp0 = shufflevector <4 x i32> %v1, <4 x i32> poison, <2 x i32> <i32 2, i32 3>
+  %tmp1 = shufflevector <4 x i32> %v2, <4 x i32> poison, <2 x i32> <i32 2, i32 3>
+  %tmp2  = call <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32> %tmp0, <2 x i...
[truncated]

@folkertdev
Contributor Author

I marked it as ready. There is currently a request for a comment above; otherwise, I think all previous comments have been handled.

@davemgreen davemgreen left a comment (Collaborator)

Thanks - this LGTM if there are no other comments from @SpencerAbson.

@nikic
Contributor

nikic commented May 28, 2025

Would it be possible to entirely remove these neon intrinsics and auto-upgrade them to the generic ones?

For other intrinsics (abs, min/max) that is a good way to go. For these, though, the instructions technically set a Q flag on saturation that users might try to use. That is not something we support at the moment (we don't model the side effects), but having the intrinsics present allows us to pivot if we need to in the future.

Hm, if you wanted to model the Q flag, I'd assume you would change the intrinsic to return something like { <4 x i32>, i1 } instead?

That seems like a reasonable motivation for a separate intrinsic -- but for the case where the i1 result is unused, wouldn't you still want to use the target-independent ones anyway?
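
For illustration, such an intrinsic might look like the sketch below. The name and signature are hypothetical (no such intrinsic exists today), returning the saturated value together with an i1 for the Q flag:

; hypothetical pseudo-IR; the intrinsic name is invented for illustration
declare { <4 x i32>, i1 } @llvm.aarch64.neon.sqsub.q.v4i32(<4 x i32>, <4 x i32>)

define <4 x i32> @sqsub_with_q(<4 x i32> %a, <4 x i32> %b, ptr %q.out) {
  %pair = call { <4 x i32>, i1 } @llvm.aarch64.neon.sqsub.q.v4i32(<4 x i32> %a, <4 x i32> %b)
  %res = extractvalue { <4 x i32>, i1 } %pair, 0
  %q = extractvalue { <4 x i32>, i1 } %pair, 1
  store i1 %q, ptr %q.out          ; record whether saturation occurred
  ret <4 x i32> %res
}

; when the i1 result is unused, this could fold back to plain @llvm.ssub.sat.v4i32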

@davemgreen
Collaborator

With the way that the intrinsics are defined in https://developer.arm.com/architectures/instruction-sets/intrinsics/#f:@navigationhierarchiessimdisa=[Neon]&q=qadd, they are stand-alone and the Q flag would be a global side effect. I agree that if the dependency between the intrinsic and the flag were known, then it would want to be modelled as a data dependency.

I'm not against removing the intrinsics, TBH. We just haven't done so yet because we might want to reverse the decision and model Q flags conservatively, as GCC does. We can always re-add them if necessary. The neon.abs and neon.smin/smax/umin/umax intrinsics are more obviously OK to remove (provided the scalar types do not get worse).

@SpencerAbson SpencerAbson left a comment (Contributor)

Thanks, minor comment but LGTM.

@folkertdev folkertdev force-pushed the neon-optimize-generic-saturating-intrinsics branch from 849468d to f5b04f3 on May 29, 2025 13:27
@folkertdev
Contributor Author

I think this PR is ready to merge?

Thanks for all the help here btw!

@davemgreen davemgreen merged commit 3a98934 into llvm:main May 31, 2025
11 checks passed

@folkertdev Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

davemgreen added a commit to davemgreen/llvm-project that referenced this pull request Jun 2, 2025
Add basic handling for v1i64 saddsat, ssubsat, uaddsat and usubsat. We missed
that these were not upgrading in llvm#140454 due to a lack of test coverage.

Fixes llvm#142323
davemgreen added a commit that referenced this pull request Jun 3, 2025
Add basic handling for v1i64 saddsat, ssubsat, uaddsat and usubsat. We
missed that these were not upgrading in #140454 due to a lack of test
coverage, and for some reason the generic v1i64 nodes were not being
treated as legal like they should.

Fixes #142323
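
For reference, a minimal sketch of the 1-element vector case the follow-up addresses (the function name is illustrative):

define <1 x i64> @ssubsat_v1i64(<1 x i64> %a, <1 x i64> %b) {
  ; expected to select the scalar form: sqsub d0, d0, d1
  %r = call <1 x i64> @llvm.ssub.sat.v1i64(<1 x i64> %a, <1 x i64> %b)
  ret <1 x i64> %r
}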
rorth pushed a commit to rorth/llvm-project that referenced this pull request Jun 11, 2025

DhruvSrivastavaX pushed a commit to DhruvSrivastavaX/lldb-for-aix that referenced this pull request Jun 12, 2025

DhruvSrivastavaX pushed a commit to DhruvSrivastavaX/lldb-for-aix that referenced this pull request Jun 12, 2025
Successfully merging this pull request may close these issues.

generic @llvm.ssub.sat optimizes less well than target-specific @llvm.aarch64.neon.sqsub