[LV] Use getFixedValue instead of getKnownMinValue when appropriate #143526


Merged
merged 3 commits into llvm:main on Jun 13, 2025

Conversation

david-arm
Contributor

There are many places in VPlan and LoopVectorize where we use getKnownMinValue to discover the number of elements in a vector. Where we expect the vector to have a fixed length, I have used the stronger getFixedValue call. I believe this is clearer and adds extra protection in the form of an assert in getFixedValue that the vector is not scalable.

While looking at VPFirstOrderRecurrencePHIRecipe::computeCost I also took the liberty of simplifying the code.

In theory I believe this patch should be NFC, but I'm reluctant to add that to the title in case we're just missing tests for some of the VPlan changes. I built and ran the LLVM test suite when targeting neoverse-v1 and it seemed ok.
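
For context, here is a minimal sketch (a hypothetical helper, not code from this patch) of how the two ElementCount queries differ:

  // Hypothetical helper, only to illustrate getKnownMinValue vs getFixedValue.
  #include "llvm/Support/TypeSize.h"
  using namespace llvm;

  unsigned lanesForCostModel(ElementCount VF) {
    if (!VF.isScalable())
      return VF.getFixedValue(); // <4 x i32>: returns 4; asserts !isScalable()
    // <vscale x 4 x i32>: the real lane count is 4 * vscale and only known at
    // run time, so getFixedValue() would trip its assert here.
    return VF.getKnownMinValue(); // guaranteed minimum only, i.e. 4
  }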

@llvmbot
Member

llvmbot commented Jun 10, 2025

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-vectorizers

Author: David Sherwood (david-arm)


Full diff: https://github.com/llvm/llvm-project/pull/143526.diff

3 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+19-15)
  • (modified) llvm/lib/Transforms/Vectorize/VPlan.cpp (+3-4)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+8-10)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 333e50ee98418..8e2d22e5db23d 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -3116,12 +3116,13 @@ LoopVectorizationCostModel::getDivRemSpeculationCost(Instruction *I,
     // that we will create. This cost is likely to be zero. The phi node
     // cost, if any, should be scaled by the block probability because it
     // models a copy at the end of each predicated block.
-    ScalarizationCost += VF.getKnownMinValue() *
-      TTI.getCFInstrCost(Instruction::PHI, CostKind);
+    ScalarizationCost +=
+        VF.getFixedValue() * TTI.getCFInstrCost(Instruction::PHI, CostKind);
 
     // The cost of the non-predicated instruction.
-    ScalarizationCost += VF.getKnownMinValue() *
-      TTI.getArithmeticInstrCost(I->getOpcode(), I->getType(), CostKind);
+    ScalarizationCost +=
+        VF.getFixedValue() *
+        TTI.getArithmeticInstrCost(I->getOpcode(), I->getType(), CostKind);
 
     // The cost of insertelement and extractelement instructions needed for
     // scalarization.
@@ -4290,7 +4291,7 @@ static bool willGenerateVectors(VPlan &Plan, ElementCount VF,
           return NumLegalParts <= VF.getKnownMinValue();
         }
         // Two or more elements that share a register - are vectorized.
-        return NumLegalParts < VF.getKnownMinValue();
+        return NumLegalParts < VF.getFixedValue();
       };
 
       // If no def nor is a store, e.g., branches, continue - no value to check.
@@ -4575,8 +4576,8 @@ VectorizationFactor LoopVectorizationPlanner::selectEpilogueVectorizationFactor(
         assert(!isa<SCEVCouldNotCompute>(TC) &&
                "Trip count SCEV must be computable");
         RemainingIterations = SE.getURemExpr(
-            TC, SE.getConstant(TCType, MainLoopVF.getKnownMinValue() * IC));
-        MaxTripCount = MainLoopVF.getKnownMinValue() * IC - 1;
+            TC, SE.getConstant(TCType, MainLoopVF.getFixedValue() * IC));
+        MaxTripCount = MainLoopVF.getFixedValue() * IC - 1;
         if (SE.isKnownPredicate(CmpInst::ICMP_ULT, RemainingIterations,
                                 SE.getConstant(TCType, MaxTripCount))) {
           MaxTripCount =
@@ -4587,7 +4588,7 @@ VectorizationFactor LoopVectorizationPlanner::selectEpilogueVectorizationFactor(
       }
       if (SE.isKnownPredicate(
               CmpInst::ICMP_UGT,
-              SE.getConstant(TCType, NextVF.Width.getKnownMinValue()),
+              SE.getConstant(TCType, NextVF.Width.getFixedValue()),
               RemainingIterations))
         continue;
     }
@@ -5258,14 +5259,14 @@ LoopVectorizationCostModel::getMemInstScalarizationCost(Instruction *I,
 
   // Get the cost of the scalar memory instruction and address computation.
   InstructionCost Cost =
-      VF.getKnownMinValue() * TTI.getAddressComputationCost(PtrTy, SE, PtrSCEV);
+      VF.getFixedValue() * TTI.getAddressComputationCost(PtrTy, SE, PtrSCEV);
 
   // Don't pass *I here, since it is scalar but will actually be part of a
   // vectorized loop where the user of it is a vectorized instruction.
   const Align Alignment = getLoadStoreAlignment(I);
-  Cost += VF.getKnownMinValue() * TTI.getMemoryOpCost(I->getOpcode(),
-                                                      ValTy->getScalarType(),
-                                                      Alignment, AS, CostKind);
+  Cost += VF.getFixedValue() * TTI.getMemoryOpCost(I->getOpcode(),
+                                                   ValTy->getScalarType(),
+                                                   Alignment, AS, CostKind);
 
   // Get the overhead of the extractelement and insertelement instructions
   // we might create due to scalarization.
@@ -5281,7 +5282,7 @@ LoopVectorizationCostModel::getMemInstScalarizationCost(Instruction *I,
     auto *VecI1Ty =
         VectorType::get(IntegerType::getInt1Ty(ValTy->getContext()), VF);
     Cost += TTI.getScalarizationOverhead(
-        VecI1Ty, APInt::getAllOnes(VF.getKnownMinValue()),
+        VecI1Ty, APInt::getAllOnes(VF.getFixedValue()),
         /*Insert=*/false, /*Extract=*/true, CostKind);
     Cost += TTI.getCFInstrCost(Instruction::Br, CostKind);
 
@@ -5327,7 +5328,6 @@ InstructionCost
 LoopVectorizationCostModel::getUniformMemOpCost(Instruction *I,
                                                 ElementCount VF) {
   assert(Legal->isUniformMemOp(*I, VF));
-
   Type *ValTy = getLoadStoreType(I);
   auto *VectorTy = cast<VectorType>(toVectorTy(ValTy, VF));
   const Align Alignment = getLoadStoreAlignment(I);
@@ -5342,6 +5342,10 @@ LoopVectorizationCostModel::getUniformMemOpCost(Instruction *I,
   StoreInst *SI = cast<StoreInst>(I);
 
   bool IsLoopInvariantStoreValue = Legal->isInvariant(SI->getValueOperand());
+  // TODO: We have tests that request the cost of extracting element
+  // VF.getKnownMinValue() - 1 from a scalable vector. This is actually
+  // meaningless, given what we actually want is the last lane and is likely
+  // to be more expensive.
   return TTI.getAddressComputationCost(ValTy) +
          TTI.getMemoryOpCost(Instruction::Store, ValTy, Alignment, AS,
                              CostKind) +
@@ -5624,7 +5628,7 @@ LoopVectorizationCostModel::getScalarizationOverhead(Instruction *I,
 
     for (Type *VectorTy : getContainedTypes(RetTy)) {
       Cost += TTI.getScalarizationOverhead(
-          cast<VectorType>(VectorTy), APInt::getAllOnes(VF.getKnownMinValue()),
+          cast<VectorType>(VectorTy), APInt::getAllOnes(VF.getFixedValue()),
           /*Insert=*/true,
           /*Extract=*/false, CostKind);
     }
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index 1838562f26b82..b5fa98d341756 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -331,7 +331,7 @@ Value *VPTransformState::get(const VPValue *Def, bool NeedsScalar) {
 
   bool IsSingleScalar = vputils::isSingleScalar(Def);
 
-  VPLane LastLane(IsSingleScalar ? 0 : VF.getKnownMinValue() - 1);
+  VPLane LastLane(IsSingleScalar ? 0 : VF.getFixedValue() - 1);
   // Check if there is a scalar value for the selected lane.
   if (!hasScalarValue(Def, LastLane)) {
     // At the moment, VPWidenIntOrFpInductionRecipes, VPScalarIVStepsRecipes and
@@ -368,7 +368,7 @@ Value *VPTransformState::get(const VPValue *Def, bool NeedsScalar) {
     assert(!VF.isScalable() && "VF is assumed to be non scalable.");
     Value *Undef = PoisonValue::get(toVectorizedTy(LastInst->getType(), VF));
     set(Def, Undef);
-    for (unsigned Lane = 0; Lane < VF.getKnownMinValue(); ++Lane)
+    for (unsigned Lane = 0; Lane < VF.getFixedValue(); ++Lane)
       packScalarIntoVectorizedValue(Def, Lane);
     VectorValue = get(Def);
   }
@@ -789,8 +789,7 @@ void VPRegionBlock::execute(VPTransformState *State) {
   ReversePostOrderTraversal<VPBlockShallowTraversalWrapper<VPBlockBase *>> RPOT(
       Entry);
   State->Lane = VPLane(0);
-  for (unsigned Lane = 0, VF = State->VF.getKnownMinValue(); Lane < VF;
-       ++Lane) {
+  for (unsigned Lane = 0, VF = State->VF.getFixedValue(); Lane < VF; ++Lane) {
     State->Lane = VPLane(Lane, VPLane::Kind::First);
     // Visit the VPBlocks connected to \p this, starting from it.
     for (VPBlockBase *Block : RPOT) {
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index 62b99d98a2b5e..509d91d6c492b 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -872,7 +872,7 @@ void VPInstruction::execute(VPTransformState &State) {
                                     isVectorToScalar() || isSingleScalar());
   bool GeneratesPerAllLanes = doesGeneratePerAllLanes();
   if (GeneratesPerAllLanes) {
-    for (unsigned Lane = 0, NumLanes = State.VF.getKnownMinValue();
+    for (unsigned Lane = 0, NumLanes = State.VF.getFixedValue();
          Lane != NumLanes; ++Lane) {
       Value *GeneratedValue = generatePerLane(State, VPLane(Lane));
       assert(GeneratedValue && "generatePerLane must produce a value");
@@ -2792,8 +2792,7 @@ void VPReplicateRecipe::execute(VPTransformState &State) {
   }
 
   // Generate scalar instances for all VF lanes.
-  assert(!State.VF.isScalable() && "Can't scalarize a scalable vector");
-  const unsigned EndLane = State.VF.getKnownMinValue();
+  const unsigned EndLane = State.VF.getFixedValue();
   for (unsigned Lane = 0; Lane < EndLane; ++Lane)
     scalarizeInstruction(UI, this, VPLane(Lane), State);
 }
@@ -2846,7 +2845,7 @@ InstructionCost VPReplicateRecipe::computeCost(ElementCount VF,
                UI->getOpcode(), ResultTy, CostKind,
                {TargetTransformInfo::OK_AnyValue, TargetTransformInfo::OP_None},
                Op2Info, Operands, UI, &Ctx.TLI) *
-           (isSingleScalar() ? 1 : VF.getKnownMinValue());
+           (isSingleScalar() ? 1 : VF.getFixedValue());
   }
   }
 
@@ -3407,7 +3406,7 @@ void VPInterleaveRecipe::execute(VPTransformState &State) {
     Value *ResBlockInMask = State.get(BlockInMask);
     Value *ShuffledMask = State.Builder.CreateShuffleVector(
         ResBlockInMask,
-        createReplicatedMask(InterleaveFactor, State.VF.getKnownMinValue()),
+        createReplicatedMask(InterleaveFactor, State.VF.getFixedValue()),
         "interleaved.mask");
     return MaskForGaps ? State.Builder.CreateBinOp(Instruction::And,
                                                    ShuffledMask, MaskForGaps)
@@ -3419,8 +3418,8 @@ void VPInterleaveRecipe::execute(VPTransformState &State) {
   if (isa<LoadInst>(Instr)) {
     Value *MaskForGaps = nullptr;
     if (NeedsMaskForGaps) {
-      MaskForGaps = createBitMaskForGaps(State.Builder,
-                                         State.VF.getKnownMinValue(), *Group);
+      MaskForGaps =
+          createBitMaskForGaps(State.Builder, State.VF.getFixedValue(), *Group);
       assert(MaskForGaps && "Mask for Gaps is required but it is null");
     }
 
@@ -3508,13 +3507,12 @@ void VPInterleaveRecipe::execute(VPTransformState &State) {
         continue;
 
       auto StrideMask =
-          createStrideMask(I, InterleaveFactor, State.VF.getKnownMinValue());
+          createStrideMask(I, InterleaveFactor, State.VF.getFixedValue());
       Value *StridedVec =
           State.Builder.CreateShuffleVector(NewLoad, StrideMask, "strided.vec");
 
       // If this member has different type, cast the result type.
       if (Member->getType() != ScalarTy) {
-        assert(!State.VF.isScalable() && "VF is assumed to be non scalable.");
         VectorType *OtherVTy = VectorType::get(Member->getType(), State.VF);
         StridedVec =
             createBitOrPointerCast(State.Builder, StridedVec, OtherVTy, DL);
@@ -3850,7 +3848,7 @@ VPFirstOrderRecurrencePHIRecipe::computeCost(ElementCount VF,
   if (VF.isScalar())
     return Ctx.TTI.getCFInstrCost(Instruction::PHI, Ctx.CostKind);
 
-  if (VF.isScalable() && VF.getKnownMinValue() == 1)
+  if (VF == ElementCount::getScalable(1))
     return InstructionCost::getInvalid();
 
   return 0;

Comment on lines 5345 to 5348
// TODO: We have tests that request the cost of extracting element
// VF.getKnownMinValue() - 1 from a scalable vector. This is actually
// meaningless, given what we actually want is the last lane and is likely
// to be more expensive.
Contributor

The issue is generally that we allow invariant addresses with scalable vectors, and for those we may have to extract the last lane, but we query the last known-minimum lane, which is not what we need for scalable vectors, right?

Might be good to adjust the comment a bit, we have tests to exercise this code, but might be clearer to just explain the problematic code path?

Also, I presume that there's no way to query TTI for the last lane of a scalable vector at the moment?

Contributor Author

I don't think there is such a TTI hook. Maybe an argument for adding one, or at least modifying an existing interface? I'm happy to look into it.

Contributor

Maybe we could change the Index argument of getVectorInstrCost to be an ElementCount?
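
Purely as a hypothetical illustration of that suggestion (the signature below is not the current TTI interface and is not part of this patch), the idea is to let the index describe "the last lane of a scalable vector" rather than a plain unsigned:

  // Hypothetical shape only; naming and details would be settled in the
  // follow-up work. An ElementCount index could encode both a fixed lane
  // number and a scalable "last lane" query.
  InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val,
                                     TTI::TargetCostKind CostKind,
                                     ElementCount Index);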

Contributor Author

So I'm looking at changing the getVectorInstrCost interface to add support for other "unknown" indices that are more meaningful, such as the last element of a vector. In the meantime, I'm happy to update the comment. There are already quite a few tests exercising this code path (since we make different vectorisation decisions with my downstream patch). Once I've got something working I'll put up a WIP patch for people to look at.

Contributor

I see, FWIW VPLane::Kind has something similar to represent the last lane:

  /// Kind describes how to interpret Lane.
  enum class Kind : uint8_t {
    /// For First, Lane is the index into the first N elements of a
    /// fixed-vector <N x <ElTy>> or a scalable vector <vscale x N x <ElTy>>.
    First,
    /// For ScalableLast, Lane is the offset from the start of the last
    /// N-element subvector in a scalable vector <vscale x N x <ElTy>>. For
    /// example, a Lane of 0 corresponds to lane `(vscale - 1) * N`, a Lane of
    /// 1 corresponds to `((vscale - 1) * N) + 1`, etc.
    ScalableLast
  };
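
For illustration, a small worked example of that mapping (toy code, not LLVM's; assume N = 4 and vscale = 2 at run time, i.e. 8 lanes):

  // Maps a ScalableLast lane offset to a concrete element index once vscale
  // is known. The last N-element subvector starts at element (vscale - 1) * N.
  unsigned scalableLastToElement(unsigned Lane, unsigned N, unsigned VScale) {
    return (VScale - 1) * N + Lane;
  }
  // scalableLastToElement(0, 4, 2) == 4 and scalableLastToElement(3, 4, 2) == 7,
  // where element 7 is the true last lane of <vscale x 4 x i32> when vscale == 2.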

Contributor

Yep, probably best to first update the comment, then update the interface.

Contributor

Might be good to adjust the comment a bit, we have tests to exercise this code, but might be clearer to just explain the problematic code path?

Looks like the comment update may have been missed?

Contributor Author

I did update the comment. See second commit in the PR:

  -// TODO: We have tests that request the cost of extracting element
  -// VF.getKnownMinValue() - 1 from a scalable vector. This is actually
  -// meaningless, given what we actually want is the last lane and is likely
  -// to be more expensive.
  +// TODO: We have existing tests that request the cost of extracting element
  +// VF.getKnownMinValue() - 1 from a scalable vector. This does not represent
  +// the actual generated code, which involves extracting the last element of
  +// a scalable vector where the lane to extract is unknown at compile time.

I'm not sure what you were expecting the updated comment to contain. You said "Might be good to adjust the comment a bit, we have tests to exercise this code, but might be clearer to just explain the problematic code path?" and I thought my updated comment was explaining the problematic code path: asking for the cost of extracting lane VF.getKnownMinValue() - 1 does not represent the actual generated code, which for SVE at least would be whilelo + lastb instructions.
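
To make that concrete, here is a rough IRBuilder sketch (an assumed helper, not code from this patch) of what extracting the true last lane of a scalable vector involves; the index is only known at run time, which is why SVE ends up with whilelo + lastb rather than a cheap fixed-index extract:

  #include "llvm/IR/DerivedTypes.h"
  #include "llvm/IR/IRBuilder.h"
  #include "llvm/IR/Intrinsics.h"
  using namespace llvm;

  // Assumed helper for illustration: extracts element (vscale * MinLanes - 1).
  Value *extractLastLane(IRBuilder<> &B, Value *Vec) {
    auto *VTy = cast<ScalableVectorType>(Vec->getType());
    Value *VScale = B.CreateIntrinsic(Intrinsic::vscale, {B.getInt32Ty()}, {});
    Value *RuntimeLanes =
        B.CreateMul(VScale, B.getInt32(VTy->getMinNumElements()));
    Value *LastIdx = B.CreateSub(RuntimeLanes, B.getInt32(1));
    return B.CreateExtractElement(Vec, LastIdx, "last.lane");
  }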

Contributor Author

Hi @fhahn @lukel97 I've created a PR to add support for querying the cost of extracting the last lane in a scalable vector. I'll add reviewers (and split up the PR if necessary) if you think it looks good! #144086

Contributor

I'm not sure what you were expecting the updated comment to contain. You said "Might be good to adjust the comment a bit, we have tests to exercise this code, but might be clearer to just explain the problematic code path?" and I thought my updated comment was explaining the problematic code path: asking for the cost of extracting lane VF.getKnownMinValue() - 1 does not represent the actual generated code, which for SVE at least would be whilelo + lastb instructions.

Ah yes fair enough, I was originally thinking about dropping the bit about the tests, but the current version is fine too.

Value *StridedVec =
State.Builder.CreateShuffleVector(NewLoad, StrideMask, "strided.vec");

// If this member has different type, cast the result type.
if (Member->getType() != ScalarTy) {
assert(!State.VF.isScalable() && "VF is assumed to be non scalable.");
Contributor

I guess it would be fine to drop this here as it is covered by the getFixedValue above, but it may be clearer to keep the assert here or add it somewhere above?


@@ -5318,7 +5319,6 @@ InstructionCost
LoopVectorizationCostModel::getUniformMemOpCost(Instruction *I,
ElementCount VF) {
assert(Legal->isUniformMemOp(*I, VF));

Contributor

This blank line was accidentally deleted

@@ -331,7 +331,7 @@ Value *VPTransformState::get(const VPValue *Def, bool NeedsScalar) {

bool IsSingleScalar = vputils::isSingleScalar(Def);

-  VPLane LastLane(IsSingleScalar ? 0 : VF.getKnownMinValue() - 1);
+  VPLane LastLane(IsSingleScalar ? 0 : VF.getFixedValue() - 1);
Contributor

May I ask where it is guaranteed that the VF is not scalable?

Contributor

This is an existing invariant; in the bit below there's an assertion that !VF.isScalable() if IsSingleScalar is false.

Contributor Author

Good question, but no tests broke (including the LLVM test suite) when I made this change, so either:

  1. It's not possible to get here for scalable vectors, or
  2. We're missing tests.

Also, I think the code is already broken for scalable vectors, because the last lane isn't VF.getKnownMinValue() - 1.

Contributor Author

So I reasoned that we may as well strengthen the code, since it looks like we're effectively assuming fixed-width already.

david-arm merged commit 541e511 into llvm:main on Jun 13, 2025
7 checks passed
tomtor pushed a commit to tomtor/llvm-project that referenced this pull request Jun 14, 2025
[LV] Use getFixedValue instead of getKnownMinValue when appropriate (llvm#143526)
