[WPD]: Apply speculative WPD in non-lto mode. #145031
Open
hassnaaHamdi wants to merge 1 commit into llvm:main from hassnaaHamdi:enable_wpd_nolto
hassnaaHamdi (Member) commented Jun 20, 2025
- This patch applies speculative devirtualization in non-LTO mode, where visibility is not needed.
- It is still safe to devirtualize because the devirtualization is speculative.
- In non-LTO mode, only speculative devirtualization is allowed; other features such as virtual constant propagation are skipped to minimize the drawback of a wrong speculation.
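
For readers unfamiliar with the transformation, here is a minimal C++ sketch of what a speculatively devirtualized call site amounts to at source level. It is not code from this patch: `Object`, `Virtual1Fn`, `call_indirect`, and `call_speculative` are hypothetical names, and the one-slot struct only stands in for a real vtable. The idea it illustrates is that a wrong guess costs a compare and a branch, while the original indirect call is still executed.

```cpp
#include <cassert>

// Hypothetical hand-rolled "vtable" with a single slot (illustration only).
struct Object;
using Virtual1Fn = int (*)(const Object *);

struct Object {
  Virtual1Fn virtual1; // stands in for the vtable slot loaded at the call site
};

int a_virtual1(const Object *) { return 20; }
int b_virtual1(const Object *) { return 50; }

// Plain indirect call: the call site before the transformation.
int call_indirect(const Object *o) {
  assert(o != nullptr);
  return o->virtual1(o);
}

// Speculative form: compare the loaded pointer against the guessed target,
// call that target directly on a match, otherwise keep the indirect call.
int call_speculative(const Object *o) {
  assert(o != nullptr);
  Virtual1Fn fn = o->virtual1;
  if (fn == &b_virtual1)
    return b_virtual1(o); // direct call, a candidate for inlining
  return fn(o);           // fallback: the original indirect call
}
```

For an `Object` whose slot holds `b_virtual1`, `call_speculative` takes the direct path; for any other target it falls back to the indirect call, so both functions return the same value either way.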
@llvm/pr-subscribers-clang-codegen @llvm/pr-subscribers-clang Author: Hassnaa Hamdi (hassnaaHamdi) Changes
Full diff: https://github.com/llvm/llvm-project/pull/145031.diff 10 Files Affected:
diff --git a/clang/docs/UsersManual.rst b/clang/docs/UsersManual.rst
index 62844f7e6a2fa..a433a66e0b7a6 100644
--- a/clang/docs/UsersManual.rst
+++ b/clang/docs/UsersManual.rst
@@ -2275,9 +2275,13 @@ are listed below.
.. option:: -fwhole-program-vtables
+ In LTO mode:
Enable whole-program vtable optimizations, such as single-implementation
devirtualization and virtual constant propagation, for classes with
- :doc:`hidden LTO visibility <LTOVisibility>`. Requires ``-flto``.
+ :doc:`hidden LTO visibility <LTOVisibility>`.
+ In non-LTO mode:
+ Enables speculative devirtualization only without other features.
+ Doesn't require ``-flto`` or visibility.
.. option:: -f[no]split-lto-unit
@@ -5170,7 +5174,7 @@ Execute ``clang-cl /?`` to see a list of supported options:
-fstandalone-debug Emit full debug info for all types used by the program
-fstrict-aliasing Enable optimizations based on strict aliasing rules
-fsyntax-only Run the preprocessor, parser and semantic analysis stages
- -fwhole-program-vtables Enables whole-program vtable optimization. Requires -flto
+ -fwhole-program-vtables Enables whole-program vtable optimization.
-gcodeview-ghash Emit type record hashes in a .debug$H section
-gcodeview Generate CodeView debug information
-gline-directives-only Emit debug line info directives only
diff --git a/clang/lib/CodeGen/BackendUtil.cpp b/clang/lib/CodeGen/BackendUtil.cpp
index 7e0a3cf5591ce..f6963aadfbc69 100644
--- a/clang/lib/CodeGen/BackendUtil.cpp
+++ b/clang/lib/CodeGen/BackendUtil.cpp
@@ -902,6 +902,7 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
// non-integrated assemblers don't recognize .cgprofile section.
PTO.CallGraphProfile = !CodeGenOpts.DisableIntegratedAS;
PTO.UnifiedLTO = CodeGenOpts.UnifiedLTO;
+ PTO.WholeProgramDevirt = CodeGenOpts.WholeProgramVTables;
LoopAnalysisManager LAM;
FunctionAnalysisManager FAM;
diff --git a/clang/lib/CodeGen/CGVTables.cpp b/clang/lib/CodeGen/CGVTables.cpp
index 2897ccdf88660..cfb78d623c7ec 100644
--- a/clang/lib/CodeGen/CGVTables.cpp
+++ b/clang/lib/CodeGen/CGVTables.cpp
@@ -1359,7 +1359,8 @@ void CodeGenModule::EmitVTableTypeMetadata(const CXXRecordDecl *RD,
// Emit type metadata on vtables with LTO or IR instrumentation.
// In IR instrumentation, the type metadata is used to find out vtable
// definitions (for type profiling) among all global variables.
- if (!getCodeGenOpts().LTOUnit && !getCodeGenOpts().hasProfileIRInstr())
+ if (!getCodeGenOpts().LTOUnit && !getCodeGenOpts().hasProfileIRInstr() &&
+ !getCodeGenOpts().WholeProgramVTables)
return;
CharUnits ComponentWidth = GetTargetTypeStoreSize(getVTableComponentType());
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 8556bcadf0915..cc337ad334f65 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -7847,8 +7847,12 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
IsDeviceOffloadAction ? D.getLTOMode() : D.getOffloadLTOMode();
auto OtherIsUsingLTO = OtherLTOMode != LTOK_None;
- if ((!IsUsingLTO && !OtherIsUsingLTO) ||
- (IsPS4 && !UnifiedLTO && (D.getLTOMode() != LTOK_Full)))
+ if (!IsUsingLTO && !OtherIsUsingLTO && !UnifiedLTO) {
+ if (const Arg *A = Args.getLastArg(options::OPT_O_Group))
+ if (!A->getOption().matches(options::OPT_O0))
+ CmdArgs.push_back("-fwhole-program-vtables");
+ } else if ((!IsUsingLTO && !OtherIsUsingLTO) ||
+ (IsPS4 && !UnifiedLTO && (D.getLTOMode() != LTOK_Full)))
D.Diag(diag::err_drv_argument_only_allowed_with)
<< "-fwhole-program-vtables"
<< ((IsPS4 && !UnifiedLTO) ? "-flto=full" : "-flto");
diff --git a/clang/test/CodeGenCXX/devirt-single-impl.cpp b/clang/test/CodeGenCXX/devirt-single-impl.cpp
new file mode 100644
index 0000000000000..6ba15cec1ce9b
--- /dev/null
+++ b/clang/test/CodeGenCXX/devirt-single-impl.cpp
@@ -0,0 +1,56 @@
+// Check that speculative devirtualization works without the need for LTO or visibility.
+// RUN: %clang_cc1 -fwhole-program-vtables -O1 %s -emit-llvm -o - | FileCheck %s
+
+struct A {
+ A(){}
+ __attribute__((noinline))
+ virtual int virtual1(){return 20;}
+ __attribute__((noinline))
+ virtual void empty_virtual(){}
+};
+
+struct B : A {
+ B(){}
+ __attribute__((noinline))
+ virtual int virtual1() override {return 50;}
+ __attribute__((noinline))
+ virtual void empty_virtual() override {}
+};
+
+// Test that we can apply speculative devirtualization
+// without the need for LTO or visibility.
+__attribute__((noinline))
+int test_devirtual(A *a) {
+ // CHECK: %0 = load ptr, ptr %vtable, align 8
+ // CHECK-NEXT: %1 = icmp eq ptr %0, @_ZN1B8virtual1Ev
+ // CHECK-NEXT: br i1 %1, label %if.true.direct_targ, label %if.false.orig_indirect, !prof !12
+
+ // CHECK: if.true.direct_targ: ; preds = %entry
+ // CHECK-NEXT: %2 = tail call noundef i32 @_ZN1B8virtual1Ev(ptr noundef nonnull align 8 dereferenceable(8) %a)
+ // CHECK-NEXT: br label %if.end.icp
+
+ // CHECK: if.false.orig_indirect: ; preds = %entry
+ // CHECK-NEXT: %call = tail call noundef i32 %0(ptr noundef nonnull align 8 dereferenceable(8) %a)
+ // CHECK-NEXT: br label %if.end.icp
+
+ // CHECK: if.end.icp: ; preds = %if.false.orig_indirect, %if.true.direct_targ
+ // CHECK-NEXT: %3 = phi i32 [ %call, %if.false.orig_indirect ], [ %2, %if.true.direct_targ ]
+ // CHECK-NEXT: ret i32 %3
+
+ return a->virtual1();
+}
+
+// Test that we skip devirtualization for empty virtual functions as most probably
+// they are used for interfaces.
+__attribute__((noinline))
+void test_devirtual_empty_fn(A *a) {
+ // CHECK: load ptr, ptr %vfn, align 8
+ // CHECK-NEXT: tail call void %0(ptr noundef nonnull align 8 dereferenceable(8) %a)
+ a->empty_virtual();
+}
+
+void test() {
+ A *a = new B();
+ test_devirtual(a);
+ test_devirtual_empty_fn(a);
+}
\ No newline at end of file
diff --git a/clang/test/Driver/whole-program-vtables.c b/clang/test/Driver/whole-program-vtables.c
index 7f7c45e77f6f5..e0538b584f456 100644
--- a/clang/test/Driver/whole-program-vtables.c
+++ b/clang/test/Driver/whole-program-vtables.c
@@ -1,15 +1,11 @@
-// RUN: not %clang -target x86_64-unknown-linux -fwhole-program-vtables -### %s 2>&1 | FileCheck --check-prefix=NO-LTO %s
-// RUN: not %clang_cl --target=x86_64-pc-win32 -fwhole-program-vtables -### -- %s 2>&1 | FileCheck --check-prefix=NO-LTO %s
-// NO-LTO: invalid argument '-fwhole-program-vtables' only allowed with '-flto'
+// RUN: %clang -target x86_64-unknown-linux -fwhole-program-vtables -O1 -### %s 2>&1 | FileCheck --check-prefix=WPD-NO-LTO %s
+// RUN: %clang_cl --target=x86_64-pc-win32 -fwhole-program-vtables -O1 -### -- %s 2>&1 | FileCheck --check-prefix=WPD-NO-LTO %s
+// WPD-NO-LTO: "-fwhole-program-vtables"
// RUN: %clang -target x86_64-unknown-linux -fwhole-program-vtables -flto -### %s 2>&1 | FileCheck --check-prefix=LTO %s
// RUN: not %clang_cl --target=x86_64-pc-win32 -fwhole-program-vtables -flto -### -- %s 2>&1 | FileCheck --check-prefix=LTO %s
// LTO: "-fwhole-program-vtables"
-/// -funified-lto does not imply -flto, so we still get an error that fwhole-program-vtables has no effect without -flto
-// RUN: not %clang --target=x86_64-pc-linux-gnu -fwhole-program-vtables -funified-lto -### %s 2>&1 | FileCheck --check-prefix=NO-LTO %s
-// RUN: not %clang --target=x86_64-pc-linux-gnu -fwhole-program-vtables -fno-unified-lto -### %s 2>&1 | FileCheck --check-prefix=NO-LTO %s
-
// RUN: %clang -target x86_64-unknown-linux -fwhole-program-vtables -fno-whole-program-vtables -flto -### %s 2>&1 | FileCheck --check-prefix=LTO-DISABLE %s
// RUN: not %clang_cl --target=x86_64-pc-win32 -fwhole-program-vtables -fno-whole-program-vtables -flto -### -- %s 2>&1 | FileCheck --check-prefix=LTO-DISABLE %s
// LTO-DISABLE-NOT: "-fwhole-program-vtables"
diff --git a/llvm/include/llvm/Passes/PassBuilder.h b/llvm/include/llvm/Passes/PassBuilder.h
index 51ccaa53447d7..ee08b11ce2c09 100644
--- a/llvm/include/llvm/Passes/PassBuilder.h
+++ b/llvm/include/llvm/Passes/PassBuilder.h
@@ -98,6 +98,12 @@ class PipelineTuningOptions {
// analyses after various module->function or cgscc->function adaptors in the
// default pipelines.
bool EagerlyInvalidateAnalyses;
+
+ /// Tuning option to enable/disable whole program devirtualization.
+ /// Its default value is false.
+ /// This is controlled by the `-whole-program-vtables` flag.
+ /// Used only in non-LTO mode.
+ bool WholeProgramDevirt;
};
/// This class provides access to building LLVM's passes.
diff --git a/llvm/include/llvm/Transforms/IPO/WholeProgramDevirt.h b/llvm/include/llvm/Transforms/IPO/WholeProgramDevirt.h
index 7a03405b4f462..fff27fae162a0 100644
--- a/llvm/include/llvm/Transforms/IPO/WholeProgramDevirt.h
+++ b/llvm/include/llvm/Transforms/IPO/WholeProgramDevirt.h
@@ -226,11 +226,15 @@ struct WholeProgramDevirtPass : public PassInfoMixin<WholeProgramDevirtPass> {
ModuleSummaryIndex *ExportSummary;
const ModuleSummaryIndex *ImportSummary;
bool UseCommandLine = false;
+ const bool InLTOMode;
WholeProgramDevirtPass()
- : ExportSummary(nullptr), ImportSummary(nullptr), UseCommandLine(true) {}
+ : ExportSummary(nullptr), ImportSummary(nullptr), UseCommandLine(true),
+ InLTOMode(true) {}
WholeProgramDevirtPass(ModuleSummaryIndex *ExportSummary,
- const ModuleSummaryIndex *ImportSummary)
- : ExportSummary(ExportSummary), ImportSummary(ImportSummary) {
+ const ModuleSummaryIndex *ImportSummary,
+ bool InLTOMode = true)
+ : ExportSummary(ExportSummary), ImportSummary(ImportSummary),
+ InLTOMode(InLTOMode) {
assert(!(ExportSummary && ImportSummary));
}
LLVM_ABI PreservedAnalyses run(Module &M, ModuleAnalysisManager &);
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index a99146d5eaa34..4b10c63fd4e02 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -321,6 +321,7 @@ PipelineTuningOptions::PipelineTuningOptions() {
MergeFunctions = EnableMergeFunctions;
InlinerThreshold = -1;
EagerlyInvalidateAnalyses = EnableEagerlyInvalidateAnalyses;
+ WholeProgramDevirt = false;
}
namespace llvm {
@@ -1629,6 +1630,23 @@ PassBuilder::buildModuleOptimizationPipeline(OptimizationLevel Level,
if (!LTOPreLink)
MPM.addPass(RelLookupTableConverterPass());
+ if (PTO.WholeProgramDevirt && LTOPhase == ThinOrFullLTOPhase::None) {
+ MPM.addPass(WholeProgramDevirtPass(/*ExportSummary*/ nullptr,
+ /*ImportSummary*/ nullptr,
+ /*InLTOMode=*/false));
+ MPM.addPass(LowerTypeTestsPass(nullptr, nullptr,
+ lowertypetests::DropTestKind::Assume));
+ if (EnableModuleInliner) {
+ MPM.addPass(ModuleInlinerPass(getInlineParamsFromOptLevel(Level),
+ UseInlineAdvisor,
+ ThinOrFullLTOPhase::None));
+ } else {
+ MPM.addPass(ModuleInlinerWrapperPass(
+ getInlineParamsFromOptLevel(Level),
+ /* MandatoryFirst */ true,
+ InlineContext{ThinOrFullLTOPhase::None, InlinePass::CGSCCInliner}));
+ }
+ }
return MPM;
}
diff --git a/llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp b/llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
index 30e1dc7167a39..0fe8a22eb5c0f 100644
--- a/llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
+++ b/llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
@@ -24,7 +24,8 @@
// returns 0, or a single vtable's function returns 1, replace each virtual
// call with a comparison of the vptr against that vtable's address.
//
-// This pass is intended to be used during the regular and thin LTO pipelines:
+// This pass is intended to be used during the regular/thinLTO and non-LTO
+// pipelines:
//
// During regular LTO, the pass determines the best optimization for each
// virtual call and applies the resolutions directly to virtual calls that are
@@ -48,6 +49,13 @@
// is supported.
// - Import phase: (same as with hybrid case above).
//
+// In non-LTO mode:
+// - The pass applies speculative devirtualization without requiring any type of
+// visibility.
+// - Skips other features like virtual constant propagation, uniform return
+// value
+// optimization, unique return value optimization, branch funnels to minimize
+// the drawbacks of wrong speculation.
//===----------------------------------------------------------------------===//
#include "llvm/Transforms/IPO/WholeProgramDevirt.h"
@@ -60,7 +68,9 @@
#include "llvm/ADT/Statistic.h"
#include "llvm/Analysis/AssumptionCache.h"
#include "llvm/Analysis/BasicAliasAnalysis.h"
+#include "llvm/Analysis/ModuleSummaryAnalysis.h"
#include "llvm/Analysis/OptimizationRemarkEmitter.h"
+#include "llvm/Analysis/ProfileSummaryInfo.h"
#include "llvm/Analysis/TypeMetadataUtils.h"
#include "llvm/Bitcode/BitcodeReader.h"
#include "llvm/Bitcode/BitcodeWriter.h"
@@ -798,6 +808,21 @@ PreservedAnalyses WholeProgramDevirtPass::run(Module &M,
return PreservedAnalyses::all();
return PreservedAnalyses::none();
}
+ std::optional<ModuleSummaryIndex> Index;
+ // Force Fallback mode as it's safe in case it's non-LTO mode where
+ // we don't have hidden visibility.
+ if (!InLTOMode) {
+ DevirtCheckMode = WPDCheckMode::Fallback;
+ // In non-LTO mode, we don't have an ExportSummary, so we
+ // build the ExportSummary from the module.
+ assert(!ExportSummary &&
+ "ExportSummary is expected to be empty in non-LTO mode");
+ if (DevirtCheckMode == WPDCheckMode::Fallback && !ExportSummary) {
+ ProfileSummaryInfo PSI(M);
+ Index.emplace(buildModuleSummaryIndex(M, nullptr, &PSI));
+ ExportSummary = Index.has_value() ? &Index.value() : nullptr;
+ }
+ }
if (!DevirtModule(M, AARGetter, OREGetter, LookupDomTree, ExportSummary,
ImportSummary)
.run())
@@ -1091,10 +1116,12 @@ bool DevirtModule::tryFindVirtualCallTargets(
if (!TM.Bits->GV->isConstant())
return false;
- // We cannot perform whole program devirtualization analysis on a vtable
- // with public LTO visibility.
- if (TM.Bits->GV->getVCallVisibility() ==
- GlobalObject::VCallVisibilityPublic)
+ // If speculative devirtualization is NOT enabled, it's not safe to perform
+ // whole program devirtualization
+ // analysis on a vtable with public LTO visibility.
+ if (DevirtCheckMode != WPDCheckMode::Fallback &&
+ TM.Bits->GV->getVCallVisibility() ==
+ GlobalObject::VCallVisibilityPublic)
return false;
Function *Fn = nullptr;
@@ -1112,6 +1139,11 @@ bool DevirtModule::tryFindVirtualCallTargets(
// calls to pure virtuals are UB.
if (Fn->getName() == "__cxa_pure_virtual")
continue;
+ // In most cases empty functions will be overridden by the
+ // implementation of the derived class, so we can skip them.
+ if (DevirtCheckMode == WPDCheckMode::Fallback &&
+ Fn->getReturnType()->isVoidTy() && Fn->getInstructionCount() <= 1)
+ continue;
// We can disregard unreachable functions as possible call targets, as
// unreachable functions shouldn't be called.
@@ -1333,10 +1365,11 @@ bool DevirtModule::trySingleImplDevirt(
if (!IsExported)
return false;
- // If the only implementation has local linkage, we must promote to external
- // to make it visible to thin LTO objects. We can only get here during the
- // ThinLTO export phase.
- if (TheFn->hasLocalLinkage()) {
+ // In case of non-speculative devirtualization, If the only implementation has
+ // local linkage, we must promote to external
+ // to make it visible to thin LTO objects. We can only get here during the
+ // ThinLTO export phase.
+ if (DevirtCheckMode != WPDCheckMode::Fallback && TheFn->hasLocalLinkage()) {
std::string NewName = (TheFn->getName() + ".llvm.merged").str();
// Since we are renaming the function, any comdats with the same name must
@@ -2315,6 +2348,11 @@ bool DevirtModule::run() {
Function *TypeTestFunc =
Intrinsic::getDeclarationIfExists(&M, Intrinsic::type_test);
+ // If we are applying speculative devirtualization, we can work on the public
+ // type test intrinsics.
+ if (!TypeTestFunc && DevirtCheckMode == WPDCheckMode::Fallback)
+ TypeTestFunc =
+ Intrinsic::getDeclarationIfExists(&M, Intrinsic::public_type_test);
Function *TypeCheckedLoadFunc =
Intrinsic::getDeclarationIfExists(&M, Intrinsic::type_checked_load);
Function *TypeCheckedLoadRelativeFunc = Intrinsic::getDeclarationIfExists(
@@ -2437,12 +2475,18 @@ bool DevirtModule::run() {
.WPDRes[S.first.ByteOffset];
if (tryFindVirtualCallTargets(TargetsForSlot, TypeMemberInfos,
S.first.ByteOffset, ExportSummary)) {
-
- if (!trySingleImplDevirt(ExportSummary, TargetsForSlot, S.second, Res)) {
- DidVirtualConstProp |=
- tryVirtualConstProp(TargetsForSlot, S.second, Res, S.first);
-
- tryICallBranchFunnel(TargetsForSlot, S.second, Res, S.first);
+ trySingleImplDevirt(ExportSummary, TargetsForSlot, S.second, Res);
+ // In Speculative devirt mode, we skip virtual constant propagation
+ // and branch funneling to minimize the drawback if we got wrong
+ // speculation during devirtualization.
+ if (DevirtCheckMode != WPDCheckMode::Fallback) {
+ if (!trySingleImplDevirt(ExportSummary, TargetsForSlot, S.second,
+ Res)) {
+ DidVirtualConstProp |=
+ tryVirtualConstProp(TargetsForSlot, S.second, Res, S.first);
+
+ tryICallBranchFunnel(TargetsForSlot, S.second, Res, S.first);
+ }
}
// Collect functions devirtualized at least for one call site for stats.
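
The driver hunk in `clang/lib/Driver/ToolChains/Clang.cpp` above can be read as a small decision function. The sketch below models only the branches visible in that hunk. `DriverState`, `OptLevel`, `Outcome`, and `classify` are hypothetical names introduced for illustration, and `FallThrough` simply means that neither visible branch acts and control continues to code outside the hunk.

```cpp
// Hypothetical stand-in for the last option seen in the -O group.
enum class OptLevel { O0, O1, O2, O3 };

// What the visible branches of the hunk do for a given driver state.
enum class Outcome { ForwardFlag, ErrorNeedsLTO, FallThrough };

// Hypothetical bundle of the variables the hunk reads.
struct DriverState {
  bool IsUsingLTO;
  bool OtherIsUsingLTO;
  bool UnifiedLTO;
  bool IsPS4;
  bool LTOModeIsFull; // models D.getLTOMode() == LTOK_Full
  bool HasOArg;       // models Args.getLastArg(OPT_O_Group) != nullptr
  OptLevel LastOArg;  // only meaningful when HasOArg is true
};

Outcome classify(const DriverState &S) {
  // Non-LTO path added by the patch: forward the flag only when optimizing.
  if (!S.IsUsingLTO && !S.OtherIsUsingLTO && !S.UnifiedLTO) {
    if (S.HasOArg && S.LastOArg != OptLevel::O0)
      return Outcome::ForwardFlag;
    return Outcome::FallThrough;
  }
  // Remaining error cases from the original condition.
  if ((!S.IsUsingLTO && !S.OtherIsUsingLTO) ||
      (S.IsPS4 && !S.UnifiedLTO && !S.LTOModeIsFull))
    return Outcome::ErrorNeedsLTO;
  return Outcome::FallThrough;
}
```

If this reading is right, one consequence is that without LTO the flag is forwarded only when an `-O` option other than `-O0` is present, so `-fwhole-program-vtables` with no `-O` option is neither forwarded nor diagnosed by these branches.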
|
@llvm/pr-subscribers-llvm-transforms Author: Hassnaa Hamdi (hassnaaHamdi) Changes
Full diff: https://github.com/llvm/llvm-project/pull/145031.diff 10 Files Affected:
diff --git a/clang/docs/UsersManual.rst b/clang/docs/UsersManual.rst
index 62844f7e6a2fa..a433a66e0b7a6 100644
--- a/clang/docs/UsersManual.rst
+++ b/clang/docs/UsersManual.rst
@@ -2275,9 +2275,13 @@ are listed below.
.. option:: -fwhole-program-vtables
+ In LTO mode:
Enable whole-program vtable optimizations, such as single-implementation
devirtualization and virtual constant propagation, for classes with
- :doc:`hidden LTO visibility <LTOVisibility>`. Requires ``-flto``.
+ :doc:`hidden LTO visibility <LTOVisibility>`.
+ In non-LTO mode:
+ Enables speculative devirtualization only without other features.
+ Doesn't require ``-flto`` or visibility.
.. option:: -f[no]split-lto-unit
@@ -5170,7 +5174,7 @@ Execute ``clang-cl /?`` to see a list of supported options:
-fstandalone-debug Emit full debug info for all types used by the program
-fstrict-aliasing Enable optimizations based on strict aliasing rules
-fsyntax-only Run the preprocessor, parser and semantic analysis stages
- -fwhole-program-vtables Enables whole-program vtable optimization. Requires -flto
+ -fwhole-program-vtables Enables whole-program vtable optimization.
-gcodeview-ghash Emit type record hashes in a .debug$H section
-gcodeview Generate CodeView debug information
-gline-directives-only Emit debug line info directives only
diff --git a/clang/lib/CodeGen/BackendUtil.cpp b/clang/lib/CodeGen/BackendUtil.cpp
index 7e0a3cf5591ce..f6963aadfbc69 100644
--- a/clang/lib/CodeGen/BackendUtil.cpp
+++ b/clang/lib/CodeGen/BackendUtil.cpp
@@ -902,6 +902,7 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
// non-integrated assemblers don't recognize .cgprofile section.
PTO.CallGraphProfile = !CodeGenOpts.DisableIntegratedAS;
PTO.UnifiedLTO = CodeGenOpts.UnifiedLTO;
+ PTO.WholeProgramDevirt = CodeGenOpts.WholeProgramVTables;
LoopAnalysisManager LAM;
FunctionAnalysisManager FAM;
diff --git a/clang/lib/CodeGen/CGVTables.cpp b/clang/lib/CodeGen/CGVTables.cpp
index 2897ccdf88660..cfb78d623c7ec 100644
--- a/clang/lib/CodeGen/CGVTables.cpp
+++ b/clang/lib/CodeGen/CGVTables.cpp
@@ -1359,7 +1359,8 @@ void CodeGenModule::EmitVTableTypeMetadata(const CXXRecordDecl *RD,
// Emit type metadata on vtables with LTO or IR instrumentation.
// In IR instrumentation, the type metadata is used to find out vtable
// definitions (for type profiling) among all global variables.
- if (!getCodeGenOpts().LTOUnit && !getCodeGenOpts().hasProfileIRInstr())
+ if (!getCodeGenOpts().LTOUnit && !getCodeGenOpts().hasProfileIRInstr() &&
+ !getCodeGenOpts().WholeProgramVTables)
return;
CharUnits ComponentWidth = GetTargetTypeStoreSize(getVTableComponentType());
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 8556bcadf0915..cc337ad334f65 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -7847,8 +7847,12 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
IsDeviceOffloadAction ? D.getLTOMode() : D.getOffloadLTOMode();
auto OtherIsUsingLTO = OtherLTOMode != LTOK_None;
- if ((!IsUsingLTO && !OtherIsUsingLTO) ||
- (IsPS4 && !UnifiedLTO && (D.getLTOMode() != LTOK_Full)))
+ if (!IsUsingLTO && !OtherIsUsingLTO && !UnifiedLTO) {
+ if (const Arg *A = Args.getLastArg(options::OPT_O_Group))
+ if (!A->getOption().matches(options::OPT_O0))
+ CmdArgs.push_back("-fwhole-program-vtables");
+ } else if ((!IsUsingLTO && !OtherIsUsingLTO) ||
+ (IsPS4 && !UnifiedLTO && (D.getLTOMode() != LTOK_Full)))
D.Diag(diag::err_drv_argument_only_allowed_with)
<< "-fwhole-program-vtables"
<< ((IsPS4 && !UnifiedLTO) ? "-flto=full" : "-flto");
diff --git a/clang/test/CodeGenCXX/devirt-single-impl.cpp b/clang/test/CodeGenCXX/devirt-single-impl.cpp
new file mode 100644
index 0000000000000..6ba15cec1ce9b
--- /dev/null
+++ b/clang/test/CodeGenCXX/devirt-single-impl.cpp
@@ -0,0 +1,56 @@
+// Check that speculative devirtualization works without the need for LTO or visibility.
+// RUN: %clang_cc1 -fwhole-program-vtables -O1 %s -emit-llvm -o - | FileCheck %s
+
+struct A {
+ A(){}
+ __attribute__((noinline))
+ virtual int virtual1(){return 20;}
+ __attribute__((noinline))
+ virtual void empty_virtual(){}
+};
+
+struct B : A {
+ B(){}
+ __attribute__((noinline))
+ virtual int virtual1() override {return 50;}
+ __attribute__((noinline))
+ virtual void empty_virtual() override {}
+};
+
+// Test that we can apply speculative devirtualization
+// without the need for LTO or visibility.
+__attribute__((noinline))
+int test_devirtual(A *a) {
+ // CHECK: %0 = load ptr, ptr %vtable, align 8
+ // CHECK-NEXT: %1 = icmp eq ptr %0, @_ZN1B8virtual1Ev
+ // CHECK-NEXT: br i1 %1, label %if.true.direct_targ, label %if.false.orig_indirect, !prof !12
+
+ // CHECK: if.true.direct_targ: ; preds = %entry
+ // CHECK-NEXT: %2 = tail call noundef i32 @_ZN1B8virtual1Ev(ptr noundef nonnull align 8 dereferenceable(8) %a)
+ // CHECK-NEXT: br label %if.end.icp
+
+ // CHECK: if.false.orig_indirect: ; preds = %entry
+ // CHECK-NEXT: %call = tail call noundef i32 %0(ptr noundef nonnull align 8 dereferenceable(8) %a)
+ // CHECK-NEXT: br label %if.end.icp
+
+ // CHECK: if.end.icp: ; preds = %if.false.orig_indirect, %if.true.direct_targ
+ // CHECK-NEXT: %3 = phi i32 [ %call, %if.false.orig_indirect ], [ %2, %if.true.direct_targ ]
+ // CHECK-NEXT: ret i32 %3
+
+ return a->virtual1();
+}
+
+// Test that we skip devirtualization for empty virtual functions as most probably
+// they are used for interfaces.
+__attribute__((noinline))
+void test_devirtual_empty_fn(A *a) {
+ // CHECK: load ptr, ptr %vfn, align 8
+ // CHECK-NEXT: tail call void %0(ptr noundef nonnull align 8 dereferenceable(8) %a)
+ a->empty_virtual();
+}
+
+void test() {
+ A *a = new B();
+ test_devirtual(a);
+ test_devirtual_empty_fn(a);
+}
\ No newline at end of file
diff --git a/clang/test/Driver/whole-program-vtables.c b/clang/test/Driver/whole-program-vtables.c
index 7f7c45e77f6f5..e0538b584f456 100644
--- a/clang/test/Driver/whole-program-vtables.c
+++ b/clang/test/Driver/whole-program-vtables.c
@@ -1,15 +1,11 @@
-// RUN: not %clang -target x86_64-unknown-linux -fwhole-program-vtables -### %s 2>&1 | FileCheck --check-prefix=NO-LTO %s
-// RUN: not %clang_cl --target=x86_64-pc-win32 -fwhole-program-vtables -### -- %s 2>&1 | FileCheck --check-prefix=NO-LTO %s
-// NO-LTO: invalid argument '-fwhole-program-vtables' only allowed with '-flto'
+// RUN: %clang -target x86_64-unknown-linux -fwhole-program-vtables -O1 -### %s 2>&1 | FileCheck --check-prefix=WPD-NO-LTO %s
+// RUN: %clang_cl --target=x86_64-pc-win32 -fwhole-program-vtables -O1 -### -- %s 2>&1 | FileCheck --check-prefix=WPD-NO-LTO %s
+// WPD-NO-LTO: "-fwhole-program-vtables"
// RUN: %clang -target x86_64-unknown-linux -fwhole-program-vtables -flto -### %s 2>&1 | FileCheck --check-prefix=LTO %s
// RUN: not %clang_cl --target=x86_64-pc-win32 -fwhole-program-vtables -flto -### -- %s 2>&1 | FileCheck --check-prefix=LTO %s
// LTO: "-fwhole-program-vtables"
-/// -funified-lto does not imply -flto, so we still get an error that fwhole-program-vtables has no effect without -flto
-// RUN: not %clang --target=x86_64-pc-linux-gnu -fwhole-program-vtables -funified-lto -### %s 2>&1 | FileCheck --check-prefix=NO-LTO %s
-// RUN: not %clang --target=x86_64-pc-linux-gnu -fwhole-program-vtables -fno-unified-lto -### %s 2>&1 | FileCheck --check-prefix=NO-LTO %s
-
// RUN: %clang -target x86_64-unknown-linux -fwhole-program-vtables -fno-whole-program-vtables -flto -### %s 2>&1 | FileCheck --check-prefix=LTO-DISABLE %s
// RUN: not %clang_cl --target=x86_64-pc-win32 -fwhole-program-vtables -fno-whole-program-vtables -flto -### -- %s 2>&1 | FileCheck --check-prefix=LTO-DISABLE %s
// LTO-DISABLE-NOT: "-fwhole-program-vtables"
diff --git a/llvm/include/llvm/Passes/PassBuilder.h b/llvm/include/llvm/Passes/PassBuilder.h
index 51ccaa53447d7..ee08b11ce2c09 100644
--- a/llvm/include/llvm/Passes/PassBuilder.h
+++ b/llvm/include/llvm/Passes/PassBuilder.h
@@ -98,6 +98,12 @@ class PipelineTuningOptions {
// analyses after various module->function or cgscc->function adaptors in the
// default pipelines.
bool EagerlyInvalidateAnalyses;
+
+ /// Tuning option to enable/disable whole program devirtualization.
+ /// Its default value is false.
+ /// This is controlled by the `-whole-program-vtables` flag.
+ /// Used only in non-LTO mode.
+ bool WholeProgramDevirt;
};
/// This class provides access to building LLVM's passes.
diff --git a/llvm/include/llvm/Transforms/IPO/WholeProgramDevirt.h b/llvm/include/llvm/Transforms/IPO/WholeProgramDevirt.h
index 7a03405b4f462..fff27fae162a0 100644
--- a/llvm/include/llvm/Transforms/IPO/WholeProgramDevirt.h
+++ b/llvm/include/llvm/Transforms/IPO/WholeProgramDevirt.h
@@ -226,11 +226,15 @@ struct WholeProgramDevirtPass : public PassInfoMixin<WholeProgramDevirtPass> {
ModuleSummaryIndex *ExportSummary;
const ModuleSummaryIndex *ImportSummary;
bool UseCommandLine = false;
+ const bool InLTOMode;
WholeProgramDevirtPass()
- : ExportSummary(nullptr), ImportSummary(nullptr), UseCommandLine(true) {}
+ : ExportSummary(nullptr), ImportSummary(nullptr), UseCommandLine(true),
+ InLTOMode(true) {}
WholeProgramDevirtPass(ModuleSummaryIndex *ExportSummary,
- const ModuleSummaryIndex *ImportSummary)
- : ExportSummary(ExportSummary), ImportSummary(ImportSummary) {
+ const ModuleSummaryIndex *ImportSummary,
+ bool InLTOMode = true)
+ : ExportSummary(ExportSummary), ImportSummary(ImportSummary),
+ InLTOMode(InLTOMode) {
assert(!(ExportSummary && ImportSummary));
}
LLVM_ABI PreservedAnalyses run(Module &M, ModuleAnalysisManager &);
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index a99146d5eaa34..4b10c63fd4e02 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -321,6 +321,7 @@ PipelineTuningOptions::PipelineTuningOptions() {
MergeFunctions = EnableMergeFunctions;
InlinerThreshold = -1;
EagerlyInvalidateAnalyses = EnableEagerlyInvalidateAnalyses;
+ WholeProgramDevirt = false;
}
namespace llvm {
@@ -1629,6 +1630,23 @@ PassBuilder::buildModuleOptimizationPipeline(OptimizationLevel Level,
if (!LTOPreLink)
MPM.addPass(RelLookupTableConverterPass());
+ if (PTO.WholeProgramDevirt && LTOPhase == ThinOrFullLTOPhase::None) {
+ MPM.addPass(WholeProgramDevirtPass(/*ExportSummary*/ nullptr,
+ /*ImportSummary*/ nullptr,
+ /*InLTOMode=*/false));
+ MPM.addPass(LowerTypeTestsPass(nullptr, nullptr,
+ lowertypetests::DropTestKind::Assume));
+ if (EnableModuleInliner) {
+ MPM.addPass(ModuleInlinerPass(getInlineParamsFromOptLevel(Level),
+ UseInlineAdvisor,
+ ThinOrFullLTOPhase::None));
+ } else {
+ MPM.addPass(ModuleInlinerWrapperPass(
+ getInlineParamsFromOptLevel(Level),
+ /* MandatoryFirst */ true,
+ InlineContext{ThinOrFullLTOPhase::None, InlinePass::CGSCCInliner}));
+ }
+ }
return MPM;
}
diff --git a/llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp b/llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
index 30e1dc7167a39..0fe8a22eb5c0f 100644
--- a/llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
+++ b/llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
@@ -24,7 +24,8 @@
// returns 0, or a single vtable's function returns 1, replace each virtual
// call with a comparison of the vptr against that vtable's address.
//
-// This pass is intended to be used during the regular and thin LTO pipelines:
+// This pass is intended to be used during the regular/thinLTO and non-LTO
+// pipelines:
//
// During regular LTO, the pass determines the best optimization for each
// virtual call and applies the resolutions directly to virtual calls that are
@@ -48,6 +49,13 @@
// is supported.
// - Import phase: (same as with hybrid case above).
//
+// In non-LTO mode:
+// - The pass apply speculative devirtualization without requiring any type of
+// visibility.
+// - Skips other features like virtual constant propagation, uniform return
+// value
+// optimization, unique return value optimization, branch funnels to minimize
+// the drawbacks of wrong speculation.
//===----------------------------------------------------------------------===//
#include "llvm/Transforms/IPO/WholeProgramDevirt.h"
@@ -60,7 +68,9 @@
#include "llvm/ADT/Statistic.h"
#include "llvm/Analysis/AssumptionCache.h"
#include "llvm/Analysis/BasicAliasAnalysis.h"
+#include "llvm/Analysis/ModuleSummaryAnalysis.h"
#include "llvm/Analysis/OptimizationRemarkEmitter.h"
+#include "llvm/Analysis/ProfileSummaryInfo.h"
#include "llvm/Analysis/TypeMetadataUtils.h"
#include "llvm/Bitcode/BitcodeReader.h"
#include "llvm/Bitcode/BitcodeWriter.h"
@@ -798,6 +808,21 @@ PreservedAnalyses WholeProgramDevirtPass::run(Module &M,
return PreservedAnalyses::all();
return PreservedAnalyses::none();
}
+ std::optional<ModuleSummaryIndex> Index;
+ // In non-LTO mode there is no hidden LTO visibility to rely on, so force
+ // Fallback (speculative) mode, which remains safe without it.
+ if (!InLTOMode) {
+ DevirtCheckMode = WPDCheckMode::Fallback;
+ // Non-LTO mode has no ExportSummary, so build one from the module.
+ assert(!ExportSummary &&
+ "ExportSummary is expected to be empty in non-LTO mode");
+ if (!ExportSummary) {
+ ProfileSummaryInfo PSI(M);
+ Index.emplace(buildModuleSummaryIndex(M, nullptr, &PSI));
+ ExportSummary = &*Index;
+ }
+ }
+ }
if (!DevirtModule(M, AARGetter, OREGetter, LookupDomTree, ExportSummary,
ImportSummary)
.run())
@@ -1091,10 +1116,12 @@ bool DevirtModule::tryFindVirtualCallTargets(
if (!TM.Bits->GV->isConstant())
return false;
- // We cannot perform whole program devirtualization analysis on a vtable
- // with public LTO visibility.
- if (TM.Bits->GV->getVCallVisibility() ==
- GlobalObject::VCallVisibilityPublic)
+ // Unless speculative devirtualization is enabled, it is not safe to
+ // perform whole program devirtualization analysis on a vtable with
+ // public LTO visibility.
+ if (DevirtCheckMode != WPDCheckMode::Fallback &&
+ TM.Bits->GV->getVCallVisibility() ==
+ GlobalObject::VCallVisibilityPublic)
return false;
Function *Fn = nullptr;
@@ -1112,6 +1139,11 @@ bool DevirtModule::tryFindVirtualCallTargets(
// calls to pure virtuals are UB.
if (Fn->getName() == "__cxa_pure_virtual")
continue;
+ // In most cases, empty functions are overridden by the implementation
+ // in a derived class, so we can skip them.
+ if (DevirtCheckMode == WPDCheckMode::Fallback &&
+ Fn->getReturnType()->isVoidTy() && Fn->getInstructionCount() <= 1)
+ continue;
// We can disregard unreachable functions as possible call targets, as
// unreachable functions shouldn't be called.
@@ -1333,10 +1365,11 @@ bool DevirtModule::trySingleImplDevirt(
if (!IsExported)
return false;
- // If the only implementation has local linkage, we must promote to external
- // to make it visible to thin LTO objects. We can only get here during the
- // ThinLTO export phase.
- if (TheFn->hasLocalLinkage()) {
+ // In non-speculative devirtualization, if the only implementation has
+ // local linkage, we must promote it to external to make it visible to
+ // ThinLTO objects. We can only get here during the ThinLTO export
+ // phase.
+ if (DevirtCheckMode != WPDCheckMode::Fallback && TheFn->hasLocalLinkage()) {
std::string NewName = (TheFn->getName() + ".llvm.merged").str();
// Since we are renaming the function, any comdats with the same name must
@@ -2315,6 +2348,11 @@ bool DevirtModule::run() {
Function *TypeTestFunc =
Intrinsic::getDeclarationIfExists(&M, Intrinsic::type_test);
+ // When applying speculative devirtualization, we can also work on the
+ // public type test intrinsic if no regular type test is present.
+ if (!TypeTestFunc && DevirtCheckMode == WPDCheckMode::Fallback)
+ TypeTestFunc =
+ Intrinsic::getDeclarationIfExists(&M, Intrinsic::public_type_test);
Function *TypeCheckedLoadFunc =
Intrinsic::getDeclarationIfExists(&M, Intrinsic::type_checked_load);
Function *TypeCheckedLoadRelativeFunc = Intrinsic::getDeclarationIfExists(
@@ -2437,12 +2475,18 @@ bool DevirtModule::run() {
.WPDRes[S.first.ByteOffset];
if (tryFindVirtualCallTargets(TargetsForSlot, TypeMemberInfos,
S.first.ByteOffset, ExportSummary)) {
-
- if (!trySingleImplDevirt(ExportSummary, TargetsForSlot, S.second, Res)) {
- DidVirtualConstProp |=
- tryVirtualConstProp(TargetsForSlot, S.second, Res, S.first);
-
- tryICallBranchFunnel(TargetsForSlot, S.second, Res, S.first);
+ // Run single-implementation devirtualization exactly once and reuse
+ // its result below.
+ bool SingleImplDevirt =
+ trySingleImplDevirt(ExportSummary, TargetsForSlot, S.second, Res);
+ // In speculative devirtualization mode, skip virtual constant
+ // propagation and branch funnels to limit the cost of a wrong
+ // speculation.
+ if (!SingleImplDevirt && DevirtCheckMode != WPDCheckMode::Fallback) {
+ DidVirtualConstProp |=
+ tryVirtualConstProp(TargetsForSlot, S.second, Res, S.first);
+
+ tryICallBranchFunnel(TargetsForSlot, S.second, Res, S.first);
}
// Collect functions devirtualized at least for one call site for stats.
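
For readers following the diff, here is a minimal standalone sketch of the guarded call that speculative (Fallback-mode) devirtualization aims for. It is not code from this patch and does not use any LLVM API: the hand-rolled `VTable`, `Object`, `baseGet`, `otherGet`, `callIndirect`, `callSpeculative`, and `demo` names are all hypothetical and exist only for illustration. The idea, assuming the pass guessed `baseGet` as the single likely target, is to compare the loaded function pointer against the guess, call the guess directly on a match, and otherwise fall back to the original indirect call.

```cpp
#include <cassert>

// Hypothetical hand-rolled vtable so the function-pointer check is visible.
struct Object;
using Method = int (*)(const Object *);
struct VTable {
  Method get;
};
struct Object {
  const VTable *vptr;
  int value;
};

// Two possible implementations of the same virtual slot.
int baseGet(const Object *o) { return o->value; }
int otherGet(const Object *o) { return -o->value; }

const VTable BaseVTable{&baseGet};
const VTable OtherVTable{&otherGet};

// The original, non-devirtualized call: always indirect.
int callIndirect(const Object *o) { return o->vptr->get(o); }

// Speculative form: guess that the target is baseGet, but keep the
// indirect call as a fallback so a wrong guess stays correct.
int callSpeculative(const Object *o) {
  Method target = o->vptr->get;
  if (target == &baseGet)
    return baseGet(o); // direct call, a candidate for inlining
  return target(o);    // fallback: unchanged indirect call
}

// Both paths must agree whether the guess was right or wrong.
void demo() {
  Object a{&BaseVTable, 7};
  Object b{&OtherVTable, 7};
  assert(callSpeculative(&a) == callIndirect(&a)); // guess was right
  assert(callSpeculative(&b) == callIndirect(&b)); // guess was wrong
}
```

Because the fallback preserves the original call, a wrong guess costs only the extra comparison. That is presumably why the patch restricts non-LTO mode to this transformation and skips virtual constant propagation and branch funnels, which have no equivalent runtime guard in this sketch.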
Labels
clang:codegen
IR generation bugs: mangling, exceptions, etc.
clang:driver
'clang' and 'clang++' user-facing binaries. Not 'clang-cl'
clang
Clang issues not falling into any other category
llvm:transforms