LLVM and SPIRV-LLVM-Translator pulldown (WW25 2025) #19031

Draft
wants to merge 1,646 commits into
base: sycl

iclsrc
Contributor

@iclsrc iclsrc commented Jun 17, 2025

asb and others added 30 commits June 2, 2025 22:24
…+addi(w) pairs when possible (#141663)

The logic in RISCVMatInt would previously produce lui+addiw on RV64
whenever a 32-bit integer must be materialised and the Hi20 and Lo12
parts are non-zero. However, sometimes addi can be used equivalently
(whenever the sign extension behaviour of addiw would be a no-op). This
patch moves to using addiw only when necessary. Although there is
absolutely no advantage in terms of compressibility or performance, this
has the following advantages:
* It's more consistent with logic used elsewhere in the backend. For
instance, RISCVOptWInstrs will try to convert addiw to addi on the basis
it reduces test diffs vs RV32.
* This matches the lowering GCC does in its codegen path. Unlike LLVM,
GCC seems to have different expansion logic for the assembler vs
codegen. For codegen it will use lui+addi if possible, but expanding
`li` in the assembler always produces lui+addiw, as LLVM did prior
to this commit. As someone who has been looking at a lot of gcc vs clang
diffs lately, reducing unnecessary divergence is of at least some value.
* As the diff for fold-mem-offset.ll shows, we can fold memory offsets
in more cases when addi is used. Memory offset folding could be taught
to recognise when the addiw could be replaced with an addi, but that
seems unnecessary when we can simply change the logic in RISCVMatInt.
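
The condition "whenever the sign extension behaviour of addiw would be a no-op" can be checked numerically. Below is a minimal standalone sketch of that check, not the RISCVMatInt code itself: `LuiAddSplit`, `splitSigned32`, and `signExtend32` are hypothetical names introduced for illustration, and the sketch assumes the usual two's-complement narrowing conversions.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend the low 32 bits of a 64-bit value, as RV64 *W instructions do.
int64_t signExtend32(uint64_t value) {
  return static_cast<int64_t>(
      static_cast<int32_t>(static_cast<uint32_t>(value)));
}

// Hypothetical result type for this sketch (not an LLVM type).
struct LuiAddSplit {
  uint32_t hi20;   // immediate for lui
  int32_t lo12;    // signed 12-bit immediate for addi/addiw
  bool needsAddiw; // true if plain addi would yield the wrong 64-bit value
};

LuiAddSplit splitSigned32(int32_t value) {
  const uint32_t bits = static_cast<uint32_t>(value);
  // Low 12 bits, sign-extended into the range [-2048, 2047].
  const int32_t lo12 = static_cast<int32_t>((bits & 0xFFFu) ^ 0x800u) - 0x800;
  // Round so that (hi20 << 12) + lo12 reproduces the low 32 bits.
  const uint32_t hi20 = (bits + 0x800u) >> 12;

  // On RV64, lui writes the sign-extended 32-bit value hi20 << 12.
  const int64_t luiResult = signExtend32(static_cast<uint64_t>(hi20) << 12);
  const int64_t addiResult = luiResult + lo12; // full 64-bit add
  const int64_t addiwResult = signExtend32(static_cast<uint64_t>(addiResult));

  // lui+addiw always reproduces the requested 32-bit constant.
  assert(addiwResult == value);
  return {hi20, lo12, addiResult != addiwResult};
}
```

In this sketch, `splitSigned32(0x12345678)` reports that addi is sufficient, while `splitSigned32(0x7FFFFFFF)` still needs addiw: there lui produces -2^31, and adding -1 leaves the 32-bit signed range unless the result is truncated and sign-extended again.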

As pointed out by @topperc during review, making this change without
modifying RISCVOptWInstrs risks introducing some cases where we fail to
remove a sext.w that we removed before. I've incorporated a patch based
on a suggestion from Craig that avoids it, and also adds appropriate
RISCVOptWInstrs test cases.

The initial patch description noted that the main motivation was to
avoid unnecessary differences both for RV32/RV64 and when comparing GCC,
but noted that very occasionally we see a benefit from memory offset
folding kicking in when it didn't before. However, the dynamic
instruction count difference for SPEC benchmarks targeting rva22u64
shows that we actually get a meaningful ~4.3% reduction in dynamic
icount for 519.lbm_r. Looking at the data more closely, the codegen
difference is in `LBM_performStreamCollideTRT`, which as a function
accounts for ~98% of dynamically executed instructions, and the codegen
diffs appear to be a knock-on effect of the address merging reducing
register pressure right from function entry (for instance, we get a big
reduction in dynamically executed loads in that function).

Below is the icount data (rva22u64 -O3, no LTO):
```
Benchmark                Baseline            This PR   Diff (%)
============================================================
500.perlbench_r         174116601991    174115795810     -0.00%
502.gcc_r               218903280858    218903215788     -0.00%
505.mcf_r               131208029185    131207692803     -0.00%
508.namd_r              217497594322    217497594297     -0.00%
510.parest_r            289314486153    289313577652     -0.00%
511.povray_r             30640531048     30640765701      0.00%
519.lbm_r                95897914862     91712688050     -4.36%
520.omnetpp_r           134641549722    134867015683      0.17%
523.xalancbmk_r         281462762992    281432092673     -0.01%
525.x264_r              379776121941    379535558210     -0.06%
526.blender_r           659736022025    659738387343      0.00%
531.deepsjeng_r         349122867552    349122867481     -0.00%
538.imagick_r           238558760552    238558753269     -0.00%
541.leela_r             406578560612    406385135260     -0.05%
544.nab_r               400997131674    400996765827     -0.00%
557.xz_r                130079522194    129945515709     -0.10%

```

The instcounting setup I use doesn't have good support for drilling down
into functions from outside the linked executable (e.g. libc). The
difference in omnetpp all seems to come from there, and does not reflect
any degradation in codegen quality.

I can confirm that with the current version of the PR there is no change
in the number of static sext.w across all the SPEC 2017 benchmarks
(rva22u64 -O3).

Co-authored-by: Craig Topper <[email protected]>
…d instructions fixed (#136868)" (#142382)

This reverts commit 475531b, but it does not revert the changes from
commits 36850a0 and 5dc3cd0.
- Allow running when no workspace folder is present, and do not override
the `cwd` set in the launch configuration
- Support getting configuration from workspace file

Fixes #142469
See llvm/llvm-project#139128
If multiple entries match the source, then the latest entry takes
precedence.
See llvm/llvm-project#139128
If multiple entries match the source, then the latest entry takes
precedence.
Implemented wmempcpy and added tests
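
For context, `wmempcpy` is a GNU extension that behaves like `wmemcpy` but returns a pointer just past the last wide character written, which makes chained appends convenient. The following is a minimal sketch of those semantics, not the LLVM libc implementation; `sketch_wmempcpy` is a hypothetical name, and the null-pointer assertion is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

// Hypothetical stand-in for illustration; not the LLVM libc source.
wchar_t *sketch_wmempcpy(wchar_t *dest, const wchar_t *src,
                         std::size_t count) {
  // Assumption of this sketch: pointers must be valid when copying anything.
  assert(count == 0 || (dest != nullptr && src != nullptr));
  // Copy `count` wide characters; the regions must not overlap.
  if (count != 0)
    std::wmemcpy(dest, src, count);
  // Unlike wmemcpy, return the position just past the copied characters.
  return dest + count;
}
```

Because the return value points at the next free slot, two calls can be chained to concatenate buffers without recomputing the offset.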
This PR removes the necessity to know the dimension of the embeddings while invoking `Embedder::Create`. Having the `Dimension` parameter introduces complexities in downstream consumers. 

(Tracking issue - #141817)
This fixes a bug where parsing PDBs that use register enums would
assert.

The main problem is that printing the CodeView register enums is
handled here:

https://github.com/nikitalita/llvm-project/blob/e4888a92402f53000a3a5e79d3792c034fc2f343/llvm/lib/ObjectYAML/CodeViewYAMLSymbols.cpp#L152

That code requires a COFF::header in the IO context for the machine
type, which we didn't have when dumping a PDB or parsing a YAML file.
So, we make a fake one with the machine type.
Implemented wcscmp and added tests
fixes #124347 
Implemented wcsrchr and added tests
Implemented wcsncat and tests.

---------

Co-authored-by: Sriya Pratipati <[email protected]>
Added CRASH_ON_NULLPTR macro to wcspbrk function and related test
CFIPrograms' most common uses are within debug frames, but it is not
their only use. For example, some assembly writers encode them by hand
into .cfi_escape directives. This PR extracts code for them into its own
files, setting them up to be evaluated from outside debug frames
themselves.

One in a series of NFC DebugInfo/DWARF refactoring changes to layer it
more cleanly, so that binary CFI parsing can be used from low-level
code (such as byte strings created via .cfi_escape) without circular
dependencies. The final goal is to make a more limited dwarf library
usable from lower-level code.
## Purpose

This patch is one in a series of code-mods that annotate LLVM’s public
interface for export. This patch annotates the `llvm/IR`,
`llvm/IRPrinter`, and `llvm/IRReader` libraries. These annotations
currently have no meaningful impact on the LLVM build; however, they are
a prerequisite to support an LLVM Windows DLL (shared library) build.

## Background

This effort is tracked in #109483. Additional context is provided in
[this
discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307),
and documentation for `LLVM_ABI` and related annotations is found in the
LLVM repo
[here](https://github.com/llvm/llvm-project/blob/main/llvm/docs/InterfaceExportAnnotations.rst).

The bulk of these changes were generated automatically using the
[Interface Definition Scanner (IDS)](https://github.com/compnerd/ids)
tool, followed by formatting with `git clang-format`.

The following manual adjustments were also applied after running IDS on
Linux:
- Add `#include "llvm/Support/Compiler.h"` to files where it was not
auto-added by IDS due to no pre-existing block of include statements.
- Add `LLVM_ABI_FRIEND` to friend member functions declared with
`LLVM_ABI`
- Add `LLVM_TEMPLATE_ABI` and `LLVM_EXPORT_TEMPLATE` to exported
instantiated templates
- Add `LLVM_ABI` to a subset of private class methods and fields that
require export
- Add `LLVM_ABI` to a small number of symbols that require export but
are not declared in headers
- Reorder `LLVM_ABI` with `[[deprecated]]` and `[[nodiscard]]`
attributes.
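
To illustrate what such export annotations typically look like, here is a minimal sketch using a hypothetical `DEMO_ABI` macro. It is not LLVM's definition of `LLVM_ABI` (that lives in `llvm/Support/Compiler.h`); `DEMO_BUILDING_LIBRARY`, `demoAnswer`, and `Widget` are likewise made up for this example, which assumes a C++17 compiler for `[[nodiscard]]`.

```cpp
// Normally defined by the build system while compiling the library itself;
// defined here only so the sketch is self-contained.
#define DEMO_BUILDING_LIBRARY 1

// Hypothetical export macro, assumed for illustration only.
#if defined(_WIN32)
#  if defined(DEMO_BUILDING_LIBRARY)
#    define DEMO_ABI __declspec(dllexport) // exporting from the DLL
#  else
#    define DEMO_ABI __declspec(dllimport) // importing into a client
#  endif
#elif defined(__GNUC__)
#  define DEMO_ABI __attribute__((visibility("default")))
#else
#  define DEMO_ABI
#endif

// Standard attributes come first, followed by the export annotation.
[[nodiscard]] DEMO_ABI int demoAnswer();

// Annotating a class exports its member functions.
class DEMO_ABI Widget {
public:
  int id() const;

private:
  int Id = 7;
};

int demoAnswer() { return 42; }
int Widget::id() const { return Id; }
```

On non-Windows platforms the annotation only matters when the library is built with hidden default visibility, which is consistent with the description's note that the annotations currently have no meaningful impact on the build.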

## Validation

Local builds and tests to validate cross-platform compatibility. This
included llvm, clang, and lldb on the following configurations:

- Windows with MSVC
- Windows with Clang
- Linux with GCC
- Linux with Clang
- Darwin with Clang
Implemented wcsstr and tests.
fixes #124348

---------

Co-authored-by: Sriya Pratipati <[email protected]>
Implemented wcsncmp and added tests
The yaml ended up with a typo, possibly due to merge issues. This patch
fixes it.
This change upstreams the code to support emitting inline C++ function
definitions, including the ASTConsumer handler for inline definitions
and the code to load the 'this' pointer.

This necessitates introducing the Itanium CXXABI class. No other CXXABI
subclasses are supported at this time. The Itanium CXXABI is used for
AppleARM64, which will require its own handler for a few special cases
(such as array cookies) later.
The subject test sporadically fails on the AIX builder:
https://lab.llvm.org/buildbot/#/builders/64/builds/3921/steps/6/logs/FAIL__lit___timeout-hang_py

This appears to be an environment issue potentially connected to high
load because the problem is not observed on other AIX machines.

This patch separates the "hard" timeout value from the value used to
signal a hang. This allows for a more generous "hard" timeout value,
which allows observation of cases that take longer to finish despite not
hanging.
Most bits of this are straightforward: when we emit SVE instructions in
the prologue/epilogue, emit corresponding opcodes.

The unfortunately nasty bit is the handling of the frame pointer in
functions that use the SVE calling convention. If we have SVE callee
saves, and need to restore the stack pointer from the frame pointer,
it's impossible to encode callee saves that happen after the frame
pointer. So this patch rearranges the stack to put SVE callee saves
first. This isn't really that complicated on its own, but it leads to a
lot of tricky conditionals (see FPAfterSVECalleeSaves).
…1916)

Currently, libc++'s `<tuple>` is using the deprecated
`__reference_binds_to_temporary` intrinsic. This PR starts to use
`__reference_constructs_from_temporary` if possible.

It seems that `__reference_constructs_from_temporary` should be used via
an internal type trait provided in
`<__type_traits/reference_constructs_from_temporary.h>`. But given that
the old intrinsic was used directly, this PR doesn't switch to the
current convention yet.

P2255R2 is related, although the paper indicated that constructors of
`tuple` should be deleted in such a case.

---------

Co-authored-by: Nikolas Klauser <[email protected]>
At the time `__reference_constructs_from_temporary` got implemented,
`__reference_binds_to_temporary` was mentioned as deprecated in
`LanguageExtensions.rst`, but no deprecation warning was emitted. This
PR adds the previously missing warning.
iclsrc and others added 7 commits June 4, 2025 14:37
Check if an array type exists in TypeMap before inserting it to prevent
duplications.

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@2a5a4e71d79d074
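
The check-before-insert pattern described in this commit can be sketched as follows. This is an illustration under assumed names, not the translator's code: `TypeTable`, `ArrayKey`, and `getOrCreateArrayType` are hypothetical, and array types are modelled simply as an element type id plus a length.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical model: an array type is (element type id, element count).
using ArrayKey = std::pair<uint32_t, uint64_t>;

class TypeTable {
public:
  // Returns the id of the array type, emitting a new declaration only if
  // the type is not already recorded in the map.
  uint32_t getOrCreateArrayType(uint32_t elementTypeId, uint64_t length) {
    const ArrayKey key{elementTypeId, length};
    const auto found = typeMap.find(key);
    if (found != typeMap.end())
      return found->second; // already known: reuse, do not emit again

    const uint32_t id = static_cast<uint32_t>(declarations.size()) + 1;
    declarations.push_back(key); // emit exactly one declaration per type
    typeMap.emplace(key, id);
    return id;
  }

  std::size_t declarationCount() const { return declarations.size(); }

private:
  std::map<ArrayKey, uint32_t> typeMap;
  std::vector<ArrayKey> declarations;
};
```

Requesting the same array type twice returns the same id and leaves a single declaration in the output list.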
The SPIR-V reader would crash on the added test during reverse
translation. Ensure any cooperative matrix type conversion is handled
the same way as a cooperative matrix type conversion with an
FPRoundingMode or SaturatedConversion decoration.

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@32738f937103712
… unsigned (#3188)

Per SPIR-V OpenCL.ExtendedInstructionSet spec, the num_elements arg must
be size_t. So this PR changes its type to unsigned in SPV-IR.
Adding const to pointer arg aligns with OpenCL spec and allows const
pointer input.

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@96f5ade29c9088e
@iclsrc iclsrc added the disable-lint Skip linter check step and proceed with build jobs label Jun 17, 2025
@jsji jsji force-pushed the llvmspirv_pulldown branch from 894d1b7 to 4316364 Compare June 17, 2025 20:46
@jsji jsji force-pushed the llvmspirv_pulldown branch from 4316364 to 539d96b Compare June 17, 2025 22:29
mdtoguchi and others added 3 commits June 17, 2025 20:00
The optimization level setting passed to the `-cc1` call now appears
earlier in the command line. Adjust the tests to match.
__builtin_spirv_generic_cast_to_ptr_explicit was added upstream, but
because it has limited SYCL support, the tests can't run properly.