Release v3.5.0

ScottTodd released this 11 Jun 15:14
· 176 commits to main since this release
v3.5.0
2e8738e

Notable changes

Compiler

  • Added support for AMD Radeon 9060XT and Radeon AI PRO R9700 GPUs #21035.
  • Defined the new gfx950 target within AMDGPU support, including several new MFMA (Matrix Fused Multiply-Add) instructions #20623.
  • Introduced CombineLayoutTransformation to consolidate transpose, reshape, and slice into a single iree_linalg_ext.map_scatter operation #20655.
  • Added support for medium-sized expanded-shape FP8 in the pingpong strategy #20735 and removed dynamic M bounds checks for pingpong strategies in AMDGPU support #20738.
  • Enhanced CombineLayoutTransformation to support folding tensor.pad operations into map_scatter operations #20797.
  • Constrained GPUAllocPrivateMemoryForDPSOps pass to pure tensor semantics, addressing recent implementation issues #20939.
  • Enhanced the SpecializeEncodings pass to handle pad-based encodings, allowing non-serializable encodings to be converted into serializable forms #20845.
  • Prevented hoisting of set_encoding and unset_encoding operations for padding encodings #20733.
  • Updated dispatch creation to use patterns that bubble expand_shape operations up across collapse_shape operations #20648.
  • Optimized the attention operation by removing unit dimensions from the mask operand #20796.
  • Added logic to limit the application of padding encodings during dispatch creation #20732.
  • Added support for the f8E8M0FNU type, ensuring it is valid as a HAL element type #20783. Expanded the range of valid HAL element types to include the scaled MFMA types f8E8M0FNU, f6E3M2FN, and f4E2M1FN.
  • Linalg Extensions Dialect improvements (#20688, #20728, #20747, #20776, #19719, #20827, #20863, #20568, #20916).
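
The CombineLayoutTransformation items above describe folding a chain of transpose, reshape, and slice into a single map_scatter. The following is a conceptual sketch only, in plain Python rather than MLIR, and it is not IREE's implementation. It assumes a scatter that maps each source index to a destination index plus a write mask; the function names (chained_reference, index_map, map_scatter_sketch) are hypothetical and chosen for illustration.

```python
def chained_reference(src, start, stop):
    """Apply the three layout ops one after another, materializing each step."""
    rows, cols = len(src), len(src[0])
    # 1. transpose: (rows x cols) -> (cols x rows)
    transposed = [[src[i][j] for i in range(rows)] for j in range(cols)]
    # 2. reshape: flatten to 1-D in row-major order
    flat = [value for row in transposed for value in row]
    # 3. slice: keep elements in [start, stop)
    return flat[start:stop]


def index_map(i, j, rows, start, stop):
    """Compose all three ops into one source-index -> (dest-index, mask) map."""
    flat = j * rows + i           # position after transpose + flatten
    keep = start <= flat < stop   # elements outside the slice are masked out
    return flat - start, keep


def map_scatter_sketch(src, start, stop):
    """Single pass: scatter each source element directly to its final slot."""
    rows, cols = len(src), len(src[0])
    out = [None] * (stop - start)
    for i in range(rows):
        for j in range(cols):
            dest, keep = index_map(i, j, rows, start, stop)
            if keep:
                out[dest] = src[i][j]
    return out


src = [[1, 2, 3],
       [4, 5, 6]]
# Both paths produce the same result; the second needs no intermediate buffers.
assert map_scatter_sketch(src, 1, 5) == chained_reference(src, 1, 5)
```

The sketch assumes 0 <= start <= stop <= rows * cols. The point of the single index map is that the intermediate transposed and flattened buffers never need to exist; folding a pad into the scatter follows the same idea, with the pad contributing an offset to the destination index.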

Runtime

  • Introduced hal.executable.export condition regions, enhancing dispatch decision-making at each site based on device capabilities and workload parameters #20739.
  • Added user-defined IREE_ALLOCATOR_SYSTEM support #20727, providing the ability for external override of the allocator control function.
  • Extended support for mimalloc v3 as an optional system allocator #20730; setting -DIREE_ALLOCATOR_SYSTEM=mimalloc statically links mimalloc into iree::base.
  • Enabled the import of external streams into HIP for scenarios requiring close integration with external applications #20972.
  • Heterogeneous device support is under development, which will allow compiled programs to allocate buffers across compatible devices and synchronize operations via semaphores. This initial phase will support CPU-only configurations, aiming for seamless integration #20851.
  • The IREE PJRT plugin now supports memory-related APIs and logging control, and has been updated to the latest API version #20911.
  • Implemented an experimental #hal.device.optimal<...> affinity attribute for runtime-resolvable device affinities, initially focused on allocation-related operations #20879.

New Contributors

Full changelog

List of changes

Commit history: v3.4.0...v3.5.0