Skip to content

simveit/effective_transpose

Repository files navigation

Effective matrix transpose

Improve matrix transpose step by step.

Please see my blogpost for a detailed explanation.

Performance Comparison

Kernel Bandwidth (GB/s) % of Max Bandwidth Implementation
transpose_naive 875.46 26.5291% Custom
transpose_swizzle 1251.76 37.9323% Custom
transpose_swizzle_batched 2771.35 83.9804% Custom

About

Effective transpose on Hopper GPU

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published