v0.2.2.post1
What's Changed
- bump version to v0.2.2 by @yzh119 in #891
- perf: fix the performance of the second stage of split-k by @yzh119 in #894
- fix: pin_memory uses CPU as the default device by @KnowingNothing in #895
- perf: tweak register amount for producer/consumer in MLA template by @yzh119 in #896
- perf: fix MLA split-k performance bug by @yzh119 in #898
- perf: use f16 as split-k partial output data type by @yzh119 in #900
- perf: tweak the pipeline design of mla kernel by @yzh119 in #901
Full Changelog: v0.2.2...v0.2.2.post1