Add Preshuffled FP8 x INT4 Grouped Gemm Kernel (pytorch#3800)
Summary:
X-link: facebookresearch/FBGEMM#897
Adds an efficient FP8 x INT4 grouped GEMM kernel with preshuffling and scale packing. This implementation uses the "stacked" API: inputs and outputs are single contiguous tensors, and group boundaries are given by an `M_sizes` tensor that holds the number of rows in each group.
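To illustrate the stacked layout only, here is a minimal pure-Python reference sketch. It is not the kernel from this commit: it uses plain floats, with no FP8/INT4 quantization, preshuffling, or scale packing. The function name `stacked_grouped_gemm` and the per-group `(N, K)` weight layout are assumptions made for the example.

```python
def stacked_grouped_gemm(x, weights, m_sizes):
    """Reference semantics for a stacked grouped GEMM (hypothetical helper).

    x:       list of rows, shape (sum(m_sizes), K), all groups stacked together
    weights: one matrix per group, each assumed to have shape (N, K)
    m_sizes: number of rows of x that belong to each group
    Returns a single stacked output of shape (sum(m_sizes), N).
    """
    if sum(m_sizes) != len(x):
        raise ValueError("M_sizes must sum to the number of rows in x")
    if len(m_sizes) != len(weights):
        raise ValueError("need one weight matrix per group")

    out = []
    row = 0
    for m, w in zip(m_sizes, weights):
        # Rows [row, row + m) of x belong to the current group.
        for x_row in x[row:row + m]:
            out.append([sum(a * b for a, b in zip(x_row, w_row)) for w_row in w])
        row += m
    return out


x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 stacked rows, K = 2
weights = [
    [[1.0, 0.0], [0.0, 1.0]],  # group 0: identity
    [[2.0, 0.0], [0.0, 2.0]],  # group 1: scale by 2
]
m_sizes = [1, 2]  # first row goes to group 0, the next two to group 1
y = stacked_grouped_gemm(x, weights, m_sizes)
```

Because the group boundaries come from `M_sizes` alone, a group with zero rows simply contributes nothing to the stacked output.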
Reviewed By: jiawenliu64
Differential Revision: D70870933