Summary:
Pull Request resolved: pytorch#3829
X-link: https://github.com/facebookresearch/FBGEMM/pull/913
[Public to OSS]
Thanks to htyu for pointing out the issue. Looking forward to warp specialization support on Nvidia!
- Exposes fast accumulation as a configurable option.
- Does not enable it by default, so there is no change in default behavior.
- No additional tuning is done for `use_fast_accum=True`.
With the HIP backend, the semantics of `c += tl.dot(a, b)` and `c = tl.dot(a, b, c)` appear to be the same (see the sketch below).
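
For illustration, here is a minimal Triton matmul sketch, hypothetical and not the FBGEMM kernel itself, showing how a `USE_FAST_ACCUM` constexpr flag could switch between the two accumulation forms; all names and parameters here are assumptions for the example:

```python
import triton
import triton.language as tl

@triton.jit
def matmul_kernel(a_ptr, b_ptr, c_ptr, M, N, K,
                  stride_am, stride_ak, stride_bk, stride_bn,
                  stride_cm, stride_cn,
                  BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr,
                  BLOCK_K: tl.constexpr, USE_FAST_ACCUM: tl.constexpr):
    # Sketch only: assumes M, N, K are multiples of the block sizes,
    # so loads/stores are unmasked.
    pid_m = tl.program_id(0)
    pid_n = tl.program_id(1)
    offs_m = pid_m * BLOCK_M + tl.arange(0, BLOCK_M)
    offs_n = pid_n * BLOCK_N + tl.arange(0, BLOCK_N)
    offs_k = tl.arange(0, BLOCK_K)
    a_ptrs = a_ptr + offs_m[:, None] * stride_am + offs_k[None, :] * stride_ak
    b_ptrs = b_ptr + offs_k[:, None] * stride_bk + offs_n[None, :] * stride_bn
    acc = tl.zeros((BLOCK_M, BLOCK_N), dtype=tl.float32)
    for _ in range(0, tl.cdiv(K, BLOCK_K)):
        a = tl.load(a_ptrs)
        b = tl.load(b_ptrs)
        if USE_FAST_ACCUM:
            # Fast accumulation: feed the accumulator into tl.dot directly.
            acc = tl.dot(a, b, acc)
        else:
            # Default path: separate dot and add; behavior is unchanged.
            acc += tl.dot(a, b)
        a_ptrs += BLOCK_K * stride_ak
        b_ptrs += BLOCK_K * stride_bk
    c_ptrs = c_ptr + offs_m[:, None] * stride_cm + offs_n[None, :] * stride_cn
    tl.store(c_ptrs, acc)
```

Since the branch is on a `tl.constexpr`, Triton specializes the kernel at compile time, so the flag costs nothing at runtime; on backends where the two forms lower identically (as observed for HIP above), both branches should produce the same code.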
Reviewed By: htyu
Differential Revision: D71290596
fbshipit-source-id: 8e2a20899f301f861d8d72f6290e573e23288e63