Makes use_fast_accum configurable. #3829
Conversation
This pull request was exported from Phabricator. Differential Revision: D71290596
Summary: Pull Request resolved: pytorch#3829. X-link: facebookresearch/FBGEMM#913

[Public to OSS] Thanks htyu for pointing out the issue. Looking forward to warp specialization support on Nvidia!

- Exposes fast accumulation as a configurable option.
- Does not enable it by default, so there is no change in default behavior.
- No additional tuning was done for `use_fast_accum=True`.

With the HIP backend, the semantics of `c += tl.dot(a, b)` and `c = tl.dot(a, b, c)` appear to be the same.

Reviewed By: htyu

Differential Revision: D71290596
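For reference, below is a minimal sketch (not taken from the FBGEMM sources) of how such a flag is typically plumbed into a Triton GEMM inner loop as a `tl.constexpr`. The kernel name, block-size parameters, and the omission of edge masking are simplifying assumptions. With `USE_FAST_ACCUM=False` the loop accumulates with a separate add, matching today's default behavior; with `USE_FAST_ACCUM=True` the running accumulator is passed into `tl.dot` directly.

```python
# Hypothetical sketch of a fast-accumulation switch in a Triton GEMM kernel.
# Assumes M, N, K are multiples of the block sizes and C is fp32; masking and
# autotuning are omitted for brevity.
import triton
import triton.language as tl


@triton.jit
def _gemm_kernel(
    a_ptr, b_ptr, c_ptr,
    M, N, K,
    stride_am, stride_ak,
    stride_bk, stride_bn,
    stride_cm, stride_cn,
    BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr, BLOCK_K: tl.constexpr,
    USE_FAST_ACCUM: tl.constexpr,
):
    pid_m = tl.program_id(0)
    pid_n = tl.program_id(1)
    offs_m = pid_m * BLOCK_M + tl.arange(0, BLOCK_M)
    offs_n = pid_n * BLOCK_N + tl.arange(0, BLOCK_N)
    offs_k = tl.arange(0, BLOCK_K)

    a_ptrs = a_ptr + offs_m[:, None] * stride_am + offs_k[None, :] * stride_ak
    b_ptrs = b_ptr + offs_k[:, None] * stride_bk + offs_n[None, :] * stride_bn

    acc = tl.zeros((BLOCK_M, BLOCK_N), dtype=tl.float32)
    for _ in range(0, tl.cdiv(K, BLOCK_K)):
        a = tl.load(a_ptrs)
        b = tl.load(b_ptrs)
        if USE_FAST_ACCUM:
            # Feed the running accumulator into the dot itself.
            acc = tl.dot(a, b, acc)
        else:
            # Default: separate fp32 accumulation outside the dot.
            acc += tl.dot(a, b)
        a_ptrs += BLOCK_K * stride_ak
        b_ptrs += BLOCK_K * stride_bk

    c_ptrs = c_ptr + offs_m[:, None] * stride_cm + offs_n[None, :] * stride_cn
    tl.store(c_ptrs, acc)
```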
This pull request has been merged in 067b63c.