Makes use_fast_accum configurable. #3829

Closed
wants to merge 1 commit

Conversation

levendlee
Member

Summary:
[Public to OSS]

Thanks htyu for pointing out the issue. Looking forward to warp specialization support on Nvidia!

  • Exposes fast accumulation as a configurable option.
  • Does not enable it by default; no change in default behavior.
  • No additional tuning for `use_fast_accum=True`.

Differential Revision: D71290596
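
As an illustration only (not FBGEMM's actual kernel or API), the sketch below shows one way a `use_fast_accum` flag can be threaded from a Python launcher into a Triton GEMM kernel as a compile-time `tl.constexpr`, defaulting to `False` so existing behavior is unchanged. All names (`_matmul_kernel`, `matmul`) and the block sizes are hypothetical.

```python
import torch
import triton
import triton.language as tl


@triton.jit
def _matmul_kernel(
    a_ptr, b_ptr, c_ptr,
    M, N, K,
    stride_am, stride_ak,
    stride_bk, stride_bn,
    stride_cm, stride_cn,
    BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr, BLOCK_K: tl.constexpr,
    USE_FAST_ACCUM: tl.constexpr,
):
    # Simplified sketch: assumes M, N, K are multiples of the block sizes (no masking).
    pid_m = tl.program_id(0)
    pid_n = tl.program_id(1)
    offs_m = pid_m * BLOCK_M + tl.arange(0, BLOCK_M)
    offs_n = pid_n * BLOCK_N + tl.arange(0, BLOCK_N)
    offs_k = tl.arange(0, BLOCK_K)
    a_ptrs = a_ptr + offs_m[:, None] * stride_am + offs_k[None, :] * stride_ak
    b_ptrs = b_ptr + offs_k[:, None] * stride_bk + offs_n[None, :] * stride_bn
    acc = tl.zeros((BLOCK_M, BLOCK_N), dtype=tl.float32)
    for _ in range(0, K, BLOCK_K):
        a = tl.load(a_ptrs)
        b = tl.load(b_ptrs)
        if USE_FAST_ACCUM:
            # Fast accumulation: feed the accumulator into the dot itself.
            acc = tl.dot(a, b, acc)
        else:
            # Default path: accumulate outside the dot in fp32.
            acc += tl.dot(a, b)
        a_ptrs += BLOCK_K * stride_ak
        b_ptrs += BLOCK_K * stride_bk
    c_ptrs = c_ptr + offs_m[:, None] * stride_cm + offs_n[None, :] * stride_cn
    tl.store(c_ptrs, acc)


def matmul(a: torch.Tensor, b: torch.Tensor, use_fast_accum: bool = False) -> torch.Tensor:
    # `use_fast_accum` is off by default, so callers see no change in behavior.
    M, K = a.shape
    _, N = b.shape
    c = torch.empty((M, N), device=a.device, dtype=torch.float32)
    grid = (triton.cdiv(M, 64), triton.cdiv(N, 64))
    _matmul_kernel[grid](
        a, b, c, M, N, K,
        a.stride(0), a.stride(1),
        b.stride(0), b.stride(1),
        c.stride(0), c.stride(1),
        BLOCK_M=64, BLOCK_N=64, BLOCK_K=32,
        USE_FAST_ACCUM=use_fast_accum,
    )
    return c
```

Because the flag is a `tl.constexpr`, Triton specializes the kernel per value, so the branch costs nothing at run time.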

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D71290596

netlify bot commented Mar 17, 2025

Deploy Preview for pytorch-fbgemm-docs ready!

| Name | Link |
|------|------|
| 🔨 Latest commit | c848446 |
| 🔍 Latest deploy log | https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/67d854628c9e5b0008bf35be |
| 😎 Deploy Preview | https://deploy-preview-3829--pytorch-fbgemm-docs.netlify.app |

levendlee added a commit to levendlee/FBGEMM that referenced this pull request Mar 17, 2025
Summary:

X-link: facebookresearch/FBGEMM#913

[Public to OSS]

Thanks htyu for pointing out the issue. Looking forward to warp specialization support on Nvidia!

- Exposes fast accumulation as a configurable option.
- Does not enable it by default; no change in default behavior.
- No additional tuning for `use_fast_accum=True`.

With the HIP backend, the semantics of `c += tl.dot(a, b)` and `c = tl.dot(a, b, c)` appear to be the same.

Differential Revision: D71290596
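
To make the HIP note above concrete, here is a quick, hypothetical host-side check that reuses the illustrative `matmul` launcher from the earlier sketch (not an FBGEMM entry point) and compares the two accumulation paths. On ROCm the results are expected to match; on NVIDIA hardware, particularly with FP8 inputs, the fast-accumulation path may trade some precision for speed.

```python
import torch

# Hypothetical check built on the `matmul` sketch above.
torch.manual_seed(0)
a = torch.randn(256, 512, device="cuda", dtype=torch.float16)
b = torch.randn(512, 256, device="cuda", dtype=torch.float16)

c_default = matmul(a, b, use_fast_accum=False)  # acc += tl.dot(a, b)
c_fast = matmul(a, b, use_fast_accum=True)      # acc = tl.dot(a, b, acc)

# On HIP the two variants are reported to produce identical results; on other
# backends a small numerical difference would not be surprising.
print("max abs diff:", (c_default - c_fast).abs().max().item())
```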
@levendlee levendlee force-pushed the export-D71290596 branch 2 times, most recently from dc8074c to 58ba3e8 on March 17, 2025 16:43

@facebook-github-bot
Contributor

This pull request has been merged in 067b63c.

liligwu pushed a commit to ROCm/FBGEMM that referenced this pull request Mar 19, 2025
Summary:
Pull Request resolved: pytorch#3829

X-link: https://github.com/facebookresearch/FBGEMM/pull/913

[Public to OSS]

Thanks htyu for pointing out the issue. Looking forward to warp specialization support on Nvidia!

- Exposes fast accumulation as a configurable option.
- Does not enable it by default; no change in default behavior.
- No additional tuning for `use_fast_accum=True`.

With the HIP backend, the semantics of `c += tl.dot(a, b)` and `c = tl.dot(a, b, c)` appear to be the same.

Reviewed By: htyu

Differential Revision: D71290596

fbshipit-source-id: 8e2a20899f301f861d8d72f6290e573e23288e63
q10 pushed a commit to q10/FBGEMM that referenced this pull request Apr 10, 2025
Summary:
X-link: pytorch#3829

Pull Request resolved: facebookresearch/FBGEMM#913

[Public to OSS]

Thanks htyu for pointing out the issue. Looking forward to warp specialization support on Nvidia!

- Exposes fast accumulation as a configurable option.
- Does not enable it by default; no change in default behavior.
- No additional tuning for `use_fast_accum=True`.

With the HIP backend, the semantics of `c += tl.dot(a, b)` and `c = tl.dot(a, b, c)` appear to be the same.

Reviewed By: htyu

Differential Revision: D71290596

fbshipit-source-id: 8e2a20899f301f861d8d72f6290e573e23288e63