reduce overhead for f8f8bf16_rowwise_grouped_dynamic on amd #3742

mxz297 · 2025-02-27T16:16:23Z

Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/823

When the output tensor does not need to be zeroed, the argument setup kernel currently launches many wasted thread blocks, which can cause significant overhead. This change splits the argument setup kernel into two kernels, selected by whether zeroing is needed.

Differential Revision: D70327636
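
To illustrate where the wasted thread blocks come from, here is a minimal host-side sketch of the grid-size choice. It is not the FBGEMM code: `LaunchConfig`, `setup_only_config`, `setup_and_zero_config`, `pick_config`, and the block size of 256 are all hypothetical, and it assumes the zeroing path needs roughly one thread per output element while the setup path needs one thread per group.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical launch description; not an FBGEMM type.
struct LaunchConfig {
  int64_t blocks;
  int threads_per_block;
};

constexpr int kThreadsPerBlock = 256;  // assumed block size

inline int64_t ceil_div(int64_t a, int64_t b) {
  assert(b > 0);
  return (a + b - 1) / b;
}

// Setup-only kernel: one thread per group fills in that group's arguments,
// so the grid only has to cover group_count.
LaunchConfig setup_only_config(int64_t group_count) {
  return {ceil_div(group_count, kThreadsPerBlock), kThreadsPerBlock};
}

// Setup-and-zero kernel: the grid must also cover every output element
// (assumption: one thread zeroes one element).
LaunchConfig setup_and_zero_config(int64_t group_count, int64_t output_elems) {
  const int64_t work = output_elems > group_count ? output_elems : group_count;
  return {ceil_div(work, kThreadsPerBlock), kThreadsPerBlock};
}

// Choose the kernel variant up front instead of always launching the
// larger grid and letting most blocks exit without doing any work.
LaunchConfig pick_config(bool zero_output, int64_t group_count,
                         int64_t output_elems) {
  return zero_output ? setup_and_zero_config(group_count, output_elems)
                     : setup_only_config(group_count);
}
```

Under these assumptions, 16 groups with about a million output elements need a single block when no zeroing is required, compared with 4096 blocks for a grid sized for zeroing.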


facebook-github-bot · 2025-02-27T16:16:36Z

This pull request was exported from Phabricator. Differential Revision: D70327636

netlify · 2025-02-27T16:16:50Z

✅ Deploy Preview for pytorch-fbgemm-docs ready!

Name	Link
🔨 Latest commit	`bb1f280`
🔍 Latest deploy log	https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/67c08fda425c2a00085f5976
😎 Deploy Preview	https://deploy-preview-3742--pytorch-fbgemm-docs.netlify.app

facebook-github-bot · 2025-02-27T19:34:59Z

This pull request has been merged in eee973c.

Summary:
X-link: pytorch#3742
Pull Request resolved: facebookresearch/FBGEMM#823

When the output tensor does not need to be zeroed, the argument setup kernel currently launches many wasted thread blocks, which can cause significant overhead. This change splits the argument setup kernel into two kernels, selected by whether zeroing is needed.

Reviewed By: zjing14, jwfromm
Differential Revision: D70327636
fbshipit-source-id: c68bc094972929ccf9773e31f9b8a362dc5037d3

facebook-github-bot added the cla signed label Feb 27, 2025

facebook-github-bot added the fb-exported label Feb 27, 2025

facebook-github-bot closed this in eee973c Feb 27, 2025

facebook-github-bot added the Merged label Feb 27, 2025

q10 added category:improvement feature:gemm labels Mar 6, 2025
