
Improve bounds_check_indices for VBE #3386


Closed
wants to merge 1 commit into from

Conversation

sryap
Contributor

@sryap sryap commented Nov 18, 2024

Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/475

Instead of over-launching thread blocks, use `b_t_map` to launch only
the necessary thread blocks, which increases occupancy in the VBE case.

Note that `b_t_map` is required for the TBE lookup in the VBE case. It
is generated during the TBE forward pass. In this diff, we call
`generate_vbe_metadata` twice (once before the bounds check and once
before the forward lookup). These two calls can be fused into one; we
will clean this up in subsequent diffs.

Differential Revision: D65735342
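
To make the launch-size argument concrete, here is a minimal Python sketch, not the FBGEMM kernel code. It assumes each table `t` has its own batch size (the VBE case) and models `b_t_map` as a plain list of `(b, t)` pairs; the function names and the list-of-tuples layout are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def build_b_t_map(batch_sizes: List[int]) -> List[Tuple[int, int]]:
    """Hypothetical stand-in for b_t_map: one (b, t) entry per real sample.

    batch_sizes[t] is the batch size of table t (it may differ per table).
    """
    return [(b, t) for t, size in enumerate(batch_sizes) for b in range(size)]


def overlaunched_items(batch_sizes: List[int]) -> int:
    """Work items when every table is padded to the largest batch size."""
    return len(batch_sizes) * max(batch_sizes, default=0)


def exact_items(batch_sizes: List[int]) -> int:
    """Work items when only the real (b, t) pairs are launched."""
    return sum(batch_sizes)


# Illustrative batch sizes for three tables (made-up numbers).
batch_sizes = [4, 1, 2]
b_t_map = build_b_t_map(batch_sizes)

# Each launched work item looks up its own (b, t) instead of deriving it
# from a padded T x max_B grid, so no work item is launched only to exit early.
for item_id, (b, t) in enumerate(b_t_map):
    pass  # a bounds check for sample b of table t would go here
```

With these made-up sizes, the padded grid covers 3 x 4 = 12 slots while only 7 correspond to real samples; the mapping lets the launch cover exactly those 7.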


netlify bot commented Nov 18, 2024

Deploy Preview for pytorch-fbgemm-docs ready!

Name Link
🔨 Latest commit 3f14a4c
🔍 Latest deploy log https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/673add34ce011c000819a299
😎 Deploy Preview https://deploy-preview-3386--pytorch-fbgemm-docs.netlify.app

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D65735342

sryap added a commit to sryap/FBGEMM that referenced this pull request Nov 18, 2024
sryap added a commit to sryap/FBGEMM that referenced this pull request Nov 18, 2024
@facebook-github-bot
Contributor

This pull request has been merged in dff9de7.

q10 pushed a commit to q10/FBGEMM that referenced this pull request Apr 10, 2025
Summary:
X-link: pytorch#3386

Pull Request resolved: facebookresearch/FBGEMM#475

Reviewed By: Fiery

Differential Revision: D65735342

fbshipit-source-id: 728dcd32e425c4b4a00ca4796aa324811c8812c3