Reduce registers in bounds_check_indices #3298

Closed
wants to merge 1 commit into from

sryap
Contributor

@sryap sryap commented Oct 31, 2024

Summary:
Reduce the number of registers per thread in bounds_check_indices to
increase occupancy by:

  • Passing bounds_check_mode as a kernel template arg
  • Moving printf for indices checking outside of the for-loop for
    BoundsCheckMode::WARNING
  • Moving the last offset check outside of the for-loop
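The three changes listed above can be sketched on the host side in plain C++. This is not the FBGEMM kernel: the function name `bounds_check_sketch`, the enum values, the flat `indices`/`offsets` layout, and the "reset to 0" repair policy are all assumptions made for illustration. The sketch only shows the shape of the refactor: the mode becomes a template argument, so each instantiation contains only the branches it needs, and the `printf` and the last-offset check run once after the loop instead of inside it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical stand-in for the real enum; the values are assumptions.
enum class BoundsCheckMode { FATAL, WARNING, IGNORE };

// Host-side sketch. Assumes offsets is non-empty, non-negative and
// non-decreasing. Returns the number of out-of-range indices found.
template <BoundsCheckMode Mode>
int64_t bounds_check_sketch(std::vector<int64_t>& indices,
                            std::vector<int64_t>& offsets,
                            int64_t num_rows) {
  assert(!offsets.empty());
  const int64_t num_indices = static_cast<int64_t>(indices.size());
  const int64_t num_bags = static_cast<int64_t>(offsets.size()) - 1;

  int64_t num_invalid = 0;
  int64_t first_bad_pos = -1;

  for (int64_t b = 0; b < num_bags; ++b) {
    // Clamp the bag end so a bad offset cannot cause an out-of-bounds read.
    const int64_t end = std::min(offsets[b + 1], num_indices);
    for (int64_t i = offsets[b]; i < end; ++i) {
      const int64_t idx = indices[i];
      if (idx < 0 || idx >= num_rows) {
        if (first_bad_pos < 0) first_bad_pos = i;
        ++num_invalid;
        // Mode is a compile-time constant, so this branch folds away in
        // the FATAL instantiation. No printf inside the loop.
        if (Mode != BoundsCheckMode::FATAL) {
          indices[i] = 0;  // assumed repair policy for WARNING/IGNORE
        }
      }
    }
  }

  // Report once, after the loop, and only in the WARNING instantiation.
  if (Mode == BoundsCheckMode::WARNING && num_invalid > 0) {
    std::printf("bounds check: %lld invalid indices, first at position %lld\n",
                static_cast<long long>(num_invalid),
                static_cast<long long>(first_bad_pos));
  }

  // The last-offset check also runs once, outside the loop.
  if (offsets[num_bags] != num_indices) {
    if (Mode == BoundsCheckMode::WARNING) {
      std::printf("bounds check: last offset %lld != number of indices %lld\n",
                  static_cast<long long>(offsets[num_bags]),
                  static_cast<long long>(num_indices));
    }
    if (Mode != BoundsCheckMode::FATAL) {
      offsets[num_bags] = num_indices;  // assumed repair policy
    }
  }
  return num_invalid;
}
```

A caller would pick the instantiation from the runtime mode with a small `switch`, for example `bounds_check_sketch<BoundsCheckMode::WARNING>(indices, offsets, num_rows)`. In a real CUDA kernel the benefit would come from the compiler no longer having to keep the `printf` arguments and mode-dependent state live across loop iterations.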

Differential Revision: D65071179


netlify bot commented Oct 31, 2024

Deploy Preview for pytorch-fbgemm-docs ready!

🔨 Latest commit: 143ac0d
🔍 Latest deploy log: https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/67252ff47408fe0008261940
😎 Deploy Preview: https://deploy-preview-3298--pytorch-fbgemm-docs.netlify.app

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D65071179

Summary:
X-link: facebookresearch/FBGEMM#397


Reduce the number of registers per thread in `bounds_check_indices` to
increase occupancy by:
- Passing `bounds_check_mode` as a kernel template arg
- Moving `printf` for indices checking outside of the for-loop for
  `BoundsCheckMode::WARNING`
- Moving the last offset check outside of the for-loop

Reviewed By: q10, Fiery

Differential Revision: D65071179
sryap added a commit to sryap/FBGEMM that referenced this pull request Nov 1, 2024
@facebook-github-bot
Contributor

This pull request has been merged in 21e86af.

q10 pushed a commit to q10/FBGEMM that referenced this pull request Apr 10, 2025
Summary:
Pull Request resolved: facebookresearch/FBGEMM#397

X-link: pytorch#3298

Reduce the number of registers per thread in `bounds_check_indices` to
increase occupancy by:
- Passing `bounds_check_mode` as a kernel template arg
- Moving `printf` for indices checking outside of the for-loop for
  `BoundsCheckMode::WARNING`
- Moving the last offset check outside of the for-loop

Reviewed By: q10, Fiery

Differential Revision: D65071179

fbshipit-source-id: 6185ff35dac6cbf50fb2c7ec172ac82ed3b0c746
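
The link between registers per thread and occupancy that motivates this change can be illustrated with a rough estimate. The sketch below is an assumption-laden model, not a measurement from this pull request: the per-SM limits and the register counts in the usage note are illustrative values, `SmLimits` and `max_resident_threads` are hypothetical names, and the model ignores register allocation granularity, shared memory, and the per-SM block limit.

```cpp
#include <algorithm>
#include <cstdint>

// Illustrative per-SM limits; real values depend on the GPU architecture.
struct SmLimits {
  int registers_per_sm;    // e.g. 65536 32-bit registers (assumption)
  int max_threads_per_sm;  // e.g. 2048 resident threads (assumption)
};

// Simplified upper bound on resident threads per SM for a kernel that
// uses regs_per_thread registers and is launched with block_size threads.
inline int max_resident_threads(int regs_per_thread, int block_size,
                                SmLimits sm) {
  const int blocks_by_regs =
      sm.registers_per_sm / (regs_per_thread * block_size);
  const int blocks_by_threads = sm.max_threads_per_sm / block_size;
  return std::min(blocks_by_regs, blocks_by_threads) * block_size;
}
```

Under these assumed limits and a block size of 256, a kernel using 64 registers per thread could keep at most 1024 threads resident per SM (half of 2048), while one using 32 registers per thread could reach the full 2048. The actual register counts before and after this change are not stated in the pull request.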