Eliminate MemCpyDtoH overhead for quantized fast_gemv kernel #3725
This pull request was exported from Phabricator. Differential Revision: D70072967
✅ Deploy Preview for pytorch-fbgemm-docs ready!
…#3725)
Summary:
X-link: facebookresearch/FBGEMM#808
As title.
Previously, `w_scale.item()` created unnecessary DeviceToHost memcpy operations, which introduced large gaps between consecutive gemv kernel launches {F1975359276}.
This diff fixes that and includes other minor updates.
Differential Revision: D70072967
25b2d7a to 74981f7
74981f7 to 25b2d7a
25b2d7a to 2ec1051
This pull request has been merged in 1529564.
…#808)
Summary:
X-link: pytorch#3725
Pull Request resolved: facebookresearch/FBGEMM#808
As title.
Previously, `w_scale.item()` created unnecessary DeviceToHost memcpy operations, which introduced large gaps between consecutive gemv kernel launches {F1975359276}.
This diff fixes that and includes other minor updates.
Reviewed By: ipiszy
Differential Revision: D70072967
fbshipit-source-id: f71c838943ea45bc07041ba64426c13e995ff93d
Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/808
As title.
Previously, `w_scale.item()` created unnecessary DeviceToHost memcpy operations, which introduced large gaps between consecutive gemv kernel launches {F1975359276}.
This diff fixes that and includes other minor updates.
Differential Revision: D70072967
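
For readers unfamiliar with why `w_scale.item()` is costly, here is a minimal sketch in plain PyTorch. It is not the fast_gemv kernel from this diff. The function names, shapes, and the int8 weight layout are hypothetical and chosen only for illustration. The first variant pulls the scale to the host with `.item()`, which on a CUDA tensor forces a blocking DeviceToHost copy. The second keeps the scale as a device tensor so the work can be queued without a host round trip.

```python
import torch


def scaled_gemv_with_item(w_q, x, w_scale):
    # Hypothetical "before" path: .item() copies the scalar to the host.
    # On a CUDA tensor this is a blocking DeviceToHost memcpy, so the host
    # cannot queue the next kernel until the copy completes.
    scale = w_scale.item()
    return (w_q.to(x.dtype) @ x) * scale


def scaled_gemv_on_device(w_q, x, w_scale):
    # Hypothetical "after" path: the scale stays a 0-dim tensor on the same
    # device, so the multiply is queued like any other op.
    return (w_q.to(x.dtype) @ x) * w_scale


# Falls back to CPU so the sketch runs anywhere; the memcpy overhead
# only shows up when the tensors live on a CUDA device.
device = "cuda" if torch.cuda.is_available() else "cpu"
torch.manual_seed(0)

w_q = torch.randint(-8, 8, (16, 32), dtype=torch.int8, device=device)
x = torch.randn(32, device=device)
w_scale = torch.tensor(0.05, device=device)

y_item = scaled_gemv_with_item(w_q, x, w_scale)
y_device = scaled_gemv_on_device(w_q, x, w_scale)
```

Both variants compute the same result. The difference would only be visible in a GPU trace, where the `.item()` variant shows a memcpy and an idle gap before each launch.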