Add VBE to Dense TBE frontend #2628
Summary:
Pull Request resolved: pytorch#2596

Prior to this diff, SSD TBE lacked support for the conflict cache miss scenario. It assumed that the cache, located in GPU memory, was large enough to hold all data prefetched from SSD. In the event of a conflict cache miss, the behavior of SSD TBE was unpredictable (it could either fail or access illegal memory). Note that a conflict cache miss happens when an embedding row is absent from the cache and, after being fetched from SSD, cannot be inserted into the cache due to capacity constraints or associativity limitations.

This diff introduces support for conflict cache misses by storing rows that cannot be inserted into the cache in a scratch pad, which is a temporary GPU tensor. When rows are missing from the cache, TBE kernels can access the scratch pad instead.

Prior to this diff, during the SSD prefetch stage, any row that missed the cache and had to be fetched from SSD was first fetched into a CPU scratch pad and then transferred to the GPU. Rows that could be inserted into the cache were subsequently copied from the GPU scratch pad into the cache. If conflict misses occurred, the prefetch behavior was unpredictable.

With this diff, conflict-missed rows are retained in the scratch pad, which is kept alive until the current iteration completes. Throughout the forward and backward + optimizer stages of TBE, the cache and the scratch pad are used in the same way. However, after the backward + optimizer step completes, rows in the scratch pad are flushed back to SSD, whereas rows residing in the cache are not evicted and remain available for future use (see the diagram below for more details).

{F1645878181}

Differential Revision: D55998215
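The prefetch, access, and flush flow described above can be sketched in plain Python. This is only an illustrative toy under stated assumptions, not the FBGEMM implementation: `ToyCache`, `prefetch`, `read_row`, `update_row`, and `flush_scratch_pad` are hypothetical names, dictionaries stand in for GPU tensors and for the SSD, set placement is assumed to be `row_id % num_sets`, and the toy cache never evicts.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCache:
    """Hypothetical set-associative cache; dicts stand in for GPU memory."""
    num_sets: int
    ways: int
    sets: dict = field(default_factory=dict)  # set index -> {row_id: value}

    def lookup(self, row_id):
        # Returns None when the row is not cached.
        return self.sets.get(row_id % self.num_sets, {}).get(row_id)

    def try_insert(self, row_id, value):
        slot = self.sets.setdefault(row_id % self.num_sets, {})
        if row_id in slot or len(slot) < self.ways:
            slot[row_id] = value
            return True
        return False  # conflict miss: the target set is already full


def prefetch(cache, ssd, row_ids):
    """Fetch missing rows from SSD; rows that cannot be cached go to the scratch pad."""
    scratch_pad = {}
    for row_id in row_ids:
        if cache.lookup(row_id) is None:
            value = ssd[row_id]
            if not cache.try_insert(row_id, value):
                scratch_pad[row_id] = value
    return scratch_pad


def read_row(cache, scratch_pad, row_id):
    """Forward pass: the cache and the scratch pad are used the same way."""
    value = cache.lookup(row_id)
    return scratch_pad[row_id] if value is None else value


def update_row(cache, scratch_pad, row_id, new_value):
    """Backward + optimizer: write the row wherever it currently lives."""
    if cache.lookup(row_id) is not None:
        cache.try_insert(row_id, new_value)
    else:
        scratch_pad[row_id] = new_value


def flush_scratch_pad(scratch_pad, ssd):
    """End of iteration: scratch pad rows go back to SSD; cached rows stay cached."""
    ssd.update(scratch_pad)
    scratch_pad.clear()
```

In this sketch, a cache with two sets and one way per set that prefetches rows 0, 2, and 1 caches rows 0 and 1 and places row 2 in the scratch pad, because rows 0 and 2 map to the same set. After the update step, only row 2 is written back to the SSD dictionary.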
This pull request was exported from Phabricator. Differential Revision: D56651380
Summary:
Pull Request resolved: pytorch#2628
- add frontend support to Dense TBE module to support VBE
- add unit test
Differential Revision: D56651380
b33c8bb to 86f8974
86f8974 to b31daed
Summary:
Pull Request resolved: pytorch#2620
- make the dense TBE headers into a template
- add VBE options to the dense TBE header
Differential Revision: https://internalfb.com/D57017981
b31daed to e048d0a
e048d0a to f95d940
f95d940 to 65e3266
This pull request has been merged in d50babd.