Skip to content

Use unique cache locations in backward for pipeline prefetching #2164

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

sryap
Copy link
Contributor

@sryap sryap commented Nov 28, 2023

Summary:
When pipeline prefetching is enabled (prefetch_pipeline=True) for
EmbeddingLocation.MANAGED_CACHING, TBE has to update
lxu_cache_locations to ensure cache consistency. Prior to this
diff, TBE performs the full cache lookup when updating
lxu_cache_locations (i.e., looking up all indices although they are
duplicated). The time complexity of such the lookup grows with
the number of indices. Thus, the lookup can be expensive when the
number of indices is large. This diff addresses this problem by
looking up only the unique indices (which is normally much smaller
than the full indices). The number of unique indices tends to stay
more or less the same even when the total number of indices grows.
Thus, looking up only unique indices can reduce cost of cache lookup
effectively. The unique lxu_cache_locations are fed directly to TBE
backward to consume. Thus, there is no decompression cost.

Reviewed By: ehsanardestani, r-barnes

Differential Revision: D51488959

Copy link

netlify bot commented Nov 28, 2023

Deploy Preview for pytorch-fbgemm-docs ready!

Name Link
🔨 Latest commit d4d8e25
🔍 Latest deploy log https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/6597cbaeeeb8630008beeada
😎 Deploy Preview https://deploy-preview-2164--pytorch-fbgemm-docs.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D51488959

sryap added a commit to sryap/FBGEMM that referenced this pull request Dec 14, 2023
…rch#2164)

Summary:

When pipeline prefetching is enabled (`prefetch_pipeline=True`) for
`EmbeddingLocation.MANAGED_CACHING`, TBE has to update
`lxu_cache_locations` to ensure cache consistency.  Prior to this
diff, TBE performs the full cache lookup when updating
`lxu_cache_locations` (i.e., looking up all indices although they are
duplicated).  The time complexity of such the lookup grows with
the number of indices.  Thus, the lookup can be expensive when the
number of indices is large.  This diff addresses this problem by
looking up only the unique indices (which is normally much smaller
than the full indices).  The number of unique indices tends to stay
more or less the same even when the total number of indices grows.
Thus, looking up only unique indices can reduce cost of cache lookup
effectively.  The unique `lxu_cache_locations` are fed directly to TBE
backward to consume.  Thus, there is no decompression cost.

Reviewed By: ehsanardestani, r-barnes

Differential Revision: D51488959
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D51488959

2 similar comments
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D51488959

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D51488959

…rch#2164)

Summary:

When pipeline prefetching is enabled (`prefetch_pipeline=True`) for
`EmbeddingLocation.MANAGED_CACHING`, TBE has to update
`lxu_cache_locations` to ensure cache consistency.  Prior to this
diff, TBE performs the full cache lookup when updating
`lxu_cache_locations` (i.e., looking up all indices although they are
duplicated).  The time complexity of such the lookup grows with
the number of indices.  Thus, the lookup can be expensive when the
number of indices is large.  This diff addresses this problem by
looking up only the unique indices (which is normally much smaller
than the full indices).  The number of unique indices tends to stay
more or less the same even when the total number of indices grows.
Thus, looking up only unique indices can reduce cost of cache lookup
effectively.  The unique `lxu_cache_locations` are fed directly to TBE
backward to consume.  Thus, there is no decompression cost.

Reviewed By: ehsanardestani, r-barnes

Differential Revision: D51488959
@sryap sryap force-pushed the export-D51488959 branch from 5148659 to 1f831b0 Compare January 5, 2024 09:27
sryap added a commit to sryap/FBGEMM that referenced this pull request Jan 5, 2024
…rch#2164)

Summary:

When pipeline prefetching is enabled (`prefetch_pipeline=True`) for
`EmbeddingLocation.MANAGED_CACHING`, TBE has to update
`lxu_cache_locations` to ensure cache consistency.  Prior to this
diff, TBE performs the full cache lookup when updating
`lxu_cache_locations` (i.e., looking up all indices although they are
duplicated).  The time complexity of such the lookup grows with
the number of indices.  Thus, the lookup can be expensive when the
number of indices is large.  This diff addresses this problem by
looking up only the unique indices (which is normally much smaller
than the full indices).  The number of unique indices tends to stay
more or less the same even when the total number of indices grows.
Thus, looking up only unique indices can reduce cost of cache lookup
effectively.  The unique `lxu_cache_locations` are fed directly to TBE
backward to consume.  Thus, there is no decompression cost.

Reviewed By: ehsanardestani, r-barnes

Differential Revision: D51488959
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D51488959

@sryap sryap force-pushed the export-D51488959 branch from 1f831b0 to d4d8e25 Compare January 5, 2024 09:28
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D51488959

@facebook-github-bot
Copy link
Contributor

This pull request has been merged in e92c30b.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants