Fix IMA in TBE grad indices kernel for int32 indices #3877


Closed
sryap wants to merge 1 commit from the export-D71796826 branch

Conversation

@sryap (Contributor) commented Mar 25, 2025

Differential Revision: D71796826

@facebook-github-bot (Contributor) commented:

This pull request was exported from Phabricator. Differential Revision: D71796826

@netlify bot commented Mar 25, 2025

Deploy Preview for pytorch-fbgemm-docs ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 4068c0e |
| 🔍 Latest deploy log | https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/67e332d8c70aaf0008f8ac18 |
| 😎 Deploy Preview | https://deploy-preview-3877--pytorch-fbgemm-docs.netlify.app |

Summary:
This diff forces the weight tensor index to be cast to
`overflow_safe_int_t`, which is int64_t, to address the 32-bit integer
overflow issue that occurs when the TBE `indices` tensor is int32_t.

Before this diff, `idx_j` and `D_emb` are both int32_t when 32-bit int
`indices` are used.  When accessing the embedding `weights` tensor, the
index is computed as the product of `idx_j` and `D_emb`.  This product
can exceed the maximum value of int32_t, causing an integer overflow
and, in turn, an illegal memory access.

By forcing `idx_j` and `D_emb` to be int64_t, we prevent the 32-bit
integer overflow.  Using int64_t is safe because its maximum value far
exceeds the memory capacity of modern GPUs and CPUs.
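
Below is a minimal, self-contained sketch of the issue and the fix. It is not the actual FBGEMM kernel code: the values of `idx_j` and `D_emb` are made up, and only the widening cast mirrors the change described above.

```cpp
#include <cstdint>
#include <cstdio>

// Alias named after the type mentioned in the summary; illustrative only.
using overflow_safe_int_t = int64_t;

int main() {
  const int32_t idx_j = 20'000'000;  // hypothetical embedding row index
  const int32_t D_emb = 128;         // hypothetical embedding row stride

  // Before the fix: the product (2,560,000,000) exceeds INT32_MAX, so a
  // 32-bit multiply wraps around. The unsigned cast below just reproduces
  // the wrapped value without undefined behavior; in the kernel the bogus
  // offset causes the illegal memory access.
  const int32_t bad_offset = static_cast<int32_t>(
      static_cast<uint32_t>(idx_j) * static_cast<uint32_t>(D_emb));

  // After the fix: widen both operands first so the multiplication is
  // done in 64-bit arithmetic.
  const overflow_safe_int_t good_offset =
      static_cast<overflow_safe_int_t>(idx_j) *
      static_cast<overflow_safe_int_t>(D_emb);

  std::printf("32-bit offset: %d\n64-bit offset: %lld\n",
              bad_offset, static_cast<long long>(good_offset));
  return 0;
}
```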

We also add unit tests for this issue.


X-link: facebookresearch/FBGEMM#967

**Facebook:**

This is the fix for S498528.  The full root cause details are in
https://fburl.com/gdoc/lhmvenw3.

Reviewed By: brad-mengchi, spcyppt

Differential Revision: D71796826
@sryap force-pushed the export-D71796826 branch from 5ab4269 to 4068c0e on March 25, 2025 at 22:48
@facebook-github-bot (Contributor) commented:

This pull request was exported from Phabricator. Differential Revision: D71796826

@facebook-github-bot (Contributor) commented:

This pull request has been merged in 6a6db7c.

q10 pushed a commit to q10/FBGEMM that referenced this pull request Apr 10, 2025
X-link: pytorch#3877

Pull Request resolved: facebookresearch/FBGEMM#967


Reviewed By: brad-mengchi, spcyppt

Differential Revision: D71796826

fbshipit-source-id: df7c8e06d36e2f06585a5812df8ae40863ea6253