Implement generate_vbe_metadata cpu #3715

spcyppt · 2025-02-19T21:12:58Z

Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/796

This diff implements `generate_vbe_metadata` for CPU so that the function returns the same output on CPU, CUDA, and MTIA.

To support variable-batch-size embeddings (VBE) on CPU with the existing fixed-batch-size CPU kernel, the offsets must be recomputed, which was previously done in Python. This diff implements the offsets recomputation in C++ so that all manipulations happen in C++.
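For readers unfamiliar with the idea, here is a minimal pure-Python sketch of one way offsets could be recomputed for a fixed-batch-size kernel. It is not the code from this diff. It assumes that `offsets` starts at 0, that bags are laid out feature by feature, and that each feature is padded with empty bags up to the largest batch size. The function name `pad_offsets_to_fixed_batch` and its parameters are hypothetical.

```python
from itertools import accumulate
from typing import List


def pad_offsets_to_fixed_batch(offsets: List[int], batch_sizes: List[int]) -> List[int]:
    """Pad per-feature bags with empty bags so every feature has max(batch_sizes) bags.

    Hypothetical helper for illustration only (not an FBGEMM API).
    offsets:     bag boundaries, length sum(batch_sizes) + 1, assumed to start at 0.
    batch_sizes: number of bags (samples) for each feature.
    """
    if not batch_sizes:
        return [0]
    if len(offsets) != sum(batch_sizes) + 1:
        raise ValueError("offsets must have sum(batch_sizes) + 1 entries")

    max_b = max(batch_sizes)
    # Convert offsets to per-bag lengths.
    lengths = [offsets[i + 1] - offsets[i] for i in range(len(offsets) - 1)]

    padded_lengths: List[int] = []
    start = 0
    for b in batch_sizes:
        # Keep the real bags of this feature, then append zero-length bags.
        padded_lengths.extend(lengths[start:start + b])
        padded_lengths.extend([0] * (max_b - b))
        start += b

    # Convert the padded lengths back to offsets (cumulative sum with a leading 0).
    return list(accumulate(padded_lengths, initial=0))


# Two features with batch sizes 2 and 3; the first feature gains one empty bag.
print(pad_offsets_to_fixed_batch([0, 1, 3, 4, 4, 6], [2, 3]))
```

Because the padding bags are empty, they add no indices, so the total number of indices is unchanged while every feature now has the same number of bags.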

Note that reshaping offsets and grad_input to work with the existing fixed-batch-size CPU kernels is done in Autograd rather than in the wrapper, to avoid repeated computation.
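As a rough illustration of what reshaping a gradient for a fixed-batch-size kernel might involve, the sketch below scatters a flat, per-feature gradient into a zero-padded `[max_B, total_D]` layout. The layout is an assumption made for this example (feature-major, row-major blocks of shape `[B_t, D_t]`), and `pad_vbe_grad` is a hypothetical name. The actual Autograd code in this diff may differ.

```python
from typing import List


def pad_vbe_grad(
    flat_grad: List[float], batch_sizes: List[int], dims: List[int]
) -> List[List[float]]:
    """Scatter a flat variable-batch gradient into a padded [max_B, sum(dims)] matrix.

    Hypothetical helper for illustration only (not an FBGEMM API).
    flat_grad:   concatenation of per-feature blocks, each row-major [B_t, D_t].
    batch_sizes: B_t for each feature.
    dims:        embedding dimension D_t for each feature.
    """
    if len(batch_sizes) != len(dims):
        raise ValueError("batch_sizes and dims must have the same length")
    if len(flat_grad) != sum(b * d for b, d in zip(batch_sizes, dims)):
        raise ValueError("flat_grad has an unexpected number of elements")

    max_b = max(batch_sizes, default=0)
    total_d = sum(dims)
    # Rows beyond a feature's real batch size stay zero.
    padded = [[0.0] * total_d for _ in range(max_b)]

    src = 0  # read position in flat_grad
    col = 0  # first output column of the current feature
    for b, d in zip(batch_sizes, dims):
        for row in range(b):
            padded[row][col:col + d] = flat_grad[src:src + d]
            src += d
        col += d
    return padded


# Feature 0: one sample of dim 2; feature 1: two samples of dim 1.
print(pad_vbe_grad([1.0, 2.0, 3.0, 4.0], [1, 2], [2, 1]))
```

Doing this once inside the autograd function, rather than in a Python wrapper, matches the stated goal of avoiding repeated computation.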

VBE CPU tests are in the next diff.

Reviewed By: sryap

Differential Revision: D69162870

facebook-github-bot · 2025-02-19T21:13:07Z

This pull request was exported from Phabricator. Differential Revision: D69162870

netlify · 2025-02-19T21:13:17Z

✅ Deploy Preview for pytorch-fbgemm-docs ready!

Name	Link
🔨 Latest commit	`3892615`
🔍 Latest deploy log	https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/67c771523b8b030008f01eab
😎 Deploy Preview	https://deploy-preview-3715--pytorch-fbgemm-docs.netlify.app

facebook-github-bot · 2025-03-05T21:42:30Z

This pull request has been merged in f0ff8bb.

X-link: pytorch#3715
Pull Request resolved: facebookresearch/FBGEMM#796
Reviewed By: sryap, nautsimon
Differential Revision: D69162870
fbshipit-source-id: 08c6e45b8f0d319b96371acaba0d9a27570a1bd7

facebook-github-bot added the cla signed label Feb 19, 2025

facebook-github-bot added the fb-exported label Feb 19, 2025

spcyppt force-pushed the export-D69162870 branch from 7529dfe to b2d0bcd Compare February 21, 2025 00:13

spcyppt force-pushed the export-D69162870 branch from b2d0bcd to aac9690 Compare February 21, 2025 23:46

spcyppt force-pushed the export-D69162870 branch from aac9690 to ae43025 Compare February 21, 2025 23:54

spcyppt force-pushed the export-D69162870 branch from ae43025 to 90727e6 Compare February 26, 2025 04:31

spcyppt force-pushed the export-D69162870 branch 2 times, most recently from 3a59542 to 104fc25 Compare March 3, 2025 22:42

spcyppt force-pushed the export-D69162870 branch from 104fc25 to ea6a843 Compare March 4, 2025 04:29

spcyppt force-pushed the export-D69162870 branch from ea6a843 to 5ab0b9b Compare March 4, 2025 04:36

spcyppt force-pushed the export-D69162870 branch from 5ab0b9b to bb38a62 Compare March 4, 2025 04:37

spcyppt force-pushed the export-D69162870 branch from f990938 to cd6d50b Compare March 4, 2025 06:31

spcyppt force-pushed the export-D69162870 branch from cd6d50b to 791a482 Compare March 4, 2025 06:32

spcyppt force-pushed the export-D69162870 branch from 791a482 to 7adeb1c Compare March 4, 2025 06:35

spcyppt force-pushed the export-D69162870 branch from 7adeb1c to be33b5b Compare March 4, 2025 06:47

spcyppt force-pushed the export-D69162870 branch from be33b5b to 4b55c11 Compare March 4, 2025 21:14

spcyppt force-pushed the export-D69162870 branch from 4b55c11 to 9222538 Compare March 4, 2025 21:15

spcyppt force-pushed the export-D69162870 branch from 9222538 to 0f37ada Compare March 4, 2025 21:19

spcyppt force-pushed the export-D69162870 branch from 0f37ada to 3892615 Compare March 4, 2025 21:32

facebook-github-bot closed this in f0ff8bb Mar 5, 2025

facebook-github-bot added the Merged label Mar 5, 2025

q10 added category:improvement feature:tbe labels Mar 6, 2025
