Change set_learning_rate_tensor #3945


Closed
wants to merge 1 commit

Conversation

@spcyppt (Contributor) commented Apr 8, 2025


Summary:
The change in D71010630 breaks `aps_models/examples/dlrm/tutorials:test_kernel_apf_dlrm_with_basic_training_demo`, which potentially breaks the `apf_dlrm` bento kernel.

```
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor []] is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
```
See [Full error log](https://www.internalfb.com/intern/everpaste/?handle=GHmNRBnyyiCXTVkCAP28ClWARWwPbswMAAAz)

TBE has a method to set the learning rate, `set_learning_rate(lr)`, where `lr` is the learning rate value to be set.

D71010630 removes `optimizer_args.learning_rate` (a float) and introduces `self.learning_rate_tensor` (a tensor). Setting the learning rate therefore means changing the value of `learning_rate_tensor`, which D71010630 does with the in-place `tensor.fill_(lr)`.
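
As a rough sketch, the post-D71010630 setter amounts to something like the following (the `TBE` class and exact signature here are illustrative, not the actual FBGEMM code):

```python
import torch

class TBE:  # illustrative stand-in, not the real FBGEMM TBE module
    def __init__(self, lr: float) -> None:
        # D71010630: the learning rate is now carried as a 0-dim float tensor
        self.learning_rate_tensor = torch.tensor(lr, dtype=torch.float32)

    def set_learning_rate(self, lr: float) -> None:
        # In-place update; this bumps the tensor's autograd version counter
        self.learning_rate_tensor.fill_(lr)
```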

However, this seems to break the bento kernel built from the APF code, which hits the error above when the in-place operation `tensor.fill_(lr)` runs.
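
The failure mode can be reproduced with plain PyTorch; the snippet below is only a hedged illustration of the autograd version-counter check, not the failing test itself:

```python
import torch

w = torch.ones(3, requires_grad=True)
lr = torch.tensor(0.5)   # 0-dim float tensor, like the [torch.FloatTensor []] in the error

loss = (w * lr).sum()    # autograd saves lr to compute the gradient of w
lr.fill_(0.1)            # in-place write bumps lr's version counter
loss.backward()          # RuntimeError: ... modified by an inplace operation
```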


The workaround is to create a new tensor instead of modifying the existing one in place. The change passes the test:

https://www.internalfb.com/intern/testinfra/testconsole/testrun/3659174972188704/
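
A minimal sketch of that workaround, again with illustrative names rather than the actual FBGEMM method:

```python
import torch

class TBE:  # same illustrative stand-in as above
    def __init__(self, lr: float) -> None:
        self.learning_rate_tensor = torch.tensor(lr, dtype=torch.float32)

    def set_learning_rate(self, lr: float) -> None:
        # Rebind to a fresh tensor instead of mutating the existing one, so any
        # tensor already saved by autograd for a pending backward keeps its
        # expected version counter.
        self.learning_rate_tensor = torch.tensor(
            lr,
            dtype=self.learning_rate_tensor.dtype,
            device=self.learning_rate_tensor.device,
        )
```

The old tensor is never written to, so a backward pass over a graph that captured it is unaffected.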

Reviewed By: sryap, nautsimon

Differential Revision: D72617537
@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D72617537


netlify bot commented Apr 8, 2025

Deploy Preview for pytorch-fbgemm-docs ready!

| Name | Link |
|------|------|
| 🔨 Latest commit | 1716b8e |
| 🔍 Latest deploy log | https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/67f4c60b5c2de300082eadf7 |
| 😎 Deploy Preview | https://deploy-preview-3945--pytorch-fbgemm-docs.netlify.app |

@facebook-github-bot (Contributor)

This pull request has been merged in 7bb8db2.

q10 pushed a commit to q10/FBGEMM that referenced this pull request Apr 10, 2025
Pull Request resolved: facebookresearch/FBGEMM#1030

X-link: pytorch#3945

Reviewed By: sryap, nautsimon

Differential Revision: D72617537

fbshipit-source-id: d3b84e872e6b68d7c2a7da5b1380e3d355393c94