Add a workaround for stochastic rounding for AMD GPUs #3908

sryap · 2025-04-01T06:25:46Z

Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/997

This diff contains a workaround for the stochastic rounding issue on
AMD GPUs.

Problem:

`quantize_store` calls `nearest_rounding_vector` instead of
`stochastic_rounding_vector` even when stochastic rounding is enabled,
because the `StochasticRoundingRNGState` pointer it receives is a nullptr
(https://fburl.com/code/kna14icj)
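
To make the failure mode concrete, here is a minimal host-side sketch of a store routine that dispatches on a nullable RNG state pointer. It is not the FBGEMM code: `RngState`, `next_uniform`, `quantize_store_sketch`, and the xorshift generator are all hypothetical stand-ins, and the scalar rounding only mirrors the shape of the vectorized functions named above. The point is that a null pointer silently selects nearest rounding, with no error.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical stand-in for the RNG state; not the FBGEMM definition.
struct RngState {
  uint64_t s;  // must be seeded with a nonzero value for xorshift
};

// One xorshift64 step, mapped to a uniform float in [0, 1).
inline float next_uniform(RngState& st) {
  st.s ^= st.s << 13;
  st.s ^= st.s >> 7;
  st.s ^= st.s << 17;
  // Keep the top 24 bits so the value is exactly representable as a float.
  return static_cast<float>(st.s >> 40) / 16777216.0f;
}

enum class Rounding { Nearest, Stochastic };

struct StoreResult {
  int value;
  Rounding mode;  // which path was actually taken
};

// Assumed dispatch shape: a null state falls back to nearest rounding.
inline StoreResult quantize_store_sketch(float x, RngState* state) {
  if (state != nullptr) {
    const float lower = std::floor(x);
    const float frac = x - lower;
    // Round up with probability equal to the fractional part.
    const int up = next_uniform(*state) < frac ? 1 : 0;
    return {static_cast<int>(lower) + up, Rounding::Stochastic};
  }
  return {static_cast<int>(std::nearbyint(x)), Rounding::Nearest};
}
```

With this shape, a caller that intends stochastic rounding but hands over a null pointer still gets a plausible result, which is why the problem only shows up when the rounding path is inspected directly.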

We found that the `WeightRow` constructor also gets a null
`StochasticRoundingRNGState` pointer (https://fburl.com/code/vyq53lia)

When `WeightRow` is instantiated, we confirmed that `stochastic_rounding`
is true. `WeightRow` should therefore receive `&state`, but it receives a
nullptr instead. (https://fburl.com/code/o3kxgt4z)

We suspect that the compiler optimized out the
`StochasticRoundingRNGState` because it is only passed to `WeightRow`
and not used anywhere else in the caller kernel.

Workaround:

We move the `StochasticRoundingRNGState` storage inside the
`WeightRow` struct and pass a boolean to the `WeightRow` constructor
instead.
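
The change in ownership can be sketched as follows. This is a simplified host-side illustration under stated assumptions, not the actual diff: `RngStateSketch`, `WeightRowBefore`, `WeightRowAfter`, and the `seed` parameter are hypothetical, and the real struct carries more members and runs in device code.

```cpp
#include <cstdint>

// Hypothetical stand-in for StochasticRoundingRNGState.
struct RngStateSketch {
  uint64_t s;
};

// Before: the row borrows state owned by the caller kernel. If the pointer
// arrives as null, stochastic rounding is silently disabled.
struct WeightRowBefore {
  RngStateSketch* state;

  explicit WeightRowBefore(RngStateSketch* st) : state(st) {}

  bool uses_stochastic_rounding() const { return state != nullptr; }
};

// After: the row owns the state, and the caller passes only a boolean.
// The seed argument is an assumption made for this sketch.
struct WeightRowAfter {
  bool stochastic_rounding;
  RngStateSketch state;

  WeightRowAfter(bool sr, uint64_t seed)
      : stochastic_rounding(sr), state{sr ? seed : 0} {}

  bool uses_stochastic_rounding() const { return stochastic_rounding; }
};
```

Because the state lives inside the struct, the rounding mode no longer depends on whether a caller-owned object survives compilation; the decision is carried by the flag alone.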

Differential Revision: D72201618


facebook-github-bot · 2025-04-01T06:25:59Z

This pull request was exported from Phabricator. Differential Revision: D72201618

netlify · 2025-04-01T06:26:08Z

✅ Deploy Preview for pytorch-fbgemm-docs ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | `3de1055` |
| 🔍 Latest deploy log | https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/67eb86ed541a570008328137 |
| 😎 Deploy Preview | https://deploy-preview-3908--pytorch-fbgemm-docs.netlify.app |

facebook-github-bot · 2025-04-01T20:46:41Z

This pull request has been merged in 8093e7d.

X-link: pytorch#3908

Pull Request resolved: facebookresearch/FBGEMM#997

Reviewed By: q10, yinbinm, jianyuh, xw285cornell, yoyoyocmu, joebos

Differential Revision: D72201618

fbshipit-source-id: a2bc7f004ac5183c84eb0501ada6d848ebca17e1

facebook-github-bot added the cla signed label Apr 1, 2025

facebook-github-bot added the fb-exported label Apr 1, 2025

facebook-github-bot closed this in 8093e7d Apr 1, 2025

facebook-github-bot added the Merged label Apr 1, 2025

q10 added category:fix feature:tbe labels Apr 4, 2025
