Skip to content

Add a workaround for stochastic rounding for AMD GPUs #3908

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

sryap
Copy link
Contributor

@sryap sryap commented Apr 1, 2025

Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/997

This diff contains a workaround for the stochastic rounding issue for
the AMD GPUs.

Problem:

quantize_store calls nearest_rounding_vector instead of
stochastic_rounding_vector when stochastic rounding is used because
the StochasticRoundingRNGState pointer is a nullptr
(https://fburl.com/code/kna14icj)

We found that the WeightRow constructor also gets a null
StochasticRoundingRNGState pointer (https://fburl.com/code/vyq53lia)

When WeightRow is instantiated, we confirm that
stochastic_rounding is
true. WeightRow should receive &state, but instead it receives a
nullptr. (https://fburl.com/code/o3kxgt4z)

We suspect that the compiler might have optimized out the
StochasticRoundingRNGState since it is only passed to WeightRow
and not utilized anywhere else in the caller kernel.

Workaround:

We move the StochasticRoundingRNGState storage inside the
WeightRow struct and pass a boolean to the WeightRow constructor
instead.

Differential Revision: D72201618

Summary:
X-link: facebookresearch/FBGEMM#997

This diff contains a workaround for the stochastic rounding issue for
the AMD GPUs.

Problem:

`quantize_store` calls `nearest_rounding_vector` instead of
`stochastic_rounding_vector` when stochastic rounding is used because
the `StochasticRoundingRNGState` pointer is a nullptr
(https://fburl.com/code/kna14icj)

We found that the `WeightRow` constructor also gets a null
`StochasticRoundingRNGState` pointer (https://fburl.com/code/vyq53lia)

When `WeightRow` is instantiated, we confirm that
`stochastic_rounding` is
true.  `WeightRow` should receive `&state`, but instead it receives a
nullptr. (https://fburl.com/code/o3kxgt4z)

We suspect that the compiler might have optimized out the
`StochasticRoundingRNGState` since it is only passed to `WeightRow`
and not utilized anywhere else in the caller kernel.

Workaround:

We move the `StochasticRoundingRNGState` storage inside the
`WeightRow` struct and pass a boolean to the `WeightRow` constructor
instead.

Differential Revision: D72201618
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D72201618

Copy link

netlify bot commented Apr 1, 2025

Deploy Preview for pytorch-fbgemm-docs ready!

Name Link
🔨 Latest commit 3de1055
🔍 Latest deploy log https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/67eb86ed541a570008328137
😎 Deploy Preview https://deploy-preview-3908--pytorch-fbgemm-docs.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@facebook-github-bot
Copy link
Contributor

This pull request has been merged in 8093e7d.

q10 pushed a commit to q10/FBGEMM that referenced this pull request Apr 10, 2025
Summary:
X-link: pytorch#3908

Pull Request resolved: facebookresearch/FBGEMM#997

This diff contains a workaround for the stochastic rounding issue for
the AMD GPUs.

Problem:

`quantize_store` calls `nearest_rounding_vector` instead of
`stochastic_rounding_vector` when stochastic rounding is used because
the `StochasticRoundingRNGState` pointer is a nullptr
(https://fburl.com/code/kna14icj)

We found that the `WeightRow` constructor also gets a null
`StochasticRoundingRNGState` pointer (https://fburl.com/code/vyq53lia)

When `WeightRow` is instantiated, we confirm that
`stochastic_rounding` is
true.  `WeightRow` should receive `&state`, but instead it receives a
nullptr. (https://fburl.com/code/o3kxgt4z)

We suspect that the compiler might have optimized out the
`StochasticRoundingRNGState` since it is only passed to `WeightRow`
and not utilized anywhere else in the caller kernel.

Workaround:

We move the `StochasticRoundingRNGState` storage inside the
`WeightRow` struct and pass a boolean to the `WeightRow` constructor
instead.

Reviewed By: q10, yinbinm, jianyuh, xw285cornell, yoyoyocmu, joebos

Differential Revision: D72201618

fbshipit-source-id: a2bc7f004ac5183c84eb0501ada6d848ebca17e1
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants