Support zero-size inputs in FP8 cuda quantize kernel #3448


Closed

Conversation

jiawenliu64
Member
Summary:
For MoE, if tokens are not routed (in the dynamic case), some experts may run on 0 tokens, as found by Jason.

This diff supports zero-size inputs in the FP8 CUDA quantize kernel for this case.

Differential Revision: D66727399
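The fix boils down to a host-side guard: when an expert receives 0 tokens, the quantize path should return empty outputs rather than launching a kernel over an empty input. Below is a minimal pure-Python sketch of that guard (not FBGEMM's actual API; `quantize_fp8_rowwise`, its list-based inputs, and the per-row scaling are illustrative assumptions):

```python
# Hypothetical sketch, not FBGEMM's real implementation: a quantize wrapper
# that handles zero-size inputs, mirroring the MoE case where dynamic
# routing leaves an expert with 0 tokens.

E4M3_MAX = 448.0  # largest finite magnitude representable in FP8 E4M3


def quantize_fp8_rowwise(rows):
    """Per-row scaled quantization; `rows` is a list of rows of floats.

    Returns (quantized_rows, scales).
    """
    # Zero-size guard: with 0 tokens there is nothing to quantize, so
    # return empty outputs instead of dispatching work on an empty input.
    if len(rows) == 0:
        return [], []

    quantized, scales = [], []
    for row in rows:
        # Scale each row so its max magnitude maps onto the FP8 range.
        amax = max((abs(v) for v in row), default=0.0)
        scale = amax / E4M3_MAX if amax > 0 else 1.0
        quantized.append([v / scale for v in row])
        scales.append(scale)
    return quantized, scales
```

In the actual CUDA kernel the analogous change is an early return (or a skipped launch) when the element count is zero, since launching a grid sized from a zero-size tensor would otherwise be invalid.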

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D66727399

netlify bot commented Dec 4, 2024

Deploy Preview for pytorch-fbgemm-docs ready!

🔨 Latest commit: 9981cae
🔍 Latest deploy log: https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/6750eb6a74ca850008a7869f
😎 Deploy Preview: https://deploy-preview-3448--pytorch-fbgemm-docs.netlify.app

Summary:
X-link: facebookresearch/FBGEMM#533

For MoE, if tokens are not routed (in the dynamic case), some experts may run on 0 tokens, as found by jasonjk-park.

This diff supports zero-size inputs in the FP8 CUDA quantize kernel for this case.

Reviewed By: jasonjk-park

Differential Revision: D66727399

@facebook-github-bot
Contributor

This pull request has been merged in 1a0d837.

q10 pushed a commit to q10/FBGEMM that referenced this pull request Apr 10, 2025
Summary:
X-link: pytorch#3448

Pull Request resolved: facebookresearch/FBGEMM#533

For MoE, if tokens are not routed (in the dynamic case), some experts may run on 0 tokens, as found by jasonjk-park.

This diff supports zero-size inputs in the FP8 CUDA quantize kernel for this case.

Reviewed By: jasonjk-park

Differential Revision: D66727399

fbshipit-source-id: e4d760edace6b9e0cc6a1018f88e03b8a19b0ce6