retune some of the EMU1.6 7B FP8 GEMM shapes #3328

Closed
mxz297 wants to merge 1 commit

@mxz297 (Contributor) commented Nov 5, 2024

Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/422

Retuning with CKProfiler shows a small FP8 GEMM improvement for a few shapes. Tuning results may depend on the compiler, firmware, ROCm version, and so on.

Seeing a slight improvement (50 ms E2E).

Differential Revision: D65486644

@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D65486644

netlify bot commented Nov 5, 2024

Deploy Preview for pytorch-fbgemm-docs ready!

Name Link
🔨 Latest commit a4d91e1
🔍 Latest deploy log https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/672a5f1b8990070008ff840b
😎 Deploy Preview https://deploy-preview-3328--pytorch-fbgemm-docs.netlify.app

@facebook-github-bot (Contributor)

This pull request has been merged in 89f5d93.

q10 pushed a commit to q10/FBGEMM that referenced this pull request Apr 10, 2025
Summary:
X-link: pytorch#3328

Pull Request resolved: facebookresearch/FBGEMM#422

Retuning with CKProfiler shows a small FP8 GEMM improvement for a few shapes. Tuning results may depend on the compiler, firmware, ROCm version, and so on.

Seeing a slight improvement (50 ms E2E).

Reviewed By: jwfromm

Differential Revision: D65486644

fbshipit-source-id: ba4172dcda7929e4c31fa181b5d9a550de73da28