implement packed quantize row / dequantize row API #3915
Conversation
This pull request was exported from Phabricator. Differential Revision: D72121939
✅ Deploy Preview for pytorch-fbgemm-docs ready!
Summary:

X-link: facebookresearch/FBGEMM#1004

API for a packed version of quantize/dequantize row. This version returns a single, contiguous tensor in memory instead of two tensors, and operates on that contiguous tensor.

Example usage:

```
a = torch.randn(shape, dtype=torch.bfloat16, device="cuda")

packed_values = quantize_fp8_packed_row_raw(
    a,
    use_triton=True,
)

# Undo scaling.
a_bf16 = dequantize_fp8_packed_row(packed_values)
torch.testing.assert_close(a_bf16, a, atol=2e-1, rtol=1e-1)
```

A third API, "quantize_fp8_packed_row", mimics the API of quantize_fp8_row (mainly for testing).

Reviewed By: jiawenliu64

Differential Revision: D72121939
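For intuition only, below is a minimal pure-PyTorch sketch of what a packed row-wise FP8 layout could look like, assuming a hypothetical per-row layout of [4-byte fp32 scale | fp8 values]. The helper names `packed_quantize_row_reference` and `packed_dequantize_row_reference` are illustrative, not part of FBGEMM, and the actual byte layout produced by the Triton kernel behind `quantize_fp8_packed_row_raw` may differ.

```
import torch

# Illustrative only: pack each row as [4-byte fp32 scale | K fp8 values].
# Requires a PyTorch build with float8 support (torch.float8_e4m3fn).
FP8_MAX = torch.finfo(torch.float8_e4m3fn).max


def packed_quantize_row_reference(a: torch.Tensor) -> torch.Tensor:
    """Row-wise FP8 quantization that returns one contiguous uint8 buffer."""
    a_fp32 = a.to(torch.float32)
    # Per-row scale so each row's max magnitude maps to the FP8 max value.
    row_max = a_fp32.abs().amax(dim=-1, keepdim=True).clamp(min=1e-12)
    scale = row_max / FP8_MAX
    q = (a_fp32 / scale).clamp(-FP8_MAX, FP8_MAX).to(torch.float8_e4m3fn)
    # Reinterpret the scale and the FP8 values as raw bytes and
    # concatenate them row by row into a single contiguous tensor.
    scale_bytes = scale.view(torch.uint8)   # (N, 4)
    value_bytes = q.view(torch.uint8)       # (N, K)
    return torch.cat([scale_bytes, value_bytes], dim=-1).contiguous()


def packed_dequantize_row_reference(
    packed: torch.Tensor, out_dtype: torch.dtype = torch.bfloat16
) -> torch.Tensor:
    """Invert the packing above: split out the per-row scale and rescale."""
    scale = packed[..., :4].contiguous().view(torch.float32)    # (N, 1)
    q = packed[..., 4:].contiguous().view(torch.float8_e4m3fn)  # (N, K)
    return (q.to(torch.float32) * scale).to(out_dtype)


a = torch.randn(4, 128, dtype=torch.bfloat16)
packed = packed_quantize_row_reference(a)
a_bf16 = packed_dequantize_row_reference(packed)
torch.testing.assert_close(a_bf16, a, atol=2e-1, rtol=1e-1)
```

The point of the packed form is that a single allocation carries both the scales and the quantized values, so callers pass one tensor around instead of a (values, scales) pair.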
This pull request has been merged in def7bbe.