
Use PyTorch's p2p access enable function #2000


Closed
wants to merge 1 commit

Conversation

banitag1 (Contributor) commented Sep 6, 2023

Summary:
We split the diff after adding a needed lazy CUDA init call in the enable-p2p-access function.

Diff 1: D48939723
[PyTorch] Add the lazy init call for p2p access function

Prior context
cudaEnablePeerAccess only enables cross-device access for memory allocated with cudaMalloc. When using other CUDA APIs such as cuMemMap, peer access is managed differently.
expandable_segments:True in PyTorch uses cuMemMap, so code that just calls cudaEnablePeerAccess is not sufficient to enable cross-device copies. This patch switches the p2p-access-enabling functions to use PyTorch's `get_p2p_access`, which lets its allocator figure out how to correctly enable p2p access for that memory.

In the normal case (expandable_segments:False), this code performs exactly the same CUDA calls as before.

Differential Revision: D49021817
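
For readers who want a concrete picture of the switch described above, here is a minimal before/after sketch. It is not the code from this diff; the PyTorch-side names (the `ATen/cuda/PeerToPeerAccess.h` header, `at::cuda::get_p2p_access`, and `at::globalContext().lazyInitCUDA()` for the lazy init mentioned in D48939723) are assumptions based on this summary and on PyTorch's public headers at the time.

```cpp
// Illustrative sketch only, not the actual FBGEMM change.
#include <ATen/Context.h>
#include <ATen/cuda/PeerToPeerAccess.h> // assumed location of get_p2p_access
#include <c10/cuda/CUDAException.h>
#include <cuda_runtime.h>

// Old path: cudaDeviceEnablePeerAccess only covers memory that was
// allocated with cudaMalloc.
void enable_p2p_via_cuda_runtime(int src_dev, int dst_dev) {
  int can_access = 0;
  C10_CUDA_CHECK(cudaDeviceCanAccessPeer(&can_access, src_dev, dst_dev));
  if (can_access) {
    C10_CUDA_CHECK(cudaSetDevice(src_dev));
    // A real caller would also tolerate cudaErrorPeerAccessAlreadyEnabled here.
    cudaDeviceEnablePeerAccess(dst_dev, /*flags=*/0);
  }
}

// New path: ask PyTorch, whose caching allocator knows whether the memory
// is backed by cudaMalloc or cuMemMap (expandable_segments:True) and can
// enable peer access accordingly.
void enable_p2p_via_pytorch(int src_dev, int dst_dev) {
  at::globalContext().lazyInitCUDA(); // the lazy init from D48939723 (name assumed)
  bool ok = at::cuda::get_p2p_access(src_dev, dst_dev);
  // With expandable_segments:False this issues the same CUDA calls as the
  // old path above.
  (void)ok;
}
```

The design point is that the allocator, not the caller, knows which CUDA API backs a given allocation, so it is the right place to decide how peer access gets granted.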

netlify bot commented Sep 6, 2023

Deploy Preview for pytorch-fbgemm-docs canceled.

Latest commit: 71609f9
Latest deploy log: https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/6501ed2009c8ee000790a0cd

facebook-github-bot (Contributor) commented

This pull request was exported from Phabricator. Differential Revision: D49021817

banitag1 pushed a commit to banitag1/FBGEMM that referenced this pull request Sep 7, 2023

banitag1 pushed a commit to banitag1/FBGEMM that referenced this pull request Sep 7, 2023

banitag1 pushed a commit to banitag1/FBGEMM that referenced this pull request Sep 12, 2023

banitag1 pushed a commit to banitag1/FBGEMM that referenced this pull request Sep 12, 2023

banitag1 pushed a commit to banitag1/FBGEMM that referenced this pull request Sep 12, 2023

banitag1 pushed a commit to banitag1/FBGEMM that referenced this pull request Sep 12, 2023


facebook-github-bot (Contributor) commented

This pull request has been merged in 14cf6f2.
