Use PyTorch's p2p access enable function #2000
This pull request was exported from Phabricator. Differential Revision: D49021817
Summary: We split the diff after adding a needed lazy CUDA init call to the enable-p2p-access function.
Diff 1: D48939723 [PyTorch] Add the lazy init call for p2p access function
Prior context: cudaDeviceEnablePeerAccess only enables cross-device access for memory allocated with cudaMalloc. With other CUDA APIs such as cuMemMap, peer access is managed differently. expandable_segments:True in PyTorch uses cuMemMap, so code that only calls cudaDeviceEnablePeerAccess is not sufficient to enable cross-device copies.
This patch switches the p2p access enabling functions to PyTorch's `get_p2p_access`, which lets PyTorch's allocator work out how to correctly enable p2p access for that memory. In the normal case (expandable_segments:False), this code performs exactly the same CUDA calls as before.
Differential Revision: D49021817
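For readers unfamiliar with the setting mentioned in the summary: expandable_segments is an allocator option that PyTorch reads from the `PYTORCH_CUDA_ALLOC_CONF` environment variable as comma-separated `key:value` pairs. The sketch below is only an illustration of the branching a caller would face if it enabled peer access by hand. The helper names (`parse_alloc_conf`, `uses_expandable_segments`, `peer_access_strategy`) are hypothetical and are not part of PyTorch or FBGEMM, and the parser is a simplification that ignores bracketed list values.

```python
import os


def parse_alloc_conf(conf: str) -> dict:
    """Parse a PYTORCH_CUDA_ALLOC_CONF-style string into a dict.

    Hypothetical helper: a simplified parser for flat 'key:value' pairs.
    """
    options = {}
    for item in conf.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition(":")
        if not sep:
            raise ValueError(f"expected 'key:value', got {item!r}")
        options[key.strip()] = value.strip()
    return options


def uses_expandable_segments(conf=None) -> bool:
    """Return True if the config string turns on expandable_segments.

    With conf=None the value is read from the environment.
    """
    if conf is None:
        conf = os.environ.get("PYTORCH_CUDA_ALLOC_CONF", "")
    return parse_alloc_conf(conf).get("expandable_segments", "False") == "True"


def peer_access_strategy(conf=None) -> str:
    """Name the mechanism that would have to enable peer access.

    Assumption for illustration: cuMemMap-backed memory needs the allocator
    to set access itself, while cudaMalloc-backed memory is covered by
    cudaDeviceEnablePeerAccess.
    """
    if uses_expandable_segments(conf):
        return "allocator-managed"
    return "cudaDeviceEnablePeerAccess"


if __name__ == "__main__":
    print(peer_access_strategy("max_split_size_mb:128,expandable_segments:True"))
    print(peer_access_strategy("expandable_segments:False"))
```

Delegating to `get_p2p_access` means calling code does not need a branch like `peer_access_strategy` at all, since the allocator already knows how its memory was mapped.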
Reviewed By: zdevito
This pull request has been merged in 14cf6f2.