Use PyTorch's p2p access enable function (pytorch#2000)
Summary:
This diff was split in two after adding a required lazy CUDA init call to the function that enables p2p access.
Diff 1: D48939723
[PyTorch] Add the lazy init call for p2p access function
*Prior context*
`cudaDeviceEnablePeerAccess` only enables cross-device access for memory allocated with `cudaMalloc`. When using other CUDA APIs such as `cuMemMap`, peer access is managed differently.
`expandable_segments:True` in PyTorch uses `cuMemMap`, so code that only calls `cudaDeviceEnablePeerAccess` is not sufficient to enable cross-device copies. This patch switches the p2p access enabling functions to use PyTorch's `get_p2p_access`, which lets its allocator figure out how to correctly enable p2p access for that memory.
In the normal case (`expandable_segments:False`), this code performs exactly the same CUDA calls as before.
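
For reference, `expandable_segments` is an allocator option normally set through the `PYTORCH_CUDA_ALLOC_CONF` environment variable as a comma-separated list of `key:value` pairs. The following is a minimal sketch, not part of this patch, of how a script might check which of the two cases above a process falls into. The helper name `expandable_segments_enabled` is hypothetical, and the parsing is a simplification that assumes the value is spelled exactly `True` or `False`; it does not replicate PyTorch's own config parser.

```python
import os


def expandable_segments_enabled(conf=None):
    """Return True if the allocator config string turns on expandable_segments.

    Hypothetical helper for illustration only. `conf` defaults to the value
    of PYTORCH_CUDA_ALLOC_CONF, or an empty string if the variable is unset.
    """
    if conf is None:
        conf = os.environ.get("PYTORCH_CUDA_ALLOC_CONF", "")
    enabled = False  # an absent option is treated as off (the normal case)
    for option in conf.split(","):
        key, _, value = option.partition(":")
        if key.strip() == "expandable_segments":
            # Assumption: only the literal "True" enables the option.
            enabled = value.strip() == "True"
    return enabled


if __name__ == "__main__":
    if expandable_segments_enabled():
        print("expandable_segments is on: memory is mapped with cuMemMap")
    else:
        print("expandable_segments is off: memory comes from cudaMalloc")
```

A check like this is only diagnostic; with this patch the p2p enabling path goes through `get_p2p_access` in both cases.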
Differential Revision: D49021817