Cuda interop #517

devshgraphicsprogramming · 2023-07-13T15:54:09Z

Description

Testing

TODO list:

include/nbl/video/CCUDADevice.h

include/nbl/video/IDeviceMemoryBacked.h

devshgraphicsprogramming · 2023-07-16T00:38:15Z

include/nbl/video/IDeviceMemoryBacked.h

+        void chainPreDestroyCleanup(std::unique_ptr<ICleanup> first)
+        {
+            ICleanup::chain(m_cachedCreationParams.preDestroyCleanup, std::move(first));
+        }


The cleanups are not meant to be dynamic: every cleanup you will ever need must be specified in the SCreationParams.

The fact that you've added this method tells me one thing: you're trying to use the system to make the Nabla object hold onto something after it has already been created, probably the CUDA resource you've created from it, especially since you're hooking into the pre-destroy cleanup.

Whereas the intended use is for the Nabla object to hold onto some external resource it has been created from.

In the case where you have a Nabla object, owned and created by Nabla, being exported, which created an object in some other API that needs to be destroyed before the Nabla object...

The correct way is for the other API's object wrapper to hold a smart_refctd_ptr to the Nabla object it was imported from.
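A minimal sketch of that ownership direction, under loud assumptions: `std::shared_ptr` stands in for Nabla's `smart_refctd_ptr`, and `NablaBuffer`, `CudaImportedMemory` and `demoDestructionOrder` are hypothetical names invented for illustration, not types from the PR. The point is only that the wrapper keeps the source object alive, so the CUDA-side release always runs first.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Stand-in for a Nabla-owned object (buffer, image, memory allocation...).
struct NablaBuffer
{
    explicit NablaBuffer(std::vector<std::string>& log) : m_log(log) {}
    ~NablaBuffer() { m_log.push_back("nabla buffer destroyed"); }
    std::vector<std::string>& m_log;
};

// Hypothetical wrapper around the CUDA-side import of that buffer.
class CudaImportedMemory
{
    public:
        explicit CudaImportedMemory(std::shared_ptr<NablaBuffer> source) : m_source(std::move(source))
        {
            assert(m_source);
        }
        ~CudaImportedMemory()
        {
            // Real code would release the CUDA import here, while the source is still alive.
            m_source->m_log.push_back("cuda import released");
        }
    private:
        // Strong reference: the Nabla object cannot die before this wrapper does.
        std::shared_ptr<NablaBuffer> m_source;
};

// Drops the user's reference first, then the wrapper, and records the destruction order.
std::vector<std::string> demoDestructionOrder()
{
    std::vector<std::string> log;
    {
        auto buffer = std::make_shared<NablaBuffer>(log);
        auto imported = std::make_unique<CudaImportedMemory>(buffer);
        buffer.reset();   // the wrapper still keeps the buffer alive
        imported.reset(); // CUDA side goes first, then the buffer
    }
    return log;
}
```

With this layout no pre-destroy hook has to be attached to the Nabla object after creation; the ordering falls out of who holds the reference.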

devshgraphicsprogramming · 2023-07-16T00:44:14Z

include/nbl/video/IDeviceMemoryBacked.h

+    std::unique_ptr<ICleanup> next;
+
+    static void chain(std::unique_ptr<ICleanup>& first, std::unique_ptr<ICleanup>&& next)
+    {
+        if (first)
+            return chain(first->next, std::move(next));
+        first = std::move(next);
+    }


I don't like this being the default. If you want to chain up cleanups, the proper way would be to either:

have a "metacleanup" that inherits from multiple cleanups (with each of them inheriting virtually from ICleanup), or

have this same chain, but in a CLinkedListCleanup final class.

Anyway, this is probably not needed for anything, given what I think is a misuse of the API: https://github.com/Devsh-Graphics-Programming/Nabla/pull/517/files#r1264588641
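The second option could look roughly like the sketch below. Only the names `ICleanup` and `CLinkedListCleanup` come from the thread; the `append` method, the `LoggingCleanup` helper, the use of a `std::vector` instead of intrusive `next` pointers, and the reverse destruction order are all assumptions made for this illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Plain interface: no `next` pointer baked into every cleanup.
struct ICleanup
{
    virtual ~ICleanup() = default;
};

// Opt-in chaining: only this final class knows about multiple cleanups.
class CLinkedListCleanup final : public ICleanup
{
    public:
        void append(std::unique_ptr<ICleanup> cleanup)
        {
            assert(cleanup);
            m_cleanups.push_back(std::move(cleanup));
        }
        ~CLinkedListCleanup() override
        {
            // Explicit order (an assumption of this sketch): last registered runs first.
            while (!m_cleanups.empty())
                m_cleanups.pop_back();
        }
    private:
        std::vector<std::unique_ptr<ICleanup>> m_cleanups;
};

// Hypothetical cleanup that records its name when it runs.
struct LoggingCleanup final : ICleanup
{
    LoggingCleanup(std::vector<std::string>& log, std::string name) : m_log(log), m_name(std::move(name)) {}
    ~LoggingCleanup() override { m_log.push_back(m_name); }
    std::vector<std::string>& m_log;
    std::string m_name;
};

std::vector<std::string> demoChainedCleanup()
{
    std::vector<std::string> log;
    {
        auto chain = std::make_unique<CLinkedListCleanup>();
        chain->append(std::make_unique<LoggingCleanup>(log, "first"));
        chain->append(std::make_unique<LoggingCleanup>(log, "second"));
        // The whole chain is handed over as one cleanup at creation time.
        std::unique_ptr<ICleanup> preDestroyCleanup = std::move(chain);
    }
    return log;
}
```

Because the chain is built before the object is created, it still fits the rule that all cleanups are specified up front in the creation parameters.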

devshgraphicsprogramming · 2023-07-16T00:46:51Z

include/nbl/video/ILogicalDevice.h

@@ -423,6 +423,8 @@ class NBL_API2 ILogicalDevice : public core::IReferenceCounted, public IDeviceMe
        //
        virtual void unmapMemory(IDeviceMemoryAllocation* memory) = 0;

+        virtual void* getExternalMemoryHandle(IDeviceMemoryBacked* obj) const = 0;


Do we need it, though?

We already have getExternalMemoryHandle on the IDeviceMemoryBacked itself.

IDeviceMemoryBacked calls this one and caches the returned handle. This could be removed. Also, I think these handles need to be released with platform-specific functions such as CloseHandle on Win32, so we may need to track them.
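One way to track such handles is a small owning wrapper that closes the handle exactly once. This is only a sketch: `COwnedOSHandle` and `demoHandleLifetime` are hypothetical names, and the close function is injected so the example stays platform-neutral (on Win32 it would wrap `CloseHandle`).

```cpp
#include <cassert>
#include <utility>

// Owns one OS handle and closes it exactly once.
class COwnedOSHandle
{
    public:
        using CloseFn = void(*)(void*);

        COwnedOSHandle(void* handle, CloseFn closeFn) : m_handle(handle), m_close(closeFn)
        {
            assert(closeFn);
        }
        COwnedOSHandle(const COwnedOSHandle&) = delete;
        COwnedOSHandle& operator=(const COwnedOSHandle&) = delete;
        // Moving transfers ownership; the moved-from object no longer closes anything.
        COwnedOSHandle(COwnedOSHandle&& other) noexcept
            : m_handle(std::exchange(other.m_handle, nullptr)), m_close(other.m_close) {}
        ~COwnedOSHandle()
        {
            if (m_handle)
                m_close(m_handle);
        }

        void* get() const { return m_handle; }
    private:
        void* m_handle;
        CloseFn m_close;
};

// Counts how many times the close function runs for one handle that gets moved once.
int demoHandleLifetime()
{
    static int closedCount;
    closedCount = 0;
    int dummy = 0; // stands in for a real OS handle
    {
        COwnedOSHandle a(&dummy, [](void*) { ++closedCount; });
        COwnedOSHandle b(std::move(a));
    }
    return closedCount;
}
```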

src/nbl/video/CVulkanBuffer.h

src/nbl/video/CVulkanBuffer.cpp

devshgraphicsprogramming · 2023-07-16T00:53:44Z

src/nbl/video/CCUDAHandler.cpp

+	const char* nvrtc64_versions[] = { "nvrtc64_120", "nvrtc64_111","nvrtc64_110","nvrtc64_102","nvrtc64_101","nvrtc64_100","nvrtc64_92","nvrtc64_91","nvrtc64_90","nvrtc64_80","nvrtc64_75","nvrtc64_70",nullptr };
 	const char* nvrtc64_suffices[] = {"","_","_0","_1","_2",nullptr};


By the way, CUDA 11 went up to 11.8 and CUDA 12 already has 3 minor versions, so you're probably missing a few in the version table.
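For context, a sketch of how two such nullptr-terminated tables might be combined into library names to try. How CCUDAHandler actually consumes them is not shown in the diff, so the "version + suffix" concatenation and the helper names `buildCandidateNames` and `demoCandidates` are assumptions.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Builds every "<version><suffix>" candidate from two nullptr-terminated tables.
std::vector<std::string> buildCandidateNames(const char* const* versions, const char* const* suffices)
{
    assert(versions && suffices);
    std::vector<std::string> names;
    for (auto v = versions; *v; ++v)
        for (auto s = suffices; *s; ++s)
            names.push_back(std::string(*v) + *s);
    return names;
}

std::vector<std::string> demoCandidates()
{
    // Shortened version table; the suffix table is the one from the diff.
    const char* versions[] = { "nvrtc64_120", "nvrtc64_111", nullptr };
    const char* suffices[] = { "", "_", "_0", "_1", "_2", nullptr };
    return buildCandidateNames(versions, suffices);
}
```

Every version missing from the first table removes a whole row of candidates, which is why gaps in it matter.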

devshgraphicsprogramming · 2023-07-16T00:57:13Z

src/nbl/video/CVulkanLogicalDevice.h

+        VkMemoryGetWin32HandleInfoKHR getHandleInfo = {
+            .sType = VK_STRUCTURE_TYPE_MEMORY_GET_WIN32_HANDLE_INFO_KHR,
+            .memory = vkHandle,
+            .handleType = VkExternalMemoryHandleTypeFlagBits(obj->getCachedCreationParams().externalHandleType.value),
+        };
+        void* handle = 0;
+        if(VK_SUCCESS == m_devf.vk.vkGetMemoryWin32HandleKHR(m_vkdev, &getHandleInfo, &handle))


Please guard Windows-specific code with the appropriate #ifdef NBL_....._PLATFORM_....
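A sketch of the kind of guard being asked for. The project's own macro name is elided in the comment above, so the compiler-provided `_WIN32` is used here purely as a stand-in; `exportWin32Handle` and `kHasWin32Handles` are hypothetical names.

```cpp
#include <cassert>

// `_WIN32` is a stand-in for the project's platform macro, whose name is elided in the review.
#if defined(_WIN32)
constexpr bool kHasWin32Handles = true;
#else
constexpr bool kHasWin32Handles = false;
#endif

// Hypothetical export hook: returns nullptr on platforms without Win32 handles.
void* exportWin32Handle(void* nativeMemory)
{
#if defined(_WIN32)
    // Real code would fill VkMemoryGetWin32HandleInfoKHR and call vkGetMemoryWin32HandleKHR here.
    return nativeMemory;
#else
    (void)nativeMemory; // nothing to export on this platform
    return nullptr;
#endif
}
```

Keeping the guard inside the function body lets callers stay free of preprocessor checks.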

src/nbl/video/CVulkanLogicalDevice.cpp

devshgraphicsprogramming · 2023-07-16T01:06:36Z

src/nbl/video/CVulkanLogicalDevice.cpp

+            .semaphore = semaphore,
+            .handleType = static_cast<VkExternalSemaphoreHandleTypeFlagBits>(params.externalHandleType.value),
+        };
+        if (VK_SUCCESS != m_devf.vk.vkGetSemaphoreWin32HandleKHR(m_vkdev, &props, &params.externalHandle))


I'd make sure you follow the same pattern as you do with Buffers/Images, which is not to hotpatch the creation parameters (because then we can no longer tell whether the object was created from an import or created for export).

Yeah, I'm aware this breaks the pattern used for the external resources. That said, at the moment CUDA can only import semaphores, so we don't need to deal with imports on the Nabla side.

With refcounted osHandles we will no longer need to know or care whether an object was created from an import or for export.
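A sketch of the "do not hotpatch the creation parameters" pattern, assuming a simplified parameter struct. Only the field names `externalHandleType` and `externalHandle` appear in the diff; `CSemaphoreSketch`, `ExportFn`, `wasImported` and `demoExportKeepsParams` are hypothetical.

```cpp
#include <cassert>

struct SCreationParams
{
    unsigned externalHandleType = 0u; // 0 means "not external" in this sketch
    void* externalHandle = nullptr;   // non-null only when importing
};

class CSemaphoreSketch
{
    public:
        using ExportFn = void*(*)();

        CSemaphoreSketch(const SCreationParams& params, ExportFn exportFn) : m_params(params), m_export(exportFn) {}

        const SCreationParams& getCreationParams() const { return m_params; }
        // Still answerable, because the parameters are never overwritten.
        bool wasImported() const { return m_params.externalHandle != nullptr; }

        void* getExternalHandle()
        {
            if (wasImported())
                return m_params.externalHandle;
            // An exported handle lives in its own member, created lazily.
            if (!m_exportedHandle && m_export)
                m_exportedHandle = m_export();
            return m_exportedHandle;
        }
    private:
        const SCreationParams m_params;
        ExportFn m_export;
        void* m_exportedHandle = nullptr;
};

bool demoExportKeepsParams()
{
    static int fakeHandle = 0; // stands in for an OS handle
    SCreationParams params;
    params.externalHandleType = 1u;
    CSemaphoreSketch semaphore(params, []() -> void* { return &fakeHandle; });
    void* handle = semaphore.getExternalHandle();
    return handle == &fakeHandle && !semaphore.wasImported()
        && semaphore.getCreationParams().externalHandle == nullptr;
}
```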

src/nbl/video/CVulkanLogicalDevice.cpp

devshgraphicsprogramming · 2023-07-16T01:19:22Z

src/nbl/video/CVulkanLogicalDevice.cpp

+        bufferMemoryReqs.prefersDedicatedAllocation = vk_dedicatedMemoryRequirements.prefersDedicatedAllocation;
+        bufferMemoryReqs.requiresDedicatedAllocation = vk_dedicatedMemoryRequirements.requiresDedicatedAllocation;


Override these two with true if the resource is external, or assert that they were already true, unless your requirements are not true here:
https://github.com/Devsh-Graphics-Programming/Nabla/pull/517/files#r1264590615

Just to be clear: unlike textures, external buffers are not required to have dedicated allocations. It's just easier for us to make them dedicated.
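Both variants suggested above (override, or assert) can be sketched as follows. `SMemoryReqsSketch` mirrors only the two fields visible in the diff; the struct and both function names are hypothetical.

```cpp
#include <cassert>

struct SMemoryReqsSketch
{
    bool prefersDedicatedAllocation = false;
    bool requiresDedicatedAllocation = false;
};

// Variant 1: force dedicated allocations for anything external.
SMemoryReqsSketch applyExternalPolicy(SMemoryReqsSketch reqs, bool isExternal)
{
    if (isExternal)
    {
        reqs.prefersDedicatedAllocation = true;
        reqs.requiresDedicatedAllocation = true;
    }
    return reqs;
}

// Variant 2: trust the driver-reported values, but check them in debug builds.
void checkExternalIsDedicated(const SMemoryReqsSketch& reqs, bool isExternal)
{
    assert(!isExternal || (reqs.prefersDedicatedAllocation && reqs.requiresDedicatedAllocation));
    (void)reqs;       // silence unused warnings when asserts are compiled out
    (void)isExternal;
}
```

Given the reply that dedicated allocations are a convenience for buffers rather than a requirement, the override variant is probably the safer fit there, since an assert could fire on drivers that do not report the flags.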

include/nbl/video/CVulkanImage.h

include/nbl/asset/IBuffer.h

include/nbl/video/CCUDASharedMemory.h

include/nbl/video/CCUDASharedSemaphore.h

include/nbl/video/CVulkanImage.h

include/nbl/video/IDeviceMemoryBacked.h

include/nbl/video/IGPUImage.h

include/nbl/video/ILogicalDevice.h

include/nbl/video/IPhysicalDevice.h

…useful as a utility function)

atkurtul added 7 commits July 8, 2023 16:39

extend creation structures to accommodate CUDA interop

a7b9fcb

create exportable buffers to import into cuda

07ba293

add missing cuda fn and update submodule

fabf528

add missing cuda export functions

d3ced33

move boilerplates to CCUDADevice

164dcb5

correct chained cleanup desctruction order

dad28eb

add safety checks

30e9b49

devshgraphicsprogramming commented Jul 15, 2023

include/nbl/video/CCUDADevice.h

atkurtul added 2 commits July 15, 2023 23:20

semaphore interop

a164d4d

update examples

bfacd90