Cuda interop vk13 #637

Open
wants to merge 64 commits into
base: master
Changes from 61 commits
Commits
d293b9b
create exportable buffers to import into cuda
atkurtul Jul 8, 2023
f5f1017
add missing cuda fn and update submodule
atkurtul Jul 9, 2023
6689b33
add missing cuda export functions
atkurtul Jul 9, 2023
9ade1c6
move boilerplates to CCUDADevice
atkurtul Jul 9, 2023
bfa7afc
correct chained cleanup destruction order
atkurtul Jul 15, 2023
ddb861e
add safety checks
atkurtul Jul 15, 2023
f380398
semaphore interop
atkurtul Jul 15, 2023
2f7b517
get cuda interop working in vulkan_1_3 branch
atkurtul Jul 15, 2023
bd32f36
point jitify to the right hash
atkurtul Jan 4, 2024
b1c5a46
update examples && use non KHR version of vk functions
atkurtul Jan 4, 2024
0d36581
correct bad validations, KHR instead of core func usage etc.
devshgraphicsprogramming Jan 4, 2024
725a984
revert a dangerous api change
devshgraphicsprogramming Jan 4, 2024
d2c9382
update examples_tests
devshgraphicsprogramming Jan 4, 2024
2d24604
Disabled CSPIRVIntrospector
Przemog1 Jan 5, 2024
2114e50
small fixes
Przemog1 Jan 5, 2024
f6320ce
remove unused cruft
devshgraphicsprogramming Jan 6, 2024
f749ab8
draft
devshgraphicsprogramming Jan 7, 2024
ad1e6ff
move the TimelineEventHandlers to their own header, simplifying every…
devshgraphicsprogramming Jan 8, 2024
a1afcc8
Made the TimelineEventHandlerST use a const ISemaphore, almost all of…
devshgraphicsprogramming Jan 8, 2024
262281f
implement MultiTimelineEventHandlerST and correct TimelineEventHandlerST
devshgraphicsprogramming Jan 8, 2024
d7690be
fix KHR function loading bugs
devshgraphicsprogramming Jan 8, 2024
13ff02a
fix some nasty bug in TimelineEventHandlerST
devshgraphicsprogramming Jan 8, 2024
fabc999
Take the TimelineEventHandlerST for a first spin with ICommandPoolCache
devshgraphicsprogramming Jan 8, 2024
0eb8e9a
turns out its quite easy to port the other utilities to the new Multi…
devshgraphicsprogramming Jan 8, 2024
e59408d
remove more unused stuff
devshgraphicsprogramming Jan 8, 2024
3f41a81
fix one liner huge bug
devshgraphicsprogramming Jan 8, 2024
fb1f50d
fix a small bug and introduce a base class for TimelineEventHandler, a…
devshgraphicsprogramming Jan 9, 2024
94ee680
fix one more KHR function pointer bug and remove unused class
devshgraphicsprogramming Jan 9, 2024
c761d42
bring back bits of IUtilities needed for ex 05
devshgraphicsprogramming Jan 9, 2024
04689b9
device cap traits
atkurtul Dec 5, 2023
4a17eaf
port macros to boost pp
atkurtul Dec 5, 2023
5fcad02
has_member_x_with_type
atkurtul Dec 5, 2023
3c97ef1
make e_member_presence bitflags
atkurtul Dec 5, 2023
06b43af
Use new inline SPIR-V builtin syntax from DXC
devshgraphicsprogramming Jan 10, 2024
fd73e28
const correctness on surface capabilities
devshgraphicsprogramming Jan 12, 2024
153dd21
3D Blit test case was failing because of unimplemented functions for …
devshgraphicsprogramming Jan 12, 2024
bc7e24d
Make the SPhysicalDeviceFilter use spans for requirement arrays.
devshgraphicsprogramming Jan 12, 2024
b234d3b
ok so I found out that renderdoc hates External memory
devshgraphicsprogramming Jan 12, 2024
b5a633a
fix typos causing issues
devshgraphicsprogramming Jan 12, 2024
2ab33ed
API draft
devshgraphicsprogramming Jan 12, 2024
bbc5aa9
think about the other 3 utility functions
devshgraphicsprogramming Jan 12, 2024
d41f279
design clearing up
devshgraphicsprogramming Jan 12, 2024
04d05da
Ok we're done here with the Streaming Buffer upload port (removed the…
devshgraphicsprogramming Jan 12, 2024
3d034c5
move the SIntendedSubmitInfo struct out of IUtilities
devshgraphicsprogramming Jan 12, 2024
3160a46
going to sleep, next TODO is to implement the IUtilities::downloadBuf…
devshgraphicsprogramming Jan 12, 2024
8670d42
outline the TODO for @theoreticalphysicsftw
devshgraphicsprogramming Jan 13, 2024
2d86373
fix debugmessenger not being created
atkurtul Jan 13, 2024
ca2593c
fix a validation error
devshgraphicsprogramming Jan 13, 2024
461cb4a
rework pipeline barriers and events to use std::spans
devshgraphicsprogramming Jan 13, 2024
d96fd1d
Port `downloadBufferRangeViaStagingBuffer`
devshgraphicsprogramming Jan 13, 2024
2d2acc9
fix bug in CRAIISpanPatch
devshgraphicsprogramming Jan 13, 2024
60c1c39
Ported Example 23, and fixed a few bugs here and there
devshgraphicsprogramming Jan 14, 2024
3faf1fb
merge conflicts
atkurtul Jan 13, 2024
fd4f733
add missing external resource property queries
atkurtul Jan 14, 2024
5b1940c
add more stuff
atkurtul Jan 14, 2024
7074256
Merge branch 'vulkan_1_3' into cuda-interop-vk13
atkurtul Jan 14, 2024
6449b2f
Merge branch 'vulkan_1_3' into cuda-interop-vk13
atkurtul Jan 18, 2024
3d9a530
address pr comments
atkurtul Jan 18, 2024
4d174e5
last commit part 2
atkurtul Jan 18, 2024
cbd18f4
add missing cuda fn & map queue indices to vk
atkurtul Jan 18, 2024
23fe8d4
update submodule
atkurtul Jan 18, 2024
c32fd79
cache cuda devices
atkurtul Jan 18, 2024
4e2185c
ifdef platform code
atkurtul Jan 19, 2024
bd0b76a
log queue validation warning
atkurtul Jan 19, 2024
2 changes: 1 addition & 1 deletion 3rdparty/jitify
Submodule jitify updated 5 files
+10 −5 Makefile
+137 −65 jitify.hpp
+72 −0 jitify_test.cu
+586 −0 nvrtc_cli.cpp
+58 −0 nvrtc_cli_test.sh
2 changes: 2 additions & 0 deletions include/nbl/asset/IBuffer.h
@@ -42,6 +42,8 @@ class IBuffer : public core::IBuffer, public IDescriptor
//! synthetic Nabla inventions
// whether `IGPUCommandBuffer::updateBuffer` can be used on this buffer
EUF_INLINE_UPDATE_VIA_CMDBUF = 0x80000000u,

EUF_SYNTHEHIC_FLAGS_MASK = EUF_INLINE_UPDATE_VIA_CMDBUF | 0 /* fill out as needed if any more synthetic flags are added */
};

//!
144 changes: 36 additions & 108 deletions include/nbl/video/CCUDADevice.h
@@ -6,7 +6,8 @@


#include "nbl/video/IPhysicalDevice.h"

#include "nbl/video/CCUDASharedMemory.h"
#include "nbl/video/CCUDASharedSemaphore.h"

#ifdef _NBL_COMPILE_WITH_CUDA_

@@ -23,10 +24,27 @@
namespace nbl::video
{
class CCUDAHandler;
class CCUDASharedMemory;
class CCUDASharedSemaphore;

class CCUDADevice : public core::IReferenceCounted
{
public:
#ifdef _WIN32
static constexpr IDeviceMemoryAllocation::E_EXTERNAL_HANDLE_TYPE EXTERNAL_MEMORY_HANDLE_TYPE = IDeviceMemoryAllocation::EHT_OPAQUE_WIN32;
static constexpr CUmemAllocationHandleType ALLOCATION_HANDLE_TYPE = CU_MEM_HANDLE_TYPE_WIN32;
#else
static constexpr IDeviceMemoryAllocation::E_EXTERNAL_HANDLE_TYPE EXTERNAL_MEMORY_HANDLE_TYPE = IDeviceMemoryAllocation::EHT_OPAQUE_FD;
static constexpr CUmemAllocationHandleType ALLOCATION_HANDLE_TYPE = CU_MEM_HANDLE_TYPE_POSIX_FILE_DESCRIPTOR;
#endif
struct SCUDACleaner : video::ICleanup
{
core::smart_refctd_ptr<const core::IReferenceCounted> resource;
SCUDACleaner(core::smart_refctd_ptr<const core::IReferenceCounted> resource)
: resource(std::move(resource))
{ }
};

enum E_VIRTUAL_ARCHITECTURE
{
EVA_30,
@@ -72,127 +90,37 @@ class CCUDADevice : public core::IReferenceCounted
// https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#vulkan-interoperability
// Watch out, use Driver API (`cu` functions) NOT the Runtime API (`cuda` functions)
// Also maybe separate this out into its own `CCUDA` class instead of nesting it here?
#if 0
template<typename ObjType>
struct GraphicsAPIObjLink
{
GraphicsAPIObjLink() : obj(nullptr), cudaHandle(nullptr), acquired(false)
{
asImage = {nullptr};
}
GraphicsAPIObjLink(core::smart_refctd_ptr<ObjType>&& _obj) : GraphicsAPIObjLink()
{
obj = std::move(_obj);
}
GraphicsAPIObjLink(GraphicsAPIObjLink&& other) : GraphicsAPIObjLink()
{
operator=(std::move(other));
}

GraphicsAPIObjLink(const GraphicsAPIObjLink& other) = delete;
GraphicsAPIObjLink& operator=(const GraphicsAPIObjLink& other) = delete;
GraphicsAPIObjLink& operator=(GraphicsAPIObjLink&& other)
{
std::swap(obj,other.obj);
std::swap(cudaHandle,other.cudaHandle);
std::swap(acquired,other.acquired);
std::swap(asImage,other.asImage);
return *this;
}

~GraphicsAPIObjLink()
{
assert(!acquired); // you've fucked up, there's no way for us to fix it, you need to release the objects on a proper stream
if (obj)
CCUDAHandler::cuda.pcuGraphicsUnregisterResource(cudaHandle);
}

//
auto* getObject() const {return obj.get();}

private:
core::smart_refctd_ptr<ObjType> obj;
CUgraphicsResource cudaHandle;
bool acquired;

friend class CCUDAHandler;
public:
union
{
struct
{
CUdeviceptr pointer;
} asBuffer;
struct
{
CUmipmappedArray mipmappedArray;
CUarray array;
} asImage;
};
};

//
static CUresult registerBuffer(GraphicsAPIObjLink<video::IGPUBuffer>* link, uint32_t flags = CU_GRAPHICS_REGISTER_FLAGS_NONE);
static CUresult registerImage(GraphicsAPIObjLink<video::IGPUImage>* link, uint32_t flags = CU_GRAPHICS_REGISTER_FLAGS_NONE);
CUdevice getInternalObject() const { return m_handle; }
const CCUDAHandler* getHandler() const { return m_handler.get(); }
CUresult importGPUSemaphore(core::smart_refctd_ptr<CCUDASharedSemaphore>* outPtr, ISemaphore* sem);
CUresult createSharedMemory(core::smart_refctd_ptr<CCUDASharedMemory>* outMem, struct CCUDASharedMemory::SCreationParams&& inParams);
bool isMatchingDevice(const IPhysicalDevice* device) const { return device && !memcmp(device->getProperties().deviceUUID, m_vulkanDevice->getProperties().deviceUUID, 16); }

size_t roundToGranularity(CUmemLocationType location, size_t size) const;

template<typename ObjType>
static CUresult acquireResourcesFromGraphics(void* tmpStorage, GraphicsAPIObjLink<ObjType>* linksBegin, GraphicsAPIObjLink<ObjType>* linksEnd, CUstream stream)
{
auto count = std::distance(linksBegin,linksEnd);

auto resources = reinterpret_cast<CUgraphicsResource*>(tmpStorage);
auto rit = resources;
for (auto iit=linksBegin; iit!=linksEnd; iit++,rit++)
{
if (iit->acquired)
return CUDA_ERROR_UNKNOWN;
*rit = iit->cudaHandle;
}

auto retval = cuda.pcuGraphicsMapResources(count,resources,stream);
for (auto iit=linksBegin; iit!=linksEnd; iit++)
iit->acquired = true;
return retval;
}
template<typename ObjType>
static CUresult releaseResourcesToGraphics(void* tmpStorage, GraphicsAPIObjLink<ObjType>* linksBegin, GraphicsAPIObjLink<ObjType>* linksEnd, CUstream stream)
{
auto count = std::distance(linksBegin,linksEnd);

auto resources = reinterpret_cast<CUgraphicsResource*>(tmpStorage);
auto rit = resources;
for (auto iit=linksBegin; iit!=linksEnd; iit++,rit++)
{
if (!iit->acquired)
return CUDA_ERROR_UNKNOWN;
*rit = iit->cudaHandle;
}

auto retval = cuda.pcuGraphicsUnmapResources(count,resources,stream);
for (auto iit=linksBegin; iit!=linksEnd; iit++)
iit->acquired = false;
return retval;
}
protected:
CUresult reserveAdrressAndMapMemory(CUdeviceptr* outPtr, size_t size, size_t alignment, CUmemLocationType location, CUmemGenericAllocationHandle memory);

static CUresult acquireAndGetPointers(GraphicsAPIObjLink<video::IGPUBuffer>* linksBegin, GraphicsAPIObjLink<video::IGPUBuffer>* linksEnd, CUstream stream, size_t* outbufferSizes = nullptr);
static CUresult acquireAndGetMipmappedArray(GraphicsAPIObjLink<video::IGPUImage>* linksBegin, GraphicsAPIObjLink<video::IGPUImage>* linksEnd, CUstream stream);
static CUresult acquireAndGetArray(GraphicsAPIObjLink<video::IGPUImage>* linksBegin, GraphicsAPIObjLink<video::IGPUImage>* linksEnd, uint32_t* arrayIndices, uint32_t* mipLevels, CUstream stream);
#endif

protected:
// CUDAHandler creates CUDADevice, it needs to access ctor
friend class CCUDAHandler;
CCUDADevice(core::smart_refctd_ptr<CVulkanConnection>&& _vulkanConnection, IPhysicalDevice* const _vulkanDevice, const E_VIRTUAL_ARCHITECTURE _virtualArchitecture);
~CCUDADevice() = default;

CCUDADevice(core::smart_refctd_ptr<CVulkanConnection>&& _vulkanConnection, IPhysicalDevice* const _vulkanDevice, const E_VIRTUAL_ARCHITECTURE _virtualArchitecture, CUdevice _handle, core::smart_refctd_ptr<CCUDAHandler>&& _handler);
~CCUDADevice();

std::vector<const char*> m_defaultCompileOptions;
core::smart_refctd_ptr<CVulkanConnection> m_vulkanConnection;
IPhysicalDevice* const m_vulkanDevice;
E_VIRTUAL_ARCHITECTURE m_virtualArchitecture;
core::smart_refctd_ptr<CCUDAHandler> m_handler;
CUdevice m_handle;
CUcontext m_context;
size_t m_allocationGranularity[4];
};

}

#endif // _NBL_COMPILE_WITH_CUDA_

#endif
#endif
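Note on `isMatchingDevice`: it pairs the CUDA device with a Vulkan physical device by comparing the 16-byte device UUID. A self-contained sketch of that comparison follows. `SProps` and `uuidsMatch` are hypothetical stand-ins for the Nabla types and are not part of this PR; only the `memcmp` over 16 bytes and the null check are taken from the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the physical device properties used in the diff.
// 16 bytes matches the literal `16` passed to memcmp in isMatchingDevice.
constexpr std::size_t kUuidSize = 16;
struct SProps
{
    std::uint8_t deviceUUID[kUuidSize];
};

// Returns true only when `candidate` is non-null and its UUID is
// byte-for-byte identical to the UUID in `reference`.
inline bool uuidsMatch(const SProps* candidate, const SProps& reference)
{
    return candidate && std::memcmp(candidate->deviceUUID, reference.deviceUUID, kUuidSize) == 0;
}
```

The null check comes first so that a missing device is reported as a mismatch instead of being dereferenced.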
32 changes: 25 additions & 7 deletions include/nbl/video/CCUDAHandler.h
@@ -34,7 +34,7 @@ class CCUDAHandler : public core::IReferenceCounted
static T* cast_CUDA_ptr(CUdeviceptr ptr) { return reinterpret_cast<T*>(ptr); }

//
core::smart_refctd_ptr<CCUDAHandler> create(system::ISystem* system, core::smart_refctd_ptr<system::ILogger>&& _logger);
static core::smart_refctd_ptr<CCUDAHandler> create(system::ISystem* system, core::smart_refctd_ptr<system::ILogger>&& _logger);

//
using LibLoader = system::DefaultFuncPtrLoader;
@@ -119,6 +119,24 @@ class CCUDAHandler : public core::IReferenceCounted
,cuSurfObjectDestroy
,cuTexObjectCreate
,cuTexObjectDestroy
,cuImportExternalMemory
,cuDestroyExternalMemory
,cuExternalMemoryGetMappedBuffer
,cuMemUnmap
,cuMemAddressFree
,cuMemGetAllocationGranularity
,cuMemAddressReserve
,cuMemCreate
,cuMemExportToShareableHandle
,cuMemMap
,cuMemRelease
,cuMemSetAccess
,cuMemImportFromShareableHandle
,cuLaunchHostFunc
,cuDestroyExternalSemaphore
,cuImportExternalSemaphore
,cuSignalExternalSemaphoresAsync
,cuWaitExternalSemaphoresAsync
);
const CUDA& getCUDAFunctionTable() const {return m_cuda;}

@@ -157,9 +175,9 @@ class CCUDAHandler : public core::IReferenceCounted
const auto filesize = file->getSize();
std::string source(filesize+1u,'0');

system::future<size_t> bytesRead;
system::IFile::success_t bytesRead;
file->read(bytesRead,source.data(),0u,file->getSize());
source.resize(bytesRead.get());
source.resize(bytesRead.getBytesProcessed());

return createProgram(prog,std::move(source),file->getFileName().string().c_str(),headerCount,headerContents,includeNames);
}
@@ -226,8 +244,7 @@ class CCUDAHandler : public core::IReferenceCounted
}

core::smart_refctd_ptr<CCUDADevice> createDevice(core::smart_refctd_ptr<CVulkanConnection>&& vulkanConnection, IPhysicalDevice* physicalDevice);

protected:
protected:
CCUDAHandler(CUDA&& _cuda, NVRTC&& _nvrtc, core::vector<core::smart_refctd_ptr<system::IFile>>&& _headers, core::smart_refctd_ptr<system::ILogger>&& _logger, int _version)
: m_cuda(std::move(_cuda)), m_nvrtc(std::move(_nvrtc)), m_headers(std::move(_headers)), m_logger(std::move(_logger)), m_version(_version)
{
@@ -239,7 +256,8 @@ class CCUDAHandler : public core::IReferenceCounted
}
}
~CCUDAHandler() = default;



//
inline ptx_and_nvrtcResult_t compileDirectlyToPTX_impl(nvrtcResult result, nvrtcProgram program, core::SRange<const char* const> nvrtcOptions, std::string* log)
{
@@ -272,4 +290,4 @@ class CCUDAHandler : public core::IReferenceCounted

#endif // _NBL_COMPILE_WITH_CUDA_

#endif
#endif
71 changes: 71 additions & 0 deletions include/nbl/video/CCUDASharedMemory.h
@@ -0,0 +1,71 @@
// Copyright (C) 2018-2020 - DevSH Graphics Programming Sp. z O.O.
// This file is part of the "Nabla Engine".
// For conditions of distribution and use, see copyright notice in nabla.h
#ifndef _NBL_VIDEO_C_CUDA_SHARED_MEMORY_H_
#define _NBL_VIDEO_C_CUDA_SHARED_MEMORY_H_


#ifdef _NBL_COMPILE_WITH_CUDA_

#include "cuda.h"
#include "nvrtc.h"
#if CUDA_VERSION < 9000
#error "Need CUDA 9.0 SDK or higher."
#endif

// useful includes in the future
//#include "cudaEGL.h"
//#include "cudaVDPAU.h"

namespace nbl::video
{

class CCUDASharedMemory : public core::IReferenceCounted
{
public:
// required for us to see the move ctor
friend class CCUDADevice;

CUdeviceptr getDeviceptr() const { return m_params.ptr; }

struct SCreationParams
{
size_t size;
uint32_t alignment;
CUmemLocationType location;
};

struct SCachedCreationParams : SCreationParams
{
size_t granularSize;
CUdeviceptr ptr;
union
{
void* osHandle;
int fd;
};
};

const SCreationParams& getCreationParams() const { return m_params; }

core::smart_refctd_ptr<IDeviceMemoryAllocation> exportAsMemory(ILogicalDevice* device, IDeviceMemoryBacked* dedication = nullptr) const;

core::smart_refctd_ptr<IGPUImage> createAndBindImage(ILogicalDevice* device, asset::IImage::SCreationParams&& params) const;

protected:

CCUDASharedMemory(core::smart_refctd_ptr<CCUDADevice>&& device, SCachedCreationParams&& params)
: m_device(std::move(device))
, m_params(std::move(params))
{}
~CCUDASharedMemory() override;

core::smart_refctd_ptr<CCUDADevice> m_device;
SCachedCreationParams m_params;
};

}

#endif // _NBL_COMPILE_WITH_CUDA_

#endif
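Note on `SCachedCreationParams::granularSize`: the header does not show how it is computed. A minimal sketch, assuming it is `size` rounded up to the allocation granularity, which is what `CCUDADevice::roundToGranularity` appears to be for, and which `cuMemCreate` requires of its size argument. `roundUpToGranularity` is a hypothetical name and not part of this PR.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: rounds `size` up to the next multiple of `granularity`.
// Uses division instead of bit masking, so it does not assume the
// granularity is a power of two.
inline std::size_t roundUpToGranularity(std::size_t size, std::size_t granularity)
{
    assert(granularity != 0); // a zero granularity would divide by zero
    return ((size + granularity - 1) / granularity) * granularity;
}
```

With a granularity of 4096, a request of 1 byte rounds to 4096 and a request of 4097 bytes rounds to 8192; sizes that are already a multiple are returned unchanged.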
49 changes: 49 additions & 0 deletions include/nbl/video/CCUDASharedSemaphore.h
@@ -0,0 +1,49 @@
// Copyright (C) 2018-2020 - DevSH Graphics Programming Sp. z O.O.
// This file is part of the "Nabla Engine".
// For conditions of distribution and use, see copyright notice in nabla.h
#ifndef _NBL_VIDEO_C_CUDA_SHARED_SEMAPHORE_H_
#define _NBL_VIDEO_C_CUDA_SHARED_SEMAPHORE_H_

#ifdef _NBL_COMPILE_WITH_CUDA_

#include "cuda.h"
#include "nvrtc.h"
#if CUDA_VERSION < 9000
#error "Need CUDA 9.0 SDK or higher."
#endif

// useful includes in the future
//#include "cudaEGL.h"
//#include "cudaVDPAU.h"

namespace nbl::video
{

class CCUDASharedSemaphore : public core::IReferenceCounted
{
public:
friend class CCUDADevice;

CUexternalSemaphore getInternalObject() const { return m_handle; }

protected:

CCUDASharedSemaphore(core::smart_refctd_ptr<CCUDADevice> device, core::smart_refctd_ptr<ISemaphore> src, CUexternalSemaphore semaphore, void* osHandle)
: m_device(std::move(device))
, m_src(std::move(src))
, m_handle(semaphore)
, m_osHandle(osHandle)
{}
~CCUDASharedSemaphore() override;

core::smart_refctd_ptr<CCUDADevice> m_device;
core::smart_refctd_ptr<ISemaphore> m_src;
CUexternalSemaphore m_handle;
void* m_osHandle;
};

}

#endif // _NBL_COMPILE_WITH_CUDA_

#endif