I have been working with CUDA 9 / JavaCPP for a few days and got everything up and running very quickly. Thank you!
However, I cannot seem to get cuda.cuLaunchCooperativeKernelMultiDevice() working. It takes a CUDA_LAUNCH_PARAMS as its first argument and the array size as its second, but what I need to pass is an array of CUDA_LAUNCH_PARAMS. I tried going through PointerPointer, but that did not fix things.
Does anyone have a solution on how to call cuLaunchCooperativeKernelMultiDevice for multiple devices?
CUDA_LAUNCH_PARAMS is a Pointer, which can point to a native array. To allocate an array of size 10, for example, we can call new CUDA_LAUNCH_PARAMS(10).
Thanks for the feedback. I got it to work!
@saudet, one more question. I have the array of CUDA_LAUNCH_PARAMS working. I can also set all the grid and block variables, and the kernels execute correctly on the GPUs.
Next up is setting the kernel parameters. But every time I set a kernel parameter, for example launchParams.kernelParams(0, new LongPointer(new long[1])), I instantly get a SIGSEGV crash.
That doesn't look right. You're going to need to follow the doc that NVIDIA provides about that...
The NVIDIA documentation says you have to fill in this struct:
```c
typedef struct CUDA_LAUNCH_PARAMS_st {
    CUfunction function;          /**< Kernel to launch */
    unsigned int gridDimX;        /**< Width of grid in blocks */
    unsigned int gridDimY;        /**< Height of grid in blocks */
    unsigned int gridDimZ;        /**< Depth of grid in blocks */
    unsigned int blockDimX;       /**< X dimension of each thread block */
    unsigned int blockDimY;       /**< Y dimension of each thread block */
    unsigned int blockDimZ;       /**< Z dimension of each thread block */
    unsigned int sharedMemBytes;  /**< Dynamic shared-memory size per thread block in bytes */
    CUstream hStream;             /**< Stream identifier */
    void **kernelParams;          /**< Array of pointers to kernel parameters */
} CUDA_LAUNCH_PARAMS;
```
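As a worked example of where kernelParams sits in memory, the sketch below computes the struct's field offsets using ordinary C natural-alignment rules. It assumes a 64-bit ABI (CUfunction, CUstream and void** are 8 bytes, unsigned int is 4 bytes). Under those assumptions kernelParams lands at byte offset 48 and the whole struct is 56 bytes, which is what Loader.offsetof and Loader.sizeof should report on such a platform. The class name LaunchParamsLayout is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: computes CUDA_LAUNCH_PARAMS field offsets with natural alignment.
// Assumes a 64-bit ABI: CUfunction, CUstream and void** are 8 bytes, unsigned int is 4.
public class LaunchParamsLayout {
    static final int UINT = 4;
    static final int PTR = 8;

    // Field names and sizes in declaration order, mirroring CUDA_LAUNCH_PARAMS_st.
    static final String[] NAMES = {
        "function", "gridDimX", "gridDimY", "gridDimZ",
        "blockDimX", "blockDimY", "blockDimZ", "sharedMemBytes",
        "hStream", "kernelParams"
    };
    static final int[] SIZES = {PTR, UINT, UINT, UINT, UINT, UINT, UINT, UINT, PTR, PTR};

    // Rounds offset up to the next multiple of alignment.
    static int align(int offset, int alignment) {
        return (offset + alignment - 1) / alignment * alignment;
    }

    // Offset of every field; each scalar or pointer is aligned to its own size.
    static Map<String, Integer> offsets() {
        Map<String, Integer> result = new LinkedHashMap<>();
        int offset = 0;
        for (int i = 0; i < NAMES.length; i++) {
            offset = align(offset, SIZES[i]);
            result.put(NAMES[i], offset);
            offset += SIZES[i];
        }
        return result;
    }

    // Total size: end of the last field, padded to the largest field alignment.
    static int sizeof() {
        int end = 0;
        int maxAlign = 1;
        for (int i = 0; i < NAMES.length; i++) {
            end = align(end, SIZES[i]) + SIZES[i];
            maxAlign = Math.max(maxAlign, SIZES[i]);
        }
        return align(end, maxAlign);
    }

    public static void main(String[] args) {
        System.out.println("offsetof(kernelParams) = " + offsets().get("kernelParams"));
        System.out.println("sizeof(CUDA_LAUNCH_PARAMS) = " + sizeof());
    }
}
```

Note the 4 bytes of padding after sharedMemBytes (offset 36 is rounded up to 40 for hStream); getting that padding wrong is an easy way to write a pointer into the wrong field.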
So I want to set the `void **kernelParams` pointer. However, cuda.java only provides these options:
```java
public native Pointer kernelParams(int i);
public native CUDA_LAUNCH_PARAMS kernelParams(int i, Pointer kernelParams);
@MemberGetter public native @Cast("void**") PointerPointer kernelParams();
```
So how should I proceed?
You'll need to allocate your own PointerPointer and pass that...
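To make the `void **kernelParams` indirection concrete, here is a toy model in plain Java: a ByteBuffer stands in for native memory and byte offsets into it stand in for addresses. kernelParams is an array with one pointer per kernel argument, and each pointer refers to memory holding that argument's value, not the value itself. The kernel `scale(int n, float *data)` and the class and method names are hypothetical. In JavaCPP terms this roughly corresponds to a PointerPointer whose entries are an IntPointer holding n and a LongPointer holding the device address; the model is an illustration only and has not been checked against the bindings.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Toy model of void **kernelParams. A ByteBuffer plays the role of native memory and
// byte offsets into it play the role of addresses. Hypothetical kernel signature:
//   __global__ void scale(int n, float *data)
public class KernelParamsModel {
    final ByteBuffer mem = ByteBuffer.allocate(64).order(ByteOrder.nativeOrder());
    private int next = 0; // simple bump allocator

    // Reserves 'bytes' bytes at an 8-byte-aligned "address".
    int alloc(int bytes) {
        int addr = (next + 7) / 8 * 8;
        next = addr + bytes;
        return addr;
    }

    // Builds kernelParams for scale(n, data) and returns the address of the pointer array.
    int buildKernelParams(int n, long devicePtr) {
        int nAddr = alloc(4);        // holds the int argument itself
        mem.putInt(nAddr, n);
        int dataAddr = alloc(8);     // holds the device pointer value (a CUdeviceptr)
        mem.putLong(dataAddr, devicePtr);
        int table = alloc(2 * 8);    // void*[2]: one pointer per kernel argument
        mem.putLong(table, nAddr);
        mem.putLong(table + 8, dataAddr);
        return table;
    }

    // kernelParams[i]: the address where argument i is stored.
    long pointerAt(int table, int i) {
        return mem.getLong(table + 8 * i);
    }

    public static void main(String[] args) {
        KernelParamsModel m = new KernelParamsModel();
        int table = m.buildKernelParams(1024, 0x7f0000001000L);
        // The driver dereferences each entry to fetch the argument value.
        int n = m.mem.getInt((int) m.pointerAt(table, 0));
        long data = m.mem.getLong((int) m.pointerAt(table, 1));
        System.out.println("n = " + n + ", data = 0x" + Long.toHexString(data));
    }
}
```

The key point is the extra level of indirection: even a plain `int` argument must live in memory of its own, and kernelParams stores the address of that memory.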
Like this? All variants give a SIGSEGV:
```java
launchParams.kernelParams(0, new PointerPointer(new IntPointer(new int[1])));
launchParams.kernelParams(0, new PointerPointer(new Pointer()));
launchParams.kernelParams(0, new PointerPointer(new Pointer[] { new Pointer() }));
launchParams.kernelParams(0, new PointerPointer(new Pointer[] { new IntPointer(new int[1]) }));
```
I would also expect kernelParams(0, pointer) to take a plain pointer, and kernelParams() to return the entire array as a PointerPointer?
That is indeed an issue. We'll have to fix this.
In the meantime, we can work around that by using Loader.sizeof(CUDA_LAUNCH_PARAMS.class) and Loader.offsetof(CUDA_LAUNCH_PARAMS.class, "kernelParams") with new BytePointer(launchParams).putPointer(..., kernelParams).
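The workaround relies on one piece of arithmetic: in a contiguous native array of structs, field f of element i starts at i * sizeof(struct) + offsetof(struct, f). The sketch below models that with a plain ByteBuffer standing in for the memory behind new CUDA_LAUNCH_PARAMS(n). The constants 56, 48 and 8 are assumptions for a 64-bit platform (real code would take them from Loader.sizeof and Loader.offsetof), the pointer values written are fake, and the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Models writing the kernelParams field of each element in an array of structs.
// A ByteBuffer stands in for the native array; no real pointers are involved.
public class KernelParamsPatch {
    // Assumed 64-bit values; real code would use Loader.sizeof(CUDA_LAUNCH_PARAMS.class)
    // and Loader.offsetof(CUDA_LAUNCH_PARAMS.class, "kernelParams").
    static final int SIZEOF = 56;
    static final int KERNEL_PARAMS_OFFSET = 48;
    static final int GRID_DIM_X_OFFSET = 8;

    // Writes a pointer-sized value into the kernelParams field of element i.
    static void putKernelParams(ByteBuffer array, int i, long pointer) {
        array.putLong(i * SIZEOF + KERNEL_PARAMS_OFFSET, pointer);
    }

    // Reads back the kernelParams field of element i.
    static long getKernelParams(ByteBuffer array, int i) {
        return array.getLong(i * SIZEOF + KERNEL_PARAMS_OFFSET);
    }

    public static void main(String[] args) {
        int numDevices = 2;
        ByteBuffer array = ByteBuffer.allocate(numDevices * SIZEOF).order(ByteOrder.nativeOrder());
        for (int i = 0; i < numDevices; i++) {
            array.putInt(i * SIZEOF + GRID_DIM_X_OFFSET, 128); // gridDimX, normally set via its setter
            putKernelParams(array, i, 0x1000L + i);           // fake per-device kernelParams address
        }
        for (int i = 0; i < numDevices; i++) {
            System.out.println("element " + i + ": gridDimX = "
                    + array.getInt(i * SIZEOF + GRID_DIM_X_OFFSET)
                    + ", kernelParams = 0x" + Long.toHexString(getKernelParams(array, i)));
        }
    }
}
```

Each device's entry gets its own kernelParams write at its own offset, while the neighbouring gridDimX fields stay untouched.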
Thanks for the feedback, and yes, it works!
The fix is included in version 1.4, though that version now provides wrappers for CUDA 9.1:
http://search.maven.org/#search%7Cga%7C1%7Cbytedeco%20cuda
Thanks for reporting and testing this out!