Javacpp-presets: cuLaunchCooperativeKernelMultiDevice

Created on 8 Nov 2017 · 11 Comments · Source: bytedeco/javacpp-presets

I have been working with CUDA9 / JavaCPP for a few days and got everything up and running very fast. Thank you!

However, I cannot seem to get cuda.cuLaunchCooperativeKernelMultiDevice() working. It takes a CUDA_LAUNCH_PARAMS as the first argument and the array size as the second, but what I need to pass is an array of CUDA_LAUNCH_PARAMS. I tried going through a PointerPointer, but that did not fix things.

Does anyone have a solution on how to call cuLaunchCooperativeKernelMultiDevice for multiple devices?

bug question

All 11 comments

CUDA_LAUNCH_PARAMS is a Pointer, which can point to a native array. To allocate an array of size 10, for example, we can call new CUDA_LAUNCH_PARAMS(10).

Thanks for the feedback. I got it to work!

@saudet, one more question. I have the array of CUDA_LAUNCH_PARAMS working. I can also set all grid and block variables, and the kernels execute correctly on the GPUs.

Next up is setting the kernel parameters. But each time I set a kernel parameter like launchParams.kernelParams(0, new LongPointer(new long[1])); I instantly get a SIGSEGV crash.

That doesn't look right. You're going to need to follow the doc that NVIDIA provides about that...

NVIDIA doc says you have to make this struct:

typedef struct CUDA_LAUNCH_PARAMS_st {
    CUfunction function;         /**< Kernel to launch */
    unsigned int gridDimX;       /**< Width of grid in blocks */
    unsigned int gridDimY;       /**< Height of grid in blocks */
    unsigned int gridDimZ;       /**< Depth of grid in blocks */
    unsigned int blockDimX;      /**< X dimension of each thread block */
    unsigned int blockDimY;      /**< Y dimension of each thread block */
    unsigned int blockDimZ;      /**< Z dimension of each thread block */
    unsigned int sharedMemBytes; /**< Dynamic shared-memory size per thread block in bytes */
    CUstream hStream;            /**< Stream identifier */
    void **kernelParams;         /**< Array of pointers to kernel parameters */
} CUDA_LAUNCH_PARAMS;
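For reference, the field offsets of this struct can be worked out by hand. The sketch below is plain Java with no JavaCPP dependency. It assumes natural alignment, 4-byte unsigned int, and that CUfunction, CUstream and void ** are all plain pointers of the given size. The class name LaunchParamsLayout and its methods are made up for illustration, and a real program should ask the native side for these numbers rather than trust this calculation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LaunchParamsLayout {
    // Round offset up to the next multiple of align.
    static long alignUp(long offset, long align) {
        return (offset + align - 1) / align * align;
    }

    // Returns the byte offset of every field plus a "sizeof" entry,
    // ASSUMING natural alignment (each field aligned to its own size).
    static Map<String, Long> layout(long pointerSize) {
        String[] names = {"function", "gridDimX", "gridDimY", "gridDimZ",
                "blockDimX", "blockDimY", "blockDimZ", "sharedMemBytes",
                "hStream", "kernelParams"};
        long[] sizes = {pointerSize, 4, 4, 4, 4, 4, 4, 4, pointerSize, pointerSize};

        Map<String, Long> offsets = new LinkedHashMap<>();
        long offset = 0;
        long maxAlign = 1;
        for (int i = 0; i < names.length; i++) {
            offset = alignUp(offset, sizes[i]);   // insert padding if needed
            offsets.put(names[i], offset);
            offset += sizes[i];
            maxAlign = Math.max(maxAlign, sizes[i]);
        }
        // The struct size is padded to a multiple of its strictest alignment.
        offsets.put("sizeof", alignUp(offset, maxAlign));
        return offsets;
    }

    public static void main(String[] args) {
        // 8-byte pointers, as on a typical 64-bit platform (an assumption).
        layout(8).forEach((name, off) -> System.out.println(name + " = " + off));
    }
}
```

Under these assumptions, with 8-byte pointers the seven unsigned int fields end at byte 36, hStream is padded up to offset 40, kernelParams lands at offset 48, and the whole struct occupies 56 bytes.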

So I want to set the "void **kernelParams;" pointer. However, the cuda.java code only provides these options:

public native Pointer kernelParams(int i);
public native CUDA_LAUNCH_PARAMS kernelParams(int i, Pointer kernelParams);
@MemberGetter public native @Cast("void**") PointerPointer kernelParams();

So how should I proceed?

You'll need to allocate your own PointerPointer and pass that...

Like this? All variants give a SIGSEGV
launchParams.kernelParams(0, new PointerPointer(new IntPointer(new int[1])));
launchParams.kernelParams(0, new PointerPointer(new Pointer()));
launchParams.kernelParams(0, new PointerPointer(new Pointer[] { new Pointer() }));
launchParams.kernelParams(0, new PointerPointer(new Pointer[] { new IntPointer(new int[1]) }));

I would also have expected kernelParams(0, pointer) to take a normal pointer, and kernelParams(), which returns the entire array, to give me a PointerPointer back?

That is indeed an issue. We'll have to fix this.

In the meantime, we can work around that by using Loader.sizeof(CUDA_LAUNCH_PARAMS.class) and Loader.offsetof(CUDA_LAUNCH_PARAMS.class, "kernelParams") with new BytePointer(launchParams).putPointer(..., kernelParams).
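The arithmetic behind that workaround can be sketched without JavaCPP: the kernelParams field of element i in the native array sits at i * sizeof + offsetof, and the workaround writes the address of the PointerPointer directly at that byte position. In the sketch below a ByteBuffer stands in for the native memory, the addresses are fake, and the constants 56 and 48 are assumed values for a 64-bit build. Real code would take them from Loader.sizeof(CUDA_LAUNCH_PARAMS.class) and Loader.offsetof(CUDA_LAUNCH_PARAMS.class, "kernelParams"). The class KernelParamsPatch and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class KernelParamsPatch {
    // ASSUMED values for a 64-bit build; not queried from the native library.
    static final int SIZEOF_LAUNCH_PARAMS = 56;
    static final int OFFSETOF_KERNEL_PARAMS = 48;

    // Byte position of the kernelParams field of array element `index`.
    static int fieldPosition(int index) {
        return index * SIZEOF_LAUNCH_PARAMS + OFFSETOF_KERNEL_PARAMS;
    }

    // Stores a pointer value (as a 64-bit address) into element `index`.
    static void putKernelParams(ByteBuffer structs, int index, long address) {
        structs.putLong(fieldPosition(index), address);
    }

    public static void main(String[] args) {
        int deviceCount = 2;
        // Stand-in for the memory behind new CUDA_LAUNCH_PARAMS(deviceCount).
        ByteBuffer structs = ByteBuffer.allocate(deviceCount * SIZEOF_LAUNCH_PARAMS)
                .order(ByteOrder.nativeOrder());

        // Fake addresses standing in for each device's PointerPointer.
        putKernelParams(structs, 0, 0x1000L);
        putKernelParams(structs, 1, 0x2000L);

        for (int i = 0; i < deviceCount; i++) {
            System.out.println("element " + i + ": kernelParams field at byte "
                    + fieldPosition(i));
        }
    }
}
```

Only the one field is overwritten in each element, so grid and block values already set through the regular setters stay untouched.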

Thanks for the feedback and yes it works!

The fix is included in version 1.4, though note that it now provides wrappers for CUDA 9.1:
http://search.maven.org/#search%7Cga%7C1%7Cbytedeco%20cuda
Thanks for reporting and testing this out!
