Javacpp-presets: cuLaunchCooperativeKernelMultiDevice

Created on 8 Nov 2017 · 11 Comments · Source: bytedeco/javacpp-presets

I have been working with CUDA9 / JavaCPP for a few days and got everything up and running very fast. Thank you!

However, I cannot seem to get cuda.cuLaunchCooperativeKernelMultiDevice() working. It takes a CUDA_LAUNCH_PARAMS as the first argument and the array size as the second argument, but what I need to pass is an array of CUDA_LAUNCH_PARAMS. I tried using a PointerPointer, but that did not fix things.

Does anyone have a solution on how to call cuLaunchCooperativeKernelMultiDevice for multiple devices?

bug question

maximusgrey

All 11 comments

CUDA_LAUNCH_PARAMS is a Pointer, which can point to a native array. To allocate an array of size 10, for example, we can call new CUDA_LAUNCH_PARAMS(10).

saudet on 9 Nov 2017

Thanks for the feedback. I got it to work!

maximusgrey on 9 Nov 2017

@saudet, one more question. I have the array of CUDA_LAUNCH_PARAMS working. I can also set all the grid and block variables, and the kernels execute correctly on the GPUs.

Next up is setting the kernel parameters. But each time I set a kernel parameter like this: launchParams.kernelParams(0, new LongPointer(new long[1])); I instantly get a SIGSEGV crash.

maximusgrey on 9 Nov 2017

That doesn't look right. You're going to need to follow the doc that NVIDIA provides about that...

saudet on 9 Nov 2017

The NVIDIA doc says you have to fill in this struct:

typedef struct CUDA_LAUNCH_PARAMS_st {
    CUfunction function;         /**< Kernel to launch */
    unsigned int gridDimX;       /**< Width of grid in blocks */
    unsigned int gridDimY;       /**< Height of grid in blocks */
    unsigned int gridDimZ;       /**< Depth of grid in blocks */
    unsigned int blockDimX;      /**< X dimension of each thread block */
    unsigned int blockDimY;      /**< Y dimension of each thread block */
    unsigned int blockDimZ;      /**< Z dimension of each thread block */
    unsigned int sharedMemBytes; /**< Dynamic shared-memory size per thread block in bytes */
    CUstream hStream;            /**< Stream identifier */
    void **kernelParams;         /**< Array of pointers to kernel parameters */
} CUDA_LAUNCH_PARAMS;

So I want to set the "void **kernelParams;" pointer. However, the generated cuda.java code only provides these options:

public native Pointer kernelParams(int i);
public native CUDA_LAUNCH_PARAMS kernelParams(int i, Pointer kernelParams);
@MemberGetter public native @Cast("void**") PointerPointer kernelParams();

So how should I proceed?

maximusgrey on 9 Nov 2017

You'll need to allocate your own PointerPointer and pass that...

saudet on 9 Nov 2017

Like this? All of these variants give a SIGSEGV:
launchParams.kernelParams(0, new PointerPointer(new IntPointer(new int[1])));
launchParams.kernelParams(0, new PointerPointer(new Pointer()));
launchParams.kernelParams(0, new PointerPointer(new Pointer[] { new Pointer() }));
launchParams.kernelParams(0, new PointerPointer(new Pointer[] { new IntPointer(new int[1]) }));

I would also have expected kernelParams(0, pointer) to take a normal pointer, with only the no-argument kernelParams() returning the entire array as a PointerPointer?

maximusgrey on 9 Nov 2017

That is indeed an issue. We'll have to fix this.

saudet on 9 Nov 2017

In the meantime, we can work around that by using Loader.sizeof(CUDA_LAUNCH_PARAMS.class) and Loader.offsetof(CUDA_LAUNCH_PARAMS.class, "kernelParams") with new BytePointer(launchParams).putPointer(..., kernelParams).

saudet on 10 Nov 2017
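The workaround above presumably amounts to writing a pointer-sized value at byte offset i * sizeof(CUDA_LAUNCH_PARAMS) + offsetof(kernelParams) from the start of the array. Here is a minimal sketch of that arithmetic in plain Java, with a ByteBuffer standing in for native memory. The class LaunchParamsLayout and its helpers are hypothetical names, not part of JavaCPP, and the offsets assume a 64-bit platform with 8-byte pointers and natural alignment. In real code the values should come from Loader.sizeof and Loader.offsetof rather than being hard-coded like this.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/**
 * Hypothetical model of a CUDA_LAUNCH_PARAMS array held in a ByteBuffer.
 * ASSUMPTION: 64-bit platform, 8-byte pointers, natural alignment.
 * Real code should ask Loader.sizeof/Loader.offsetof for these numbers.
 */
final class LaunchParamsLayout {
    static final int POINTER_SIZE = 8; // assumed pointer width
    static final int UINT_SIZE = 4;    // unsigned int

    // Round an offset up to the next multiple of the given alignment.
    static int align(int offset, int alignment) {
        return (offset + alignment - 1) / alignment * alignment;
    }

    // CUfunction function
    static final int OFFSET_FUNCTION = 0;
    // gridDimX..Z, blockDimX..Z, sharedMemBytes: seven unsigned ints
    static final int OFFSET_GRID_DIM_X = OFFSET_FUNCTION + POINTER_SIZE;
    // CUstream hStream, padded up to pointer alignment
    static final int OFFSET_H_STREAM =
            align(OFFSET_GRID_DIM_X + 7 * UINT_SIZE, POINTER_SIZE);
    // void **kernelParams
    static final int OFFSET_KERNEL_PARAMS = OFFSET_H_STREAM + POINTER_SIZE;
    // Total struct size, padded to pointer alignment
    static final int SIZEOF =
            align(OFFSET_KERNEL_PARAMS + POINTER_SIZE, POINTER_SIZE);

    // Byte offset of the kernelParams field of array element `index`.
    static int kernelParamsOffset(int index) {
        return index * SIZEOF + OFFSET_KERNEL_PARAMS;
    }

    // Store a (fake) address in the kernelParams field of element `index`.
    static void putKernelParams(ByteBuffer structs, int index, long address) {
        structs.putLong(kernelParamsOffset(index), address);
    }

    // Read the address back from the kernelParams field of element `index`.
    static long getKernelParams(ByteBuffer structs, int index) {
        return structs.getLong(kernelParamsOffset(index));
    }

    public static void main(String[] args) {
        // Two structs, one per device, zero-initialized.
        ByteBuffer structs = ByteBuffer.allocate(2 * SIZEOF)
                                       .order(ByteOrder.nativeOrder());
        putKernelParams(structs, 1, 0x1000L); // 0x1000 is a made-up address
        System.out.println(kernelParamsOffset(1)); // prints 104
    }
}
```

With JavaCPP, the same offset would be passed as the first argument of putPointer on a BytePointer that wraps launchParams, and the stored value would be a PointerPointer holding the addresses of the individual kernel arguments.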

Thanks for the feedback and yes it works!

maximusgrey on 10 Nov 2017

The fix is included in version 1.4, which now provides wrappers for CUDA 9.1:
http://search.maven.org/#search%7Cga%7C1%7Cbytedeco%20cuda
Thanks for reporting and testing this out!

saudet on 17 Jan 2018
