CuPy: Creating a CuPy device array from a GPU pointer

Created on 8 Feb 2021 · 3 Comments · Source: cupy/cupy

Hi,

we develop WarpX, a GPU-accelerated C++ code with Python bindings.

When compiling for CPU, we currently expose our memory for user-side manipulation as NumPy arrays, adopting the existing C++-side managed memory directly via numpy.frombuffer [code]:

        import ctypes
        import numpy as np

        PyBUF_WRITE = 0x200
        buffer_from_memory = ctypes.pythonapi.PyMemoryView_FromMemory
        # PyMemoryView_FromMemory(char *mem, Py_ssize_t size, int flags)
        buffer_from_memory.argtypes = (ctypes.c_void_p, ctypes.c_ssize_t, ctypes.c_int)
        buffer_from_memory.restype = ctypes.py_object
        buf = buffer_from_memory(pointer, dtype.itemsize * size, PyBUF_WRITE)
        arr = np.frombuffer(buf, dtype=dtype, count=size)

Looking at the cupy _generate functions and the cupy.ndarray constructors, it is not immediately clear to us how to create a non-owning CuPy array in the same way. Do you have any hints or workflows for that? We found that there is no frombuffer equivalent in CuPy.

We saw #3431 for memory-mapping a host-side binary into a device-side CuPy array, but we are not fully sure if this is similar enough.

Alternatively, we have considered creating a NumPy array that points to a device-side pointer for its data, a strategy that previously worked for us as a wrapper for GPU-GPU code coupling. We now wonder whether we can turn this into a proper CuPy array with functions like cupy.asarray(numpy_array_w_device_ptr). Would this adopt the data or copy it out, and should we add further flags to describe our GPU memory (ownership)?

Thank you already for your feedback!


All 3 comments

There's an UnownedMemory class that can wrap a foreign device pointer so that a CuPy array can be created on top of it, if that's what you're asking for. Check out how CuPy talks to other libraries through the CUDA Array Interface:
https://github.com/cupy/cupy/blob/a5b24f91d4d77fa03e6a4dd2ac954ff9a04e21f4/cupy/core/core.pyx#L2478-L2514
ptr there is the pointer address (cast to intptr_t, which is also representable as a plain Python int). An even nicer thing you can do is to support __cuda_array_interface__ in your Python layer 🙂 Let me know if this helps.

Thank you for the advice; this looks great and is exactly what we were looking for.

We will then implement the CUDA Array Interface (v3) through pybind11 in AMReX and downstream codes.
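A minimal sketch of what a Python-side __cuda_array_interface__ (v3) producer could look like; the class and attribute names here are hypothetical, and only the dictionary layout follows the published interface:

```python
class DeviceArrayWrapper:
    """Hypothetical wrapper exposing a raw device pointer via the
    CUDA Array Interface (v3). Consumers such as CuPy or Numba can
    then adopt the memory zero-copy, e.g. via cupy.asarray(wrapper)."""

    def __init__(self, ptr, shape, typestr, read_only=False):
        self._ptr = int(ptr)        # device pointer as a plain Python int
        self._shape = tuple(shape)
        self._typestr = typestr     # NumPy typestr, e.g. '<f8'
        self._read_only = read_only

    @property
    def __cuda_array_interface__(self):
        return {
            'shape': self._shape,
            'typestr': self._typestr,
            'data': (self._ptr, self._read_only),  # (pointer, read_only)
            'version': 3,
            'strides': None,        # None means C-contiguous
        }
```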

I read in the docs that there is no convention on the C side yet; I wonder if it makes sense to follow one for general pybind11 helpers. Here is how they look for the array interface / buffer protocol: Py / C++

Hi @ax3l, sorry I dropped the ball. So, yes, __cuda_array_interface__ is a Python-only protocol. If you already support the buffer protocol (at the C level) and you'd like to incorporate __cuda_array_interface__, we actually did this in mpi4py:
https://github.com/mpi4py/mpi4py/blob/28eb557182bed368701165f99788bd2497d59f42/src/mpi4py/MPI/asgpubuf.pxi#L59
So if you check the function Py_GetCUDABuffer there, what we did was parse the content of __cuda_array_interface__ and convert it to a Python buffer object (which we have supported since the very early days of mpi4py) so as to reuse the existing infrastructure. Not sure if this is useful to you, though.

Another option is to support DLPack. It exposes some C structs all the way up to the Python level. Check out how we support it in CuPy here: https://github.com/cupy/cupy/blob/master/cupy/core/dlpack.pyx This exchange protocol will be part of the upcoming Python Array API standard: https://data-apis.github.io/array-api/latest/design_topics/data_interchange.html
The good thing about the DLPack protocol is that it aims to support a wide spectrum of architectures (CPU, NVIDIA GPU, AMD GPU, Intel GPU, etc.), but the downside, IMHO, is that it's a bit more involved than __cuda_array_interface__.
