CuPy: Creating a CuPy device array from a GPU pointer

Created on 8 Feb 2021 · 3 Comments · Source: cupy/cupy

Hi,

we develop WarpX, a GPU-accelerated C++ code with Python bindings.

When compiling for CPU, we currently expose our memory for user-side manipulation as NumPy arrays, adopting the existing C++-side managed memory directly via numpy.frombuffer [code]:

        import ctypes
        import numpy as np

        PyBUF_WRITE = 0x200
        buffer_from_memory = ctypes.pythonapi.PyMemoryView_FromMemory
        # PyMemoryView_FromMemory(char *mem, Py_ssize_t size, int flags)
        buffer_from_memory.argtypes = (ctypes.c_void_p, ctypes.c_ssize_t, ctypes.c_int)
        buffer_from_memory.restype = ctypes.py_object
        buf = buffer_from_memory(pointer, dtype.itemsize * size, PyBUF_WRITE)
        arr = np.frombuffer(buf, dtype=dtype, count=size)

Looking at the cupy _generate functions and the cupy.ndarray constructors, it is not immediately clear to us how to create a non-owning CuPy array in the same way. Do you have any hints or workflows for that? We found that there is no frombuffer equivalent in CuPy.

We saw #3431 for memory-mapping a host-side binary into a device-side CuPy array, but we are not fully sure if this is similar enough.

Alternatively, we have considered creating a NumPy array that points to a device-side pointer for its data, a strategy that previously worked for us as a wrapper for GPU-GPU code coupling. We now wonder whether we can turn this into a proper CuPy array with functions like cupy.asarray(numpy_array_w_device_ptr). Would this adopt the data or copy it out, and should we add further flags to describe our GPU memory (ownership)?

Thank you already for your feedback!


All 3 comments

There's an UnownedMemory class that can wrap a foreign device pointer so that a CuPy array can be created on top of it, if that's what you're asking for. Check out how CuPy talks to other libraries through the CUDA Array Interface:
https://github.com/cupy/cupy/blob/a5b24f91d4d77fa03e6a4dd2ac954ff9a04e21f4/cupy/core/core.pyx#L2478-L2514
ptr there is the pointer address (cast to intptr_t, which is also representable as a plain Python int). An even nicer thing you can do is to support __cuda_array_interface__ in your Python layer 🙂 Let me know if this helps.

Thank you for the advice; this looks great and is exactly what we were looking for.

We will then implement the CUDA Array Interface (v3) through pybind11 in AMReX and downstream codes.
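A minimal sketch of what a Python-side __cuda_array_interface__ (v3) producer could look like; the class and attribute names here are hypothetical, and only the dictionary layout follows the published interface:

```python
class DeviceArrayWrapper:
    """Hypothetical wrapper exposing a raw device pointer via the
    CUDA Array Interface (v3). Consumers such as CuPy or Numba can
    then adopt the memory zero-copy, e.g. via cupy.asarray(wrapper)."""

    def __init__(self, ptr, shape, typestr, read_only=False):
        self._ptr = int(ptr)        # device pointer as a plain Python int
        self._shape = tuple(shape)
        self._typestr = typestr     # NumPy typestr, e.g. '<f8'
        self._read_only = read_only

    @property
    def __cuda_array_interface__(self):
        return {
            'shape': self._shape,
            'typestr': self._typestr,
            'data': (self._ptr, self._read_only),  # (pointer, read_only)
            'version': 3,
            'strides': None,        # None means C-contiguous
        }
```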

I read in the docs that there is no convention on the C side yet; I wonder if it makes sense to follow one for general pybind11 helpers. Here is how they look for the array interface / buffer protocol: Py / C++

Hi @ax3l, sorry I dropped the ball. So, yes, __cuda_array_interface__ is a Python-only protocol. If you already support the buffer protocol (at the C level) and you'd like to incorporate __cuda_array_interface__, we actually did this in mpi4py:
https://github.com/mpi4py/mpi4py/blob/28eb557182bed368701165f99788bd2497d59f42/src/mpi4py/MPI/asgpubuf.pxi#L59
So if you check the function Py_GetCUDABuffer there, what we did was parse the content of __cuda_array_interface__ and convert it to a Python buffer object (which we have supported since the very early days of mpi4py) so as to reuse the existing infrastructure. Not sure if this is useful to you, though.

Another option is to support DLPack. It exposes some C structs all the way up to the Python level. Check out how we support it in CuPy here: https://github.com/cupy/cupy/blob/master/cupy/core/dlpack.pyx This exchange protocol will be part of the upcoming Python Array API standard: https://data-apis.github.io/array-api/latest/design_topics/data_interchange.html
The good thing about the DLPack protocol is that it aims to support a wide spectrum of architectures (CPU, NVIDIA GPU, AMD GPU, Intel GPU, etc.), but the downside, IMHO, is that it's a bit more involved than __cuda_array_interface__.
