Incubator-mxnet: Feature request: share memory between numpy.array and mxnet.ndarray

Created on 24 Feb 2019 · 17 Comments · Source: apache/incubator-mxnet

Last time I did a benchmark and @szha pointed out that

MXNet made the choice of not doing a zero-copy for numpy arrays, but instead making a copy of the numpy data.

This is a safe choice, but it can sometimes be a performance problem. I checked the C API and found no function for sharing data allocated outside MXNet. Would it be possible for MXNet to provide such an API?

Feature request NDArray

All 17 comments

I will take a look this weekend.

Hey, I did some research this weekend and am convinced it might be possible. Is it acceptable to assume we are using Cython (instead of ctypes), @szha?

@junrushao1994 could you elaborate on the consideration?

The point is that we need to transfer or share ownership of the NumPy ndarray with MXNet's C++ backend, because we cannot guarantee that the frontend object will exist forever.

Therefore, we need a customized deleter for MXNet's NDArray, i.e., something that calls Py_DECREF on the NumPy ndarray object from the C++ backend. It is possible to implement the deleter via ctypes or Cython and then pass it to something roughly like the following chunk of code.

https://github.com/apache/incubator-mxnet/blob/0f88f61379bd5f59fff6b825be1507d020bf2b7e/include/mxnet/ndarray.h#L131-L148
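To make the deleter idea concrete, here is a minimal sketch using only ctypes from the standard library. It is not MXNet code: the callback signature `void deleter(void* ctx)`, the helper names `pin` and `release`, and the use of a bytearray in place of a NumPy array are all assumptions made for illustration. It only shows the reference-counting handshake: take one extra reference when the buffer is handed to the backend, and drop it when the backend invokes the deleter.

```python
import ctypes
import sys

# Assumed C-side deleter signature: void deleter(void* ctx)
DELETER = ctypes.CFUNCTYPE(None, ctypes.c_void_p)

py_incref = ctypes.pythonapi.Py_IncRef
py_incref.argtypes = [ctypes.py_object]
py_incref.restype = None

py_decref = ctypes.pythonapi.Py_DecRef
py_decref.argtypes = [ctypes.py_object]
py_decref.restype = None


def pin(owner):
    """Take one extra reference on behalf of the backend; return an opaque handle."""
    py_incref(owner)
    return id(owner)  # in CPython, id() is the object's address


@DELETER
def release(ctx):
    """Deleter the backend would call once it no longer needs the buffer."""
    owner = ctypes.cast(ctx, ctypes.py_object).value
    py_decref(owner)


owner = bytearray(16)               # stands in for a NumPy array
before = sys.getrefcount(owner)
handle = pin(owner)                 # the backend now co-owns the buffer
pinned = sys.getrefcount(owner)
release(handle)                     # the backend is done with it
after = sys.getrefcount(owner)
```

In a real binding, `release` and `handle` would be passed across the C API together with the data pointer, and the Python side would need to keep the `release` callback object alive for as long as the backend might call it.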

Per private discussion with @reminisce, his concern is how this could be compatible with the executor and memory planning, in which the executor may take over the ownership. I am not sure about this.

For sharing from NumPy to NDArray, we should use ctypes or the weakref module to add a reference to the NumPy object, and then decrease the reference count through NDArray::deleter.

For sharing from NDArray to NumPy, I think we can add a deleter attribute to the NumPy object.

https://docs.scipy.org/doc/numpy/user/basics.subclassing.html?highlight=deleter
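As a rough illustration of the NDArray-to-NumPy direction, the sketch below attaches a finalizer to a frontend view so that the backend's reference is released when the view goes away. Everything here is hypothetical and uses only the standard library: `BackendBuffer` stands in for memory owned by the C++ backend, `SharedView` stands in for the NumPy-side object, and `weakref.finalize` plays the role of the deleter attribute.

```python
import weakref


class BackendBuffer:
    """Stands in for memory owned by the backend (hypothetical)."""

    def __init__(self, nbytes):
        self.data = bytearray(nbytes)
        self.refs = 1  # the backend's own reference

    def acquire(self):
        self.refs += 1

    def release(self):
        self.refs -= 1


class SharedView:
    """Frontend view that shares the backend memory instead of copying it."""

    def __init__(self, backend):
        backend.acquire()
        self.mem = memoryview(backend.data)
        # Run backend.release when this view is garbage-collected.
        self._finalizer = weakref.finalize(self, backend.release)


backend = BackendBuffer(8)
view = SharedView(backend)
view.mem[0] = 42                  # write through the shared view
shared_write = backend.data[0]    # the backend sees the write
refs_while_alive = backend.refs
del view                          # in CPython the finalizer runs immediately
refs_after = backend.refs
```

The finalizer is bound to the backend object rather than to the view, so it does not keep the view alive. On interpreters without reference counting, the release would happen at some later garbage collection instead of right after `del`.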

Agreed with @wkcn

I prototyped a version that supports zero-copy from numpy to DLManagedTensor in dlpack 0.2, but it turns out that MXNet doesn't support dlpack 0.2 yet...

We can update the dlpack submodule.

@wkcn I opened a PR about this to https://github.com/dmlc/dlpack/pull/38.

@wkcn Sorry I made a mistake. @reminisce reminded me that we already got dlpack 0.2 in MXNet, so everything should be fine

In the submodule DLPack, DLPACK_VERSION is 010 rather than 020.

@wkcn This is because dlpack forgot to change its version number to 020 when releasing v0.2, lol. I checked the commit hash here, and it is identical to the v0.2 tag.

I have been lazy for a while, and now I am thinking of adding an API to MXNet. @wkcn @szha @SunDoge What do you guys think of names for this API? mx.nd.zerocopy_from() or anything else?

I think it is suitable to add a new argument.

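NDArray to NumPy

import mxnet as mx
a = mx.nd.array([1,2,3])
b = a.asnumpy(shared=True)  # shared is the proposed argument

NumPy to NDArray

import numpy as np
c = np.array([4,5])
d = mx.nd.array(c, shared=True)  # shared is the proposed argument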

I think adding a parameter to the existing API may introduce a certain level of ambiguity, which is not desirable. For example, mx.nd.array accepts more than just NumPy arrays as arguments, but shared=True would only work for NumPy arrays. We could instead add this parameter to a new API, for example mx.nd.from_numpy(zero_copy=True), which falls back to mx.nd.array when zero_copy is False.
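The shape of that proposal can be sketched without MXNet. In the standard-library analogue below, `copy_from` stands in for the permissive copying constructor (mx.nd.array in this thread) and `from_buffer` stands in for the proposed mx.nd.from_numpy. Both names are hypothetical, and a bytearray plays the role of the NumPy array.

```python
def copy_from(source):
    """Copying path: accepts any bytes-like object or iterable of small ints."""
    return memoryview(bytearray(source))


def from_buffer(source, zero_copy=True):
    """Dedicated entry point: shares memory, or falls back to the copying path."""
    if not zero_copy:
        return copy_from(source)
    if not isinstance(source, bytearray):
        # Sharing only makes sense for the one supported buffer type.
        raise TypeError("zero-copy sharing requires a bytearray")
    return memoryview(source)


src = bytearray([1, 2, 3])
shared = from_buffer(src)                   # aliases src
copied = from_buffer(src, zero_copy=False)  # independent copy
src[0] = 9                                  # visible through shared, not copied
```

Keeping the flag on a dedicated function means the type restriction applies only where sharing is requested, while the general constructor keeps accepting lists, tuples, and other inputs unchanged.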

@reminisce Sounds good. Will do!

Thx.
