Today, deep learning scientists spend the majority of their time on data processing, debugging tensor algorithms, and tuning model parameters, rather than architecting models from scratch, thanks to the abundance of pre-trained models in deep learning model zoos. This has made the usability of tensor APIs a key factor in whether a framework is widely adopted.
MXNet was initially designed with a focus on memory efficiency, computation throughput, and scalability. Usability problems have begun to surface as more and more models exhibit dynamic behavior, e.g. tensor shapes unknown before runtime, control flow depending on runtime results, etc. Here we highlight the most frequent usability complaints from users:
- Zero-dim tensors are not supported. Given `a = [0, 1, 2]`, `a[1]` will generate an `NDArray` of shape `(1,)`, instead of `()` as in NumPy.
- Zero-size tensors are not supported. The shape `(0, 16, 256)` cannot be passed to an operator, because our system currently treats 0, the first dimension size, as unknown rather than as a concrete number.
- Operator APIs diverge from NumPy: `nd.dot` vs. `np.dot`, `nd.concatenate` vs. `np.concatenate`, etc.
- Boolean indexing is not supported: `data[data < 0]` cannot run.
- Imperative and symbolic operators live in two separate namespaces: `mxnet.ndarray` and `mxnet.symbol`.
- Native Python control flow, such as `for`, `while`, `if/else`, etc., cannot be serialized into a computation graph; users must express loops through special operators instead:

```python
# F stands for either mxnet.ndarray or mxnet.symbol.
def sum(state, i):
    s = state + data[i]
    return s, [s, i + 1]

def sum_cond(state, i):
    return i < 4

out, state = F.contrib.while_loop(sum_cond, sum, [F.zeros((1,)), F.zeros((1,))],
                                  max_iterations=5)
```
Instead, users should be able to just write native Python code like the following and, if required, let the framework serialize it into a computation graph for optimization and deployment.
```python
data = np.arange(5)
out = 0
i = 0
while i < 5:
    out = out + data[i]
    i = i + 1
```
It is not hard to see that all of the above pain points stem from the lack of a NumPy-compatible coding experience in MXNet. Properly supporting control flow operators and consolidating imperative and symbolic code into one flexible coding style requires fundamental changes to the codebase, such as a new graph IR and executor; that work is extremely non-trivial and should be executed with a long-term plan. In the meantime, we can improve usability by fixing the zero-dim/zero-size tensor issue and implementing NumPy operators in MXNet. We discuss how to achieve these short-term goals in the following.
Zero-dim and zero-size tensors are valid tensors in NumPy. The former, whose shape is `()`, represents a scalar in `numpy.ndarray` format. The latter, which have one or more dimension sizes equal to zero, are useful as placeholders in many `ndarray` operations, such as concatenating a zero-size `ndarray` with another `ndarray`. MXNet does not support either, because it reserves the empty shape `()` and zero dimension sizes to indicate unknown shape information, which must be filled in during the shape inference stage before tensor computation can proceed.
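For concreteness, the NumPy behavior being targeted can be checked directly (this is standard NumPy, nothing MXNet-specific):

```python
import numpy as np

s = np.array(3.14)                 # zero-dim array: shape () represents a scalar
print(s.shape, s.ndim)             # -> () 0

empty = np.empty((0, 16, 256))     # zero-size array: a valid placeholder
full = np.ones((2, 16, 256))
print(np.concatenate([empty, full], axis=0).shape)  # -> (2, 16, 256)
```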
We can first change the current semantics to comply with the NumPy definition:

- Change `ndim = 0` to `ndim = -1` in the `TShape` class to indicate an unknown number of dimensions.
- Change `dim_size = 0` to `dim_size = -1` in the `TShape` class to indicate an unknown dimension size.

After this, we need to scan the whole codebase and modify the code accordingly wherever `shape.ndim() == 0` or `shape.Size() == 0` is used to check for unknown shapes.
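As a minimal sketch of the new convention, using plain Python in place of the C++ `TShape` class (names here are illustrative only), the unknown-shape check would look like this:

```python
# Illustrative Python stand-in for the proposed TShape semantics.
UNKNOWN = -1

def shape_is_known(ndim, dims):
    """Fully known iff ndim is concrete and every dim size is concrete.

    Note that ndim == 0 (a scalar) and a 0 in dims (a zero-size tensor) now
    count as known shapes, unlike the old 0-means-unknown convention.
    """
    if ndim == UNKNOWN:
        return False
    return all(d != UNKNOWN for d in dims)

assert shape_is_known(0, [])                 # scalar shape () is valid
assert shape_is_known(3, [0, 16, 256])       # zero-size dims are concrete
assert not shape_is_known(UNKNOWN, [])       # unknown ndim
assert not shape_is_known(2, [UNKNOWN, 4])   # unknown dim size
```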
Please note that although MXNet's shape is a type inheriting from `nnvm::Tuple`, which is often used to represent a list-like object such as `axis=(1, 2, 3)`, we will not change the meaning of an empty tuple. This separation of definitions for empty shapes and empty tuples keeps their roles clearly decoupled.
We propose to break down the effort into the following steps:

1. Copy `tuple.h` from NNVM to MXNet and rename `nnvm::TShape` to `mxnet::TShape`.
2. Replace all uses of `nnvm::Tuple` and `nnvm::TShape` with `mxnet::Tuple` and `mxnet::TShape`, respectively.
3. Modify `TShape` in `tuple.h` to use `ndim = -1` to indicate unknown shapes and `dim_size = -1` to indicate unknown dimension sizes.
4. Modify the code where `ndim == 0` and `dim_size == 0` are used, to accommodate the above changes.
5. Modify the graph passes, such as `InferShape`, `PlanMemory`, and `Gradient`, where `nnvm::TShape` is used, to accommodate the above changes.

By default, we do not change the original definition of output shapes in shape inference functions; we only change `ndim == 0` to `ndim == -1` for unknown-shape verification. No backward compatibility issues are expected in all but one case: `NDArray` indexing. To elaborate, the current behavior dictates that `x[i]` always returns a tensor with `ndim >= 1`. We can keep the current behavior unchanged and implement a global switch that users can turn on to get NumPy-compatible results.
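A hedged sketch of how such a switch might look from user code; the function name `mx.set_np_shape` and the indexing result shown are assumptions for illustration, not a released API:

```python
import mxnet as mx

a = mx.nd.array([0, 1, 2])
print(a[1].shape)        # default: legacy behavior preserved, (1,)

mx.set_np_shape(True)    # hypothetical global switch (name is illustrative)
b = mx.nd.array([0, 1, 2])
print(b[1].shape)        # intended NumPy-compatible result: ()
```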
Previous discussion of this topic can be seen here.
To address the operator incompatibility with NumPy, and to alleviate the pain of a divergent programming experience caused by the operator namespace separation between `mxnet.ndarray` and `mxnet.symbol`, we propose creating a new namespace `mxnet.numpy`, adopting the operator APIs from NumPy, and implementing those APIs under the new namespace. `mxnet.numpy` should provide the same imperative programming experience as NumPy and will gradually replace all the non-neural-network operators in the current codebase. While implementing NumPy operators in MXNet, we can potentially leverage TVM to generate high-performance kernels (ref.).
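A sketch of the intended user experience under this proposal; the `mxnet.numpy` namespace and its mirrored NumPy APIs are what is being proposed here, not something already released:

```python
from mxnet import numpy as np   # proposed namespace

a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[5.0, 6.0], [7.0, 8.0]])
c = np.dot(a, b)                     # same name and semantics as numpy.dot
d = np.concatenate([a, b], axis=0)   # likewise for numpy.concatenate
```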
Can `mxnet.numpy` operators be used in Gluon for hybridization?

The newly implemented NumPy operators can still be accessed through the module (`ndarray`/`symbol`) delegate `F` in Gluon, e.g. `F.numpy.dot`. This works because the new operators are still registered under `mxnet.ndarray` and `mxnet.symbol` behind the scenes. It is just that users are encouraged to access NumPy operator APIs through `mxnet.numpy` to write pure imperative code, and through the Gluon APIs for a hybrid coding experience.
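A minimal sketch of what that would look like in a Gluon block, assuming the proposed `F.numpy.*` registration described above:

```python
from mxnet.gluon import HybridBlock

class DotBlock(HybridBlock):
    def hybrid_forward(self, F, x, y):
        # F resolves to mxnet.ndarray imperatively and to mxnet.symbol after
        # hybridize(), so the proposed F.numpy.dot serves both modes.
        return F.numpy.dot(x, y)
```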
A dev branch has been opened for this proposal.
https://github.com/apache/incubator-mxnet/tree/numpy
@junrushao1994 @szha @eric-haibin-lin @zheng-da @yzhliu
+1 for this RFC.
NumPy compatibility has been a long-standing desire of both developers and users. It would be very meaningful if we could make it happen.
+1 for this RFC.
The inconsistent APIs, even among MXNet's own operators, have caused much confusion for users. It will be a great usability improvement if we can make MXNet's APIs compatible with NumPy.
I would suggest that we establish a formal review process for PRs that include API changes or additions, to prevent inconsistent APIs from being introduced in the future.
+1 for this RFC.
I especially like the numpy namespace proposal; that will help clean up a lot of things.
My experience is that the major blocker for numpy compatibility (and a source of bad user experience) is the lack of dynamic shape inference. I cannot wait to have that available.
Anyway, since I have already written a handful of operators, I am very happy to lend a hand in getting fully numpy-compatible once dynamic shape inference is done.
+1 for handling zero-size arrays.
I'm not that concerned about numpy compatibility, but the lack of zero-size arrays is something that I would like to see fixed, since the current situation means that empty arrays have to be carefully padded to not cause any problems.
+1 for this RFC.
The consistent experience would also help the JVM language bindings stay in sync with Python. It lowers the bar for users familiar with Python to write the same thing in Scala.
+1 for this RFC.
It will make MXNet more flexible to use, especially for slicing, and I hope mx.numpy can eliminate the divergence between mx.nd and mx.sym. :)
I wonder how mx.numpy will be implemented: by using the Python ast module to extract the abstract syntax tree and run it with a JIT, or by implementing it entirely in Python? We should also pay attention to the deployment story for mx.numpy.
I do not think F.numpy.dot is a good idea, since it is confusing to have mx.numpy, mx.nd.numpy, and mx.sym.numpy all exist. We only need mx.numpy to support mx.numpy.dot(a_nd, b_nd) and mx.numpy.dot(a_sym, b_sym).
@wkcn All of what you have said makes sense. :)
The Gluon APIs, GluonNLP, and GluonCV depend heavily on the current MXNet infrastructure, so we have to execute this in an organized, steady stream of work in order not to break backward compatibility. The current NNVM has its own limitations in expressing dynamic shapes and control flow operators. We will eventually need a new IR (Relay is an option) to do AST transformation.
Thanks for the RFC!
> It is just that users are encouraged to access NumPy operator APIs through mxnet.numpy to write pure imperative code, and through the Gluon APIs for a hybrid coding experience.

Earlier, mxnet.ndarray was supposed to give you the experience of writing pure imperative code. Why can't we add the operators under this namespace and make the interface changes for existing operators? Is there a list of operators whose APIs have diverged between numpy and ndarray, and can this be timed with the 2.0 release?

> We can keep the current behavior unchanged and implement a global switch that users can turn on to get NumPy-compatible results.

If I understand correctly, even when using the numpy namespace you need to toggle this switch (probably an env variable?) to obtain the correct slicing? Have you also considered implementing a separate numpy ndarray, with specific functions like `__getitem__` overridden, to avoid using this switch?
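A rough sketch of the alternative this question describes, with a hypothetical subclass name and behavior, just to make the idea concrete:

```python
import mxnet as mx

class NumpyNDArray(mx.nd.NDArray):
    """Hypothetical array type with NumPy-style integer indexing."""
    def __getitem__(self, key):
        out = super().__getitem__(key)
        if isinstance(key, int):
            # Would require the zero-dim shape support proposed above.
            out = out.reshape(())
        return out
```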
@anirudh2290

> Why can't we add the operators under this namespace and make the interface changes for existing operators?

We can. However, some operators in mxnet.ndarray share names with their numpy counterparts while behaving slightly differently, which means the two cannot coexist in the same namespace if we want to preserve backward compatibility. On the other hand, 2.0 is a good opportunity to fix many existing problems besides operator behaviors, so we'd likely want to take the time. Thus, to start now, a new namespace is the most straightforward way to go.

> Have you also considered implementing a separate numpy ndarray

Yes. Creating different array types means we'd start to see diverging user code, with some written against ndarray and some against the numpy ndarray, which would become harder to migrate later.
@reminisce @szha NumPy has reference/view semantics and strides in its ndarray structure, while MXNet's NDArray does not. How does this impact the design of the NumPy-compatible coding experience?
@TaoLv In neural nets, once you do backprop, you cannot overwrite data because it destroys checkpointing.
Not sure I understand the checkpointing part. Can you explain a bit more? I think we have a memory planning pass to decide whether data can be overwritten? Also, there are NumPy-based frameworks like Theano and Chainer.
@TaoLv MXNet could offer the same view concept as NumPy by implementing strides. But I don't think it is our first priority, because views are rarely useful in training (though perhaps useful in data preprocessing). @junrushao1994's point is that in-place assignment is invalid in backprop, as it would wipe out pre-stored autograd information. This is consistent with other DL frameworks.
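For readers less familiar with the NumPy feature under discussion, a quick standard-NumPy refresher on views and strides:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
v = a[:, 1]                   # a strided view: no copy, shares memory with a
v[0] = 100                    # writing through the view mutates a in place
print(a[0, 1])                # -> 100
print(a.strides, v.strides)   # e.g. (24, 8) and (24,) on a 64-bit build
```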
> We can. However, some operators in mxnet.ndarray share names with their numpy counterparts while behaving slightly differently, which means the two cannot coexist in the same namespace if we want to preserve backward compatibility.

Do we really have to carry this burden of backward compatibility all the way beyond 2.0? I feel the existing operators are confusing enough that 2.0 may be a good time for us to make the API clean and easy to use. Would adding a new namespace mx.numpy alongside the existing mx.sym and mx.ndarray cause more confusion for new users?
@apeforest Because MXNet guarantees backward compatibility, those two namespaces have to be kept until 2.0. Adding the numpy namespace lowers the bar for data scientists from the NumPy community to adopt the DL framework. As for the framework itself, the purpose is to deemphasize the difference between mxnet.symbol and mxnet.ndarray in this major release. To eventually retire those two 1.x namespaces, one practical step we can take in the future is to register all ops under namespaces like numpy, nn, etc. with unified interfaces supporting both NDArray and Symbol arguments; in Gluon, we could then remove the second-level module delegate F.
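To make the unified-interface idea concrete, here is a minimal sketch: a single `dot` entry point that dispatches on argument type. The dispatcher itself is hypothetical, though the underlying `mx.nd.dot`/`mx.sym.dot` calls are real APIs.

```python
import mxnet as mx

def dot(a, b):
    """Hypothetical unified op: accepts both NDArray and Symbol arguments."""
    if isinstance(a, mx.sym.Symbol):
        return mx.sym.dot(a, b)
    return mx.nd.dot(a, b)

# The same call works imperatively and symbolically:
y_nd = dot(mx.nd.ones((2, 3)), mx.nd.ones((3, 2)))
y_sym = dot(mx.sym.var('a'), mx.sym.var('b'))
```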
@reminisce I am fine with keeping those two namespaces until 2.0 for backward compatibility. Starting from 2.0, I feel we may want to just drop mx.ndarray and mx.symbol and make mx.numpy the only namespace for users. I like the unified interface idea you proposed.
+1 for this RFC.
What's the plan regarding "Instead, users should be able to just write native Python code as the following and if required, let the framework serialize it into a computation graph for optimization and deployment"? I would get the Python AST and convert it to a computational graph; that part does not seem to be described in detail, so I guess it belongs to a long-term phase.
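For reference, a tiny self-contained look at the `ast` route this comment describes: parsing the earlier while-loop snippet and locating the `While` node that a tracer would lower to graph operators. This only illustrates the idea, not the proposal's actual mechanism.

```python
import ast
import textwrap

src = textwrap.dedent("""
    out = 0
    i = 0
    while i < 5:
        out = out + data[i]
        i = i + 1
""")
tree = ast.parse(src)
while_node = tree.body[2]           # the ast.While node for the loop
print(type(while_node).__name__)    # -> While
print(ast.dump(while_node.test))    # the condition a tracer would lower to a graph
```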
This feature has been made available as an experimental feature in 1.6 and will be supported in 2.0. Thanks to everyone who contributed to this major feature!