We are working on multiple new features and refactors (gluon, sparse, engine, etc.) towards a 1.0 release. But there are also some legacy issues that need to be resolved. Here is a list of issues I have noted. Feel free to raise new issues or contribute fixes.
@mli @tqchen @eric-haibin-lin @reminisce @asmushetzel @jermainewang @ptrendx
Should check kAddTo support for operators
I saw a few complaints that MXNet doesn't support IDE code completion due to operator registration.
one solution is to generate an operator file instead of creating it every time
Yes, an operator file (or otherwise) to support IDE code completion would be greatly welcomed.
also need to change all the DType(expf()) etc. to use math functions with proper precision
The default epsilon in Symbol/NDArray batch norm is too large (1e-3). Gluon now uses 1e-5, which is more commonly used.
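Until the defaults are reconciled, one workaround (a minimal sketch, assuming the current `eps`/`epsilon` parameter names) is to pass the epsilon explicitly in both APIs:

```python
import mxnet as mx
from mxnet.gluon import nn

# pass epsilon explicitly so Symbol/NDArray and Gluon models agree
sym_bn = mx.sym.BatchNorm(mx.sym.Variable('data'), eps=1e-5)  # default would be 1e-3
gluon_bn = nn.BatchNorm(epsilon=1e-5)                         # 1e-5 is already the gluon default
```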
kvstore has a new str interface, while the updater always uses int as the key, which is inconsistent. https://github.com/apache/incubator-mxnet/blob/master/src/kvstore/kvstore_local.h#L83
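A minimal sketch of the mismatch (assuming the local kvstore and the `_set_updater` hook): the frontend uses a string key, but the updater callback may still be handed an int key internally.

```python
import mxnet as mx

kv = mx.kv.create('local')
kv.init('weight', mx.nd.zeros((2, 3)))        # string key on the frontend

def updater(key, grad, weight):
    # the type of `key` seen here is the inconsistency flagged above
    print('updater saw key:', key, type(key))
    weight[:] -= 0.1 * grad

kv._set_updater(updater)
kv.push('weight', mx.nd.ones((2, 3)))
out = mx.nd.zeros((2, 3))
kv.pull('weight', out=out)
```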
I think the biggest feature MXNet lacks is higher-order gradients (see #5699). This is probably a fairly substantial feature, but is there any plan for this, or for Hessian-vector products, for 1.0?
For me the biggest feature MXNet lacks is consistent and complete documentation and tutorials. The Gluon tutorial seems to be pretty awesome (although still incomplete), but the rest of the API does not get such good treatment. It got even worse once you removed most examples from the website (even though I agree that they were not well explained).
From a technical and performance point of view MXNet is great (and probably the best, actually), but it's hard for it to take off when others have a lower barrier to entry and spend a lot on PR.
Should enable requesting the same resource multiple times
@ptrendx @madjam @bhavinthaker The removed tutorials need to be brought back ASAP!
Should we also work on error handling? Basically, getting more useful and more consistent messages when a model is not built correctly by the user (shape inference fails, etc.).
Some ops that are differentiable are missing gradients (e.g. 'norm').
+1 on higher-order gradients #5699
Create appropriate namespaces so that APIs are grouped logically and do not end up with prefix qualifiers such as linalg_, random_, etc.
@madjam this is already worked on by @reminisce and @eric-haibin-lin
@szha thanks. Is it being tracked in a separate issue?
@madjam I think it's already merged.
@madjam Namespace refactoring is covered in this PR. https://github.com/apache/incubator-mxnet/pull/7604
@eric-haibin-lin may have more coverage for documentation.
@madjam the docs for separate namespace is merged in #7712
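For readers following along, the grouped namespaces look roughly like this (a sketch assuming the `mx.nd.linalg` / `mx.nd.random` sub-modules from the PR above):

```python
import mxnet as mx

# grouped namespaces instead of prefix-qualified operator names
a = mx.nd.random.uniform(shape=(2, 3))   # rather than mx.nd.random_uniform
b = mx.nd.random.uniform(shape=(3, 2))
c = mx.nd.linalg.gemm2(a, b)             # rather than mx.nd.linalg_gemm2
print(c.shape)                           # (2, 2)
```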
@piiswrong could you update the task status so that people are aware which ones have been assigned / done?
The Embedding op should be optimized for large sparse ids. Currently, the embedding layer uses the input id as the raw index into the embedding matrix. In some circumstances, ids may be generated using a uint64 hash, so this is not suitable. This feature is much needed in industrial click-through-rate prediction, recommendation systems, and other uses.
Maybe the embedding matrix should be stored as a mapping from sparse id to embedding vector and partitioned across the parameter-server nodes by sparse id, like TensorFlow does:
sparse_id1 -> vector
sparse_id2 -> vector
...
@formath you bring up a good point. Large indices are definitely a feature we want to support in the long term. We might want to open a separate issue to discuss this.
First of all, we do plan to add sparse support for the Embedding op, where the weight can be in row_sparse format and the gradient for the weight is generated in row_sparse format, too. I am currently working on code refactoring and documentation, so this sparse operator is not implemented yet.
Regarding large indices up to 64 bits: this requires the first task @piiswrong brought up regarding int types in the C API, and the Kernel::Launch API in the backend uses 32-bit ints instead of 64-bit, which is problematic for many operators that operate on ndarrays with large shapes. So the scope is bigger than just the embedding op, and it will definitely take more time to resolve.
Are you working on any industrial-scale dataset? Two ways to circumvent the 64-bit hashed-index problem come to mind:
@eric-haibin-lin Both are OK. But that does not solve the efficiency problem when the embedding matrix has several million or even billions of rows, because of the lack of sparse updates. Those problems are the primary limits on using MXNet in industry. The sparse tensor support developed recently is great progress. I think it and its related parts should be given higher priority.
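For reference, the row_sparse format mentioned above looks like this (a minimal sketch assuming the `mx.nd.sparse` API from the recent sparse work): only the rows that are actually touched are stored, which is the natural layout for an embedding gradient.

```python
import mxnet as mx

data = mx.nd.array([[1., 2.], [3., 4.]])   # values of the non-zero rows
indices = mx.nd.array([1, 4])              # which rows they belong to
grad = mx.nd.sparse.row_sparse_array((data, indices), shape=(6, 2))
print(grad.indices.asnumpy())              # the touched rows
print(grad.data.asnumpy())                 # their values
```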
It would be easier if this issue were converted to a GitHub project so that progress on each item can be tracked.
I have the impression that many ops don't respect grad_req.
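A hedged sketch of how this (and the kAddTo item near the top of the list) can be checked for a given op: with grad_req='add', gradients from two backward passes should accumulate rather than overwrite each other.

```python
import mxnet as mx
from mxnet import autograd

x = mx.nd.array([1.0, 2.0, 3.0])
x.attach_grad(grad_req='add')          # kAddTo on the backend side

for _ in range(2):
    with autograd.record():
        y = (x * x).sum()              # substitute the op under test here
    y.backward()

# expected: 2 * d/dx(sum(x^2)) = 4 * x; anything else means 'add' is ignored
print(x.grad.asnumpy())
```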
Many examples are outdated or don't follow the style standard. Duplicates of the same or similar examples (the most popular being the MNIST dataset) are everywhere.
Certain convolution layouts on CPU are not supported even though the API claims they are (e.g. NWC, NHWC, NDHWC).
All examples should be runnable. We should have a checklist for these.
@szha I'm wondering the same thing: the Convolution op explicitly does not support "NWC", for example, but gluon mentions "NWC" in the docs. Searching the codebase shows that string only occurs in the high-level docs, so are the gluon docs simply wrong here?
@szha I hit the same issue as @taliesinb did using Conv1D in MXNet (mxnet-cu80 1.0.0.post2). The documentation does not match the Conv1D behavior.
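A rough repro sketch of the mismatch (hypothetical values; the exact error text varies by version): the docs list 'NWC', but the backend Convolution operator rejects it.

```python
import mxnet as mx

data = mx.nd.ones((1, 10, 2))                 # batch, width, channel (NWC)
weight = mx.nd.ones((4, 3, 2))
try:
    mx.nd.Convolution(data=data, weight=weight, kernel=(3,), num_filter=4,
                      no_bias=True, layout='NWC')
except mx.base.MXNetError as err:
    print('NWC rejected by the backend:', err)
```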