We are working on multiple new features and refactors (gluon, sparse, engine, etc.) towards a 1.0 release. But there are also some legacy issues that need to be resolved. Here is a list of issues I have noted. Feel free to raise new issues or contribute fixes.
@mli @tqchen @eric-haibin-lin @reminisce @asmushetzel @jermainewang @ptrendx
Should check kAddTo support for operators
I saw a few complaints that MXNet doesn't support IDE code completion due to operator registration.
one solution is to generate an operator file instead of creating it every time
Yes, an operator file (or otherwise) to support IDE code completion would be greatly welcomed.
also need to change all the DType(expf()) etc. to use math functions with proper precision
The default epsilon in Symbol/NDArray batch norm is too large (1e-3). Gluon now uses 1e-5, which is more commonly used.
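Until the defaults are reconciled, one workaround (a minimal sketch, assuming the current `eps`/`epsilon` parameter names) is to pass the epsilon explicitly in both APIs:

```python
import mxnet as mx
from mxnet.gluon import nn

# pass epsilon explicitly so Symbol/NDArray and Gluon models agree
sym_bn = mx.sym.BatchNorm(mx.sym.Variable('data'), eps=1e-5)  # default would be 1e-3
gluon_bn = nn.BatchNorm(epsilon=1e-5)                         # 1e-5 is already the gluon default
```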
kvstore has a new str interface, while the updater always uses int as the key, which is inconsistent. https://github.com/apache/incubator-mxnet/blob/master/src/kvstore/kvstore_local.h#L83
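A minimal sketch of the mismatch (assuming the local kvstore and the `_set_updater` hook): the frontend uses a string key, but the updater callback may still be handed an int key internally.

```python
import mxnet as mx

kv = mx.kv.create('local')
kv.init('weight', mx.nd.zeros((2, 3)))        # string key on the frontend

def updater(key, grad, weight):
    # the type of `key` seen here is the inconsistency flagged above
    print('updater saw key:', key, type(key))
    weight[:] -= 0.1 * grad

kv._set_updater(updater)
kv.push('weight', mx.nd.ones((2, 3)))
out = mx.nd.zeros((2, 3))
kv.pull('weight', out=out)
```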
I think the biggest feature MXNet lacks is higher-order gradients (see #5699). This is probably a fairly substantial feature, but is there any plan for this, or for Hessian-vector products, for 1.0?
For me the biggest feature MXNet lacks is consistent and complete documentation and tutorials. The Gluon tutorial seems to be pretty awesome (although still incomplete), but the rest of the API does not get such good treatment. It got even worse once you removed most examples from the website (even though I agree that they were not well explained).
From a technical and performance point of view MXNet is great (and probably the best, actually), but it's hard for it to take off when others have a lower barrier to entry and spend a lot on PR.
Should enable requesting the same resource multiple times
@ptrendx @madjam @bhavinthaker The removed tutorials need to be brought back ASAP!
Should we also work on error handling? Basically, getting more useful and more consistent messages when a model is not built correctly by the user (shape inference fails, etc.).
Some ops that are differentiable are missing gradients (e.g. 'norm').
+1 on higher-order gradients #5699
Create appropriate namespaces so that APIs are grouped logically and do not end up with prefix qualifiers such as linalg_, random_, etc.
@madjam this is already worked on by @reminisce and @eric-haibin-lin
@szha thanks. Is it being tracked in a separate issue?
@madjam I think it's already merged.
@madjam Namespace refactoring is covered in this PR. https://github.com/apache/incubator-mxnet/pull/7604
@eric-haibin-lin may have more coverage for documentation.
@madjam the docs for separate namespace is merged in #7712
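For readers following along, the grouped namespaces look roughly like this (a sketch assuming the `mx.nd.linalg` / `mx.nd.random` sub-modules from the PR above):

```python
import mxnet as mx

# grouped namespaces instead of prefix-qualified operator names
a = mx.nd.random.uniform(shape=(2, 3))   # rather than mx.nd.random_uniform
b = mx.nd.random.uniform(shape=(3, 2))
c = mx.nd.linalg.gemm2(a, b)             # rather than mx.nd.linalg_gemm2
print(c.shape)                           # (2, 2)
```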
@piiswrong could you update the task status so that people are aware which ones have been assigned / done?
The Embedding op should be optimized for large sparse ids. Currently, the embedding layer uses the input id as the raw index into the embedding matrix. In some circumstances, ids may be generated using a uint64 hash, so this is not suitable. This feature is much needed in industrial click-through-rate prediction, recommendation systems, and other uses.
Maybe the embedding matrix should be stored as a mapping from sparse id to embedding vector and partitioned across the parameter-server nodes by sparse id, like TensorFlow does:
sparse_id1 -> vector
sparse_id2 -> vector
...
@formath you bring up a good point. Large indices are definitely a feature we want to support in the long term. We might want to open a separate issue to discuss this.
First of all, we do plan to add sparse support for the Embedding op, where the weight can be in row_sparse format and the gradient for the weight is generated in row_sparse format, too. I am currently working on code refactoring and documentation, so this sparse operator is not implemented yet.
Regarding large indices up to 64 bits: this requires the first task @piiswrong brought up regarding int types in the C API, and the Kernel::Launch API in the backend uses 32-bit ints instead of 64-bit, which is problematic for many operators that operate on ndarrays with large shapes. So the scope is bigger than just the embedding op, and it will definitely take more time to resolve.
Are you working on any industrial-scale dataset? Two ways to circumvent the 64-bit hashed-index problem come to mind:
@eric-haibin-lin Both are OK. But that does not solve the efficiency problem when the embedding matrix has several million or even billions of rows, because of the lack of sparse updates. Those problems are the primary limits on using MXNet in industry. The sparse tensor support developed recently is great progress. I think it and its related parts should be given higher priority.
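For reference, the row_sparse format mentioned above looks like this (a minimal sketch assuming the `mx.nd.sparse` API from the recent sparse work): only the rows that are actually touched are stored, which is the natural layout for an embedding gradient.

```python
import mxnet as mx

data = mx.nd.array([[1., 2.], [3., 4.]])   # values of the non-zero rows
indices = mx.nd.array([1, 4])              # which rows they belong to
grad = mx.nd.sparse.row_sparse_array((data, indices), shape=(6, 2))
print(grad.indices.asnumpy())              # the touched rows
print(grad.data.asnumpy())                 # their values
```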
It would be easier if this issue were converted to a GitHub project so that progress on each item can be tracked.
I have the impression that many ops don't respect grad_req.
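A hedged sketch of how this (and the kAddTo item near the top of the list) can be checked for a given op: with grad_req='add', gradients from two backward passes should accumulate rather than overwrite each other.

```python
import mxnet as mx
from mxnet import autograd

x = mx.nd.array([1.0, 2.0, 3.0])
x.attach_grad(grad_req='add')          # kAddTo on the backend side

for _ in range(2):
    with autograd.record():
        y = (x * x).sum()              # substitute the op under test here
    y.backward()

# expected: 2 * d/dx(sum(x^2)) = 4 * x; anything else means 'add' is ignored
print(x.grad.asnumpy())
```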
Many examples are outdated or don't follow the style standard. Duplicates of the same or similar examples (the most popular being the MNIST dataset) are everywhere.
Certain convolution layouts on CPU are not supported even though the API claims they are (e.g. NWC, NHWC, NDHWC).
All examples should be runnable. We should have a checklist for these.
@szha I'm wondering the same thing: the Convolution op explicitly does not support "NWC", for example, but gluon mentions "NWC" in the docs. Searching the codebase shows that string only occurs in the high-level docs, so are the gluon docs simply wrong here?
@szha I hit the same issue as @taliesinb did using Conv1D in MXNet (mxnet-cu80 1.0.0.post2). The documentation does not match the Conv1D behavior.
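A rough repro sketch of the mismatch (hypothetical values; the exact error text varies by version): the docs list 'NWC', but the backend Convolution operator rejects it.

```python
import mxnet as mx

data = mx.nd.ones((1, 10, 2))                 # batch, width, channel (NWC)
weight = mx.nd.ones((4, 3, 2))
try:
    mx.nd.Convolution(data=data, weight=weight, kernel=(3,), num_filter=4,
                      no_bias=True, layout='NWC')
except mx.base.MXNetError as err:
    print('NWC rejected by the backend:', err)
```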