Incubator-mxnet: [Discussion] 1.5.0 Roadmap

Created on 4 Apr 2019 · 32 comments · Source: apache/incubator-mxnet

Let's start a discussion here about the roadmap towards 1.5.0. We are looking for:

  • New features that are useful to your research and development.
  • Improvements and patches to existing features.

If you have any item that you'd like to propose to have in the roadmap, please do:

  • Create (or locate an existing) issue/pull request for the item, and note the issue/pull request number.
  • Comment in this issue: 1) the above issue number, 2) one sentence of what the item is about and why it's useful to you.
  • Indicate whether you'd be willing to help out on the item.
  • Share the ETA if you're driving the item and have a guesstimate on when it will be done.

cc @apache/mxnet-committers

Labels: Call for Contribution, Roadmap

All 32 comments

Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended labels: Feature

The changes since the 1.4.0 release that are already merged into the master branch will be included in the 1.5.0 release. The list can be found at: https://github.com/apache/incubator-mxnet/compare/v1.4.x...master?expand=1

Hi everyone, I've created v1.5.x branch here: https://github.com/apache/incubator-mxnet/tree/v1.5.x
Until we have an agreement on the timeline and features, I will synchronize this branch with the master branch periodically. Once we have decided on the code freeze date, we will only cherry-pick the required changes/features to the branch.

Thanks for starting this! I would like to include exception handling fixes: #14397 (@anirudh2290), #14433 (@anirudh2290), #14575 (@arcadiaphy). These three should hopefully be merged by the end of next week. I'd also like to include conversion of FP32 models to mixed precision models (#14584), which should be in by the first week of May, tentatively. In addition, I have some changes to the profiler to visualize GPU memory pooling and help make better decisions on the env variable choice. It is currently in a branch (https://github.com/anirudh2290/mxnet/tree/memory_profiler_poc2) and I intend to open a PR soon (next week).

MKLDNN Quantization PR

Name | PR# | status
-- | -- | --
sum | #14614 | DONE
relu | #14604 | DONE
refactor requantize | #14608 | DONE
improve quantize | #14641 | DONE
conv + activation | #14819 | DONE
cache op | #14785, #14931 | DONE
quantization flow to support 0 shape (RNN, concat) | #15031 | DONE
New models (SSD COCO/RN18/MobileNet v2) | #14646, #14823 | DONE

FP32 optimization

Name | PR# | status
-- | -- | --
data loader for CPU | #14824 | DONE
transpose | #14545 | DONE
RNN refactor with NNVM | #14476 | DONE
reshape enhance | #14903 | DONE
sum1d | #14914 | DONE
softmax 1d | #14818 | DONE
MKL Math (ERF, mean, etc) | #14893 | DONE
MKLDNN RNN (vRNN,LSTM) | #14713 | DONE
Build (Windows/Linux) | #14740, #14743, https://github.com/dmlc/mshadow/pull/374, #14829, #14877 | DONE
Update MKLDNN to 0.19 | #14783 | DONE

Documentation

Name | PR# | status
--|--|--
Windows Build Instruction | #14952 | DONE
MKLDNN OP | #14891 | DONE

Some users have pointed out useful features around matrix inversion, determinants, and log determinants. I propose to add some small features to make these calculations easier: https://issues.apache.org/jira/projects/MXNET/issues/MXNET-1350?filter=allissues

https://github.com/apache/incubator-mxnet/issues/14360

Comment in this issue: These are relevant calculations and some adjustments to the existing tools would help newcomers more easily leverage the existing work.

I'm interested and willing to implement this feature.

I'm quite busy at the moment but can likely finish this over a few days before mid May.

Thoughts?
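
For context, a minimal sketch of how a log-determinant and inverse of a symmetric positive-definite matrix can already be computed with the existing Cholesky-based linalg operators; the proposed helpers would make this kind of workflow more direct. The example matrix is illustrative.

```python
import mxnet as mx

# Symmetric positive-definite example matrix.
a = mx.nd.array([[4.0, 1.0],
                 [1.0, 3.0]])

# Cholesky factor L with A = L L^T.
chol = mx.nd.linalg.potrf(a)

# log|A| = 2 * sum(log(diag(L)))
log_det = 2.0 * mx.nd.linalg.sumlogdiag(chol)

# A^-1 computed from the Cholesky factor.
a_inv = mx.nd.linalg.potri(chol)

print(log_det.asscalar())
print(a_inv)
```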

Easily the biggest feature MXNet is lacking is higher order gradient support. There appears to be some work to get this going, but it's been a bit stagnant. The lack of strong support for this feature prevents implementing a number of DL algorithms. Everything beyond this seems like a quality-of-life feature. I would offer to help on this front, but I won't have the time necessary to work it out. I list it here in hopes that others will answer the call.

Beyond that, I think having dynamic shape in symbols would be a nice feature.

On a smaller scale, I think it would be nice if Gluon had support for blocks that operate on keyword arguments. It's pretty easy to add support for that in a non-breaking way (and I've done it in my own projects), but ideally this feature would be supported in other code like the data loader, which is currently fairly structured around assuming tuples rather than dicts (which would pair with keyword args).
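
Not an official API, but a minimal sketch (the block and input names are made up) of the kind of non-breaking keyword-argument support meant here: a thin base class that routes keyword arguments through to forward() so a block can be called with a dict of named inputs.

```python
import mxnet as mx
from mxnet.gluon import nn

class KwargBlock(nn.Block):
    """Hypothetical base class: route keyword arguments through to forward()."""
    def __call__(self, *args, **kwargs):
        # Plain gluon only forwards positional arguments (and runs hooks);
        # this minimal override trades the hooks for keyword-argument support.
        return self.forward(*args, **kwargs)

class PairScorer(KwargBlock):
    def __init__(self, **kwargs):
        super(PairScorer, self).__init__(**kwargs)
        with self.name_scope():
            self.query_proj = nn.Dense(16, in_units=8)
            self.key_proj = nn.Dense(16, in_units=8)

    def forward(self, query, key):
        return (self.query_proj(query) * self.key_proj(key)).sum(axis=-1)

net = PairScorer()
net.initialize()
batch = {"query": mx.nd.ones((2, 8)), "key": mx.nd.ones((2, 8))}
print(net(**batch))   # call with a dict of named inputs
```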

A nitpick that I have is that when it comes to serialization, MXNet (Python) seems to assume you always want to write to a file, in that it requests a path to a file to serialize the data. This often isn't appropriate in production systems. It would be much nicer if MXNet simply took a file-like object or just returned bytes so you can do what you want with it.
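
To make the point concrete, a sketch of the workaround the path-only API forces today: round-tripping through a temporary file just to get bytes out (the network here is illustrative).

```python
import os
import tempfile
from mxnet.gluon import nn

net = nn.Dense(4, in_units=3)
net.initialize()

# Today: write to a temporary file just to obtain bytes for a blob store or queue.
fd, path = tempfile.mkstemp(suffix=".params")
os.close(fd)
try:
    net.save_parameters(path)
    with open(path, "rb") as f:
        blob = f.read()
finally:
    os.remove(path)

print(len(blob))   # raw bytes; a file-like or bytes-returning API would avoid the temp file
```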

Features I'd like to see for 1.5 include:

  • AMP (automatic mixed precision), if ready
  • New TensorRT integration with subgraph API support and FP16
  • NVTX ranges for easier GPU profiling

+1 @pengzhao-intel on MKLDNN work. I'd love to make use of these optimizations. +1 to @anirudh2290's 3 very useful improvements.

Any plan to simplify the compilation process on Windows?
Any document to show us how to compile MXNet with MKLDNN support on Windows?

> Any plan to simplify the compilation process on Windows?
> Any document to show us how to compile MXNet with MKLDNN support on Windows?

Yes, we have a plan for MKLDNN on Windows and will fix it in 1.5. I will add it to my table.
@yinghu5 @NeoZhangJianyu

Update parameters manually in training loop.
https://github.com/apache/incubator-mxnet/issues/14735
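
For reference, a minimal sketch (with made-up layer sizes) of how a manual update step can already be written against Parameter.data()/Parameter.grad() without a gluon Trainer, assuming this is the kind of workflow the linked issue asks to support first-class:

```python
import mxnet as mx
from mxnet import autograd
from mxnet.gluon import nn

net = nn.Dense(1, in_units=2)
net.initialize()
loss_fn = mx.gluon.loss.L2Loss()
x, y = mx.nd.ones((4, 2)), mx.nd.ones((4, 1))

with autograd.record():
    loss = loss_fn(net(x), y)
loss.backward()

lr = 0.1
for param in net.collect_params().values():
    # In-place SGD step on the parameter's default context, bypassing gluon Trainer.
    param.data()[:] -= lr * param.grad()
```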

I'd like https://github.com/apache/incubator-mxnet/pull/14869 to go in; estimated time to complete: 05/10.

Dependency Update PRs

  • #14950 Update CI to use latest cuDNN & fix the ARCH_MISMATCH error on M60 GPU
  • #14887 CUDA 10.1 PyPI script
  • #14588 Update the numpy version

I desperately need higher order differentiation. Please make it possible. Thanks to everyone for all your contributions so far.

@mouryarishik @jmacglashan Hi, about higher order gradients, @apeforest and @larroy are actively working on this, and it will first be available in the master branch and nightly pip install packages. Unfortunately, it won't make it to 1.5.0 as we plan to release soon. Stay tuned, thanks!

Should we formally deprecate amalgamation as all it does is lead people down a dead end?

@aaronmarkham is it broken?

@aaronmarkham so does there exist a tutorial to illustrate how to get libmxnet.so for mobile devices?

@szha My understanding is that it doesn't work. There are several open issues about it, but I haven't tried it out yet myself.
@kohillyang I'd love to see a guide for this using a recent build of MXNet. The closest we have is the amalgamation guide:
https://mxnet.incubator.apache.org/versions/master/faq/smart_device.html
If you try it out, please keep me posted - I'd be happy to get the guide updated with tips on getting it to work.

@aaronmarkham that sounds like something that needs fixing. not sure if it's enough reason to kill it though

Wouldn't it be better to have a preprocessor flag to achieve the same result? Cross compilation is solved.

@mouryarishik could you give details about your usecase? Thanks.

@larroy A lot of GAN models require 2nd order gradients for stabilised training.

Would it be possible to fix this gluon serialization/deserialization bug #12795 in the 1.5 release?

It has been open for a long time (still not working in 1.4.1) and makes it hard to serialize gluon graphs for some applications, e.g. in gluon-ts.

@mouryarishik We already have a few operators that support higher order gradients:

elemwise_mul, log, log10, relu, FullyConnected, sin, cos, exp

However, due to the current design of NNVM (the graph data structure that models the computation graph), higher order gradient support has to be added operator by operator. The good news is that once we move to NNVM 2.0 in the near future, higher order gradients will be supported automatically by NNVM.

In the meantime, before NNVM is upgraded to 2.0, we plan to support higher order gradients in a limited number of operators. It would be great if you could identify the set of operators that are used in your model and require higher order gradient support. We will prioritize implementation for those operators.

Thanks for your continuous support and passion for MXNet.
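
For illustration, a minimal sketch of taking a second-order gradient through one of the operators listed above, using the standard autograd.grad(..., create_graph=True) pattern (this applies to the master branch for the supported operators):

```python
import mxnet as mx
from mxnet import autograd, nd

x = nd.array([0.5, 1.0, 2.0])
x.attach_grad()

with autograd.record():
    y = nd.sin(x)  # sin is in the supported list above
    # First-order gradient, recorded so it can be differentiated again.
    dy_dx = autograd.grad(y, x, create_graph=True, retain_graph=True)[0]

dy_dx.backward()   # second-order gradient ends up in x.grad
print(x.grad)      # expected to match -sin(x)
```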

I guess it depends on the GAN, as you could have any layer; if you want to use a GAN with convs, you need higher order gradients for conv...

@vafl the duplicate name issue should have been fixed already.

@szha In 1.4.1 the issue is still there (see the reproducible example in #12795). When you reuse any layer in a gluon graph, the graph cannot be serialized and loaded anymore. You have to explicitly create a new layer and share the parameters.

I think what was pushed is a workaround of this issue for RNNs.
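
For anyone hitting this before the fix lands, a sketch of the workaround described above: explicitly create a new layer that shares parameters rather than adding the same block twice. The layer sizes are illustrative.

```python
import mxnet as mx
from mxnet.gluon import nn

# The layer that would otherwise be added to the graph twice.
shared = nn.Dense(16, activation='relu')

net = nn.HybridSequential()
net.add(nn.Dense(16, activation='relu'),
        shared,
        # A distinct block that shares `shared`'s parameters instead of reusing the block.
        nn.Dense(16, activation='relu', params=shared.params))
net.initialize()

net(mx.nd.ones((2, 32)))
# Both blocks point at the same weights.
print((net[1].weight.data() == net[2].weight.data()).min())
```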

@vafl yes, what I meant is that 1.5.0 will include the fix. If you use the nightly package of MXNet, you will see that the included code example passes correctly.

Adding a shape property to MXNet symbols would be great.
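
For comparison, a small sketch of what is available today: Symbol.infer_shape() returns shapes given concrete input shapes, and the request is essentially a more ergonomic property on top of that (layer names are illustrative).

```python
import mxnet as mx

data = mx.sym.Variable('data')
net = mx.sym.FullyConnected(data=data, num_hidden=64, name='fc1')

# Today: shapes come back from infer_shape(), given concrete input shapes.
arg_shapes, out_shapes, aux_shapes = net.infer_shape(data=(32, 128))
print(out_shapes)   # [(32, 64)]
```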

@szha there're already lots of great proposals from the community.
I think we need to create a new topic for 1.6 roadmap :)

See #15589 for the new roadmap discussion.
