Incubator-mxnet: [Discussion] 1.5.1 Patch Release

Created on 19 Jul 2019  Â·  62Comments  Â·  Source: apache/incubator-mxnet

Let's start a discussion here about the known issues with 1.5.0 to put into a patch release.

Create (or locate existing) issue/pull request for the item, note the issue/pull request number.
Comment in this issue: 1) the above issue number, 2) one sentence of what the item is about and why it's important.
Indicate whether you'd be willing to help out on the item.
Share the ETA if you're driving the item and have an guesstimate on when it will be done.

cc @apache/mxnet-committers

Discussion

Most helpful comment

All 62 comments

Theres a problem with the WarpCTC plugin in 1.5.0 #15612. A PR has been created with a fix #15601

Possible issue with asnumpy(), but not able to reproduce yet https://github.com/apache/incubator-mxnet/issues/15431

license issues need to be fixed before next release https://github.com/apache/incubator-mxnet/issues/15542
MKL-DNN can be resolved in 1.6.0, Cub issue depends on Nvidia to release new version of Cub, other issue should be easy to resolve in 1.5.1.

@mxnet-label-bot add [discussion]

I will track Julia stuffs that need to be backported here:

Hi, I just created a label "Backport 1.5" (https://github.com/apache/incubator-mxnet/labels/Backport%201.5).
I think we can create a single PR to collecting the stuffs that need to be backported, use this label as a checklist, and keep that PR open until we want to make a release candidate.

I'd prefer if we use multiple PRs which only contain cherry-picks from master. That way, we don't squash and keep the references.

ah, okay

@samskalicky thanks for starting the discussion. @PatricZhao @TaoLv could you drive this release?

It's great to lead the release. @TaoLv @juliusshufan will be working on this.

@TaoLv Lets include #15692 in the 1.5.1 release to fix #15737

Thanks for reporting. Added it to https://cwiki.apache.org/confluence/display/MXNET/1.5.1+Release+Plan+and+Status

Hi, I’m fixing the bug of two C APIs MXEnginePushAsyncND and MXEnginePushSyncND
https://github.com/apache/incubator-mxnet/pull/15751

It will be finished this week.

Thanks for proposing, @wkcn .
Is this bug firstly introduced in the 1.5.0 release? Why do you think it should be included into the 1.5.1 patch release?

@TaoLv Yes, it is a bug in 1.5.0 release https://github.com/apache/incubator-mxnet/blob/v1.5.x/src/c_api/c_api.cc#L1537

It need to be fixed, otherwise users couldn't use these two C APIs in the release version of MXNet.

I mean does it also exist in v1.4.x releases?

@TaoLv It doesn't exist in v1.4.x release. The bug was introdueced firstly in v1.5.x release.

Thanks for the clarification. Do you have a github issue for that? I add it to the track list.

@wkcn, thanks for your report and contribution! The issue and your fix are added to https://cwiki.apache.org/confluence/display/MXNET/1.5.1+Release+Plan+and+Status.

I'd like to backport a few TensorRT patches we've contributed to master. No functional changes but they'll provide support for many additional models that would otherwise not be supported.

https://github.com/apache/incubator-mxnet/issues/15784 needs to be fixed in 1.5.1, big impact for simple_bind. The fix is in #15620. @TaoLv please include this too. Thanks!

we also need to update mshadow on 1.5.x branch https://github.com/apache/incubator-mxnet/pull/15600

I'd like to backport a few TensorRT patches we've contributed to master. No functional changes but they'll provide support for many additional models that would otherwise not be supported.

Thank you @KellenSunderland . Could you help to list them to the cwiki page?

15784 needs to be fixed in 1.5.1, big impact for simple_bind. The fix is in #15620. @TaoLv please include this too. Thanks!

Sure. I will do that.

we also need to update mshadow on 1.5.x branch #15600

@szha is this needed?

nightly test failure need to be fixed:

15374

http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/NightlyTestsForBinaries/detail/master/395/pipeline/

@roywei @lebeg Could you help to check if the issue has been fixed via https://github.com/apache/incubator-mxnet/pull/15452? If so, I will include the fix to the 1.5.1 patch release. Thanks!

we also need to update mshadow on 1.5.x branch #15600

@szha is this needed?

@TaoLv without this when you checkout the v1.5.x branch the 3rdparty/mshadow directory is empty and mxnet fails to compile. This is definitely needed.

we also need to update mshadow on 1.5.x branch #15600

@szha is this needed?

@TaoLv without this when you checkout the v1.5.x branch the 3rdparty/mshadow directory is empty and mxnet fails to compile. This is definitely needed.

Then you need backport this change to all of the release branches. Can this be mitigated by git submodule sync and git submodule update --recursive --init?
I'm afraid even if this change is picked to v1.5.x, users will get git complains when they try to pull the latest code if they are on a old commit.

What's the issue with MShadow? Did they have to do a force push / history re-write again? We may indeed have to backport that to other release branches.

@KellenSunderland mshadow source code is donated and merged to MXNet. See https://github.com/apache/incubator-mxnet/pull/15600 and https://github.com/dmlc/mshadow/issues/373. So it's not a git submodule any more.

For the source downloaded before mshadow merged into MXNet, users need to remove
the directory mshadow firstly, then git pull to update the code.

@wkcn that doesnt work for me:

ubuntu@ip-172-31-76-214:~$ git clone --recursive https://github.com/apache/incubator-mxnet.git mxnet2
Cloning into 'mxnet2'...

ubuntu@ip-172-31-76-214:~/mxnet2$ git checkout v1.5.x
M   3rdparty/dlpack
M   3rdparty/dmlc-core
M   3rdparty/mkldnn
M   3rdparty/tvm
Branch v1.5.x set up to track remote branch v1.5.x from origin.
Switched to a new branch 'v1.5.x'
ubuntu@ip-172-31-76-214:~/mxnet2$ cd 3rdparty/
ubuntu@ip-172-31-76-214:~/mxnet2/3rdparty$ ls
ctc_include  dlpack  dmlc-core  googletest  mkldnn  mshadow  nvidia_cub  onnx-tensorrt  openmp  ps-lite  tvm
ubuntu@ip-172-31-76-214:~/mxnet2/3rdparty$ ls mshadow/
ubuntu@ip-172-31-76-214:~/mxnet2/3rdparty$ rm -rf mshadow/
ubuntu@ip-172-31-76-214:~/mxnet2/3rdparty$ git pull
Already up-to-date.
ubuntu@ip-172-31-76-214:~/mxnet2/3rdparty$ ls
ctc_include  dlpack  dmlc-core  googletest  mkldnn  nvidia_cub  onnx-tensorrt  openmp  ps-lite  tvm
ubuntu@ip-172-31-76-214:~/mxnet2/3rdparty$ git submodule update --recursive
Submodule path 'dlpack': checked out '10892ac964f1af7c81aae145cd3fab78bbccd297'
Submodule path 'dmlc-core': checked out '3943914eed66470bd010df581e29e4dca4f7df6f'
Submodule path 'mkldnn': checked out '41bee20d7eb4a67feeeeb8d597b3598994eb1959'
Submodule path 'tvm': checked out '21935dcbf56ad3bd66ebff9891a6bc3865b8106d'
Submodule path 'tvm/3rdparty/dlpack': checked out '5c792cef3aee54ad8b7000111c9dc1797f327b59'

Submodule path 'tvm/3rdparty/dmlc-core': checked out '82bf4c2e2af312b3d52513aa727483803a2f8734'
ubuntu@ip-172-31-76-214:~/mxnet2/3rdparty$ 
ubuntu@ip-172-31-76-214:~/mxnet2/3rdparty$ ls
ctc_include  dlpack  dmlc-core  googletest  mkldnn  nvidia_cub  onnx-tensorrt  openmp  ps-lite  tvm
ubuntu@ip-172-31-76-214:~/mxnet2/3rdparty$ 

Hi @samskalicky , I can download mshadow by git submodule update --init --recursive

@KellenSunderland Could you please include the TensorRT patches you just mentioned into the cwiki? https://cwiki.apache.org/confluence/display/MXNET/1.5.1+Release+Plan+and+Status

Thanks for your support!

@iblis17 Could you help to pick #15608 #15609 to the v1.5.x branch?

Thank you @KellenSunderland . I just included these patches to the table on cwiki. Please help take a look: https://cwiki.apache.org/confluence/display/MXNET/1.5.1+Release+Plan+and+Status
BTW, do you know someone with TensorRT background can help to pick them to the v1.5.x branch? Because I'm wondering if they need be ported in a certain order.

No problem Tao, I can probably cherry-pick them a bit later today.

@TaoLv I'd like to pull in the following PRs, they are necessary fixes for some of my use-cases:
https://github.com/apache/incubator-mxnet/pull/15245
https://github.com/apache/incubator-mxnet/pull/15917

This PR fixes a memory misalignment bug in topk operator introduced recently. Please add it to 1.5.1 patch release:
https://github.com/apache/incubator-mxnet/pull/15948

Thanks

First three cherry-picked PRs:

15875

15874

15877

@KellenSunderland, the 3 PRs were already merged to v1.5.x. Please update for the other two if we still need them.

Benchmark doc fix PR #15769

@ChaiBapchya, thank you for the report. Could you please cherry pick it to the v1.5.x branch and tag me in the PR?

@TaoLv I'd like to pull in the following PRs, they are necessary fixes for some of my use-cases:

15245

15917

@samskalicky , please pick them to v1.5.x and tag me in the PR. I will update the cwiki page accordingly.

This PR fixes a memory misalignment bug in topk operator introduced recently. Please add it to 1.5.1 patch release:

15948

Thanks

@apeforest Any update? Is it still true that we can pick this to v1.5.x?

@iblis17 Could you help to pick #15608 #15609 to the v1.5.x branch?

@TaoLv Just can I just push to the v1.5.x branch direcetly?

No, please create pull requests instead

Haven't we already passed code freeze?

On Tue, Sep 10, 2019 at 11:16 PM Marco de Abreu notifications@github.com
wrote:

No, please create pull requests instead

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/apache/incubator-mxnet/issues/15613?email_source=notifications&email_token=ABT54SJUQ3XE4O2RKG57EVDQJCEMVA5CNFSM4IFKN5YKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD6NMNVI#issuecomment-530237141,
or mute the thread
https://github.com/notifications/unsubscribe-auth/ABT54SKA7OO3WMTGBRTZXTTQJCEMVANCNFSM4IFKN5YA
.

Yes, we have passed code freeze and the rc0 was already tagged. Front-end packaging is in progress. One can still commit code change to the v1.5.x branch but that will be not included in the vote for rc0.

1.5.0 apparently broke sampling operators on Linux, see #16135. A PR is open for this #16139. I hope this has a chance to make it to 1.5.1.

~No idea why my backporting PR got closed (#16142).~ reopened

Any news on 1.5.1?

Was this page helpful?
0 / 5 - 0 ratings

Related issues

luoruisichuan picture luoruisichuan  Â·  3Comments

dushoufu picture dushoufu  Â·  3Comments

WangcsShuai picture WangcsShuai  Â·  3Comments

Ajoo picture Ajoo  Â·  3Comments

zy-huang picture zy-huang  Â·  3Comments