Incubator-mxnet: MXNET_BACKWARD_DO_MIRROR is broken

Created on 10 Mar 2019 · 10 comments · Source: apache/incubator-mxnet

The MXNET_BACKWARD_DO_MIRROR feature has been broken for a while. Is there any plan to fix it?

Labels: Bug, Environment Variables

All 10 comments

Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended labels: Bug

Contributions are welcome!

@mxnet-label-bot add [Gluon, Bug]

Repro:

(venv) bingxu@bingxu-mbp:~/research/hiconv-tvm/resnet-50$ python mx_memory.py 
3317 MB
(venv) bingxu@bingxu-mbp:~/research/hiconv-tvm/resnet-50$ MXNET_BACKWARD_DO_MIRROR=1 python mx_memory.py 
4496 MB
(venv) bingxu@bingxu-mbp:~/research/hiconv-tvm/resnet-50$ 

Turning mirroring on actually costs more memory.
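For context, mx_memory.py itself is not shown in the issue. Below is a minimal sketch of what such a measurement script might look like, assuming a Gluon model-zoo ResNet-50, a guessed batch size, and parsing of the executor debug string (all of these are my assumptions, not details from the repro):

    import re
    import mxnet as mx
    from mxnet.gluon.model_zoo import vision

    # Build a ResNet-50 symbol from the Gluon model zoo (model choice assumed).
    net = vision.resnet50_v1(pretrained=False)
    sym = net(mx.sym.var('data'))

    # Bind a training executor. Memory planning happens at bind time, so
    # MXNET_BACKWARD_DO_MIRROR influences the plan reported below.
    exe = sym.simple_bind(ctx=mx.cpu(), data=(32, 3, 224, 224), grad_req='write')

    # The executor debug string contains a summary line like
    # "Total 3317 MB allocated".
    match = re.search(r'Total (\d+) MB allocated', exe.debug_str())
    print(match.group(1), 'MB')

Because the mirroring decision is made while planning the graph, toggling the environment variable before launching the script is enough to compare plans, exactly as in the shell session above.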

This seems to be incorrectly labelled as Gluon. Removing the gluon label.

@mxnet-label-bot Update [Bug, Environment Variables]

# mxnet version: 1.4.0, resnext101
I think the following code for MXNET_BACKWARD_DO_MIRROR, taken from src/executor/graph_executor.cc, has a problem:
it will not mirror a node, even when MXNET_BACKWARD_DO_MIRROR is on, unless __force_mirroring__ is set on it.
And it uses more GPU memory when MXNET_BACKWARD_DO_MIRROR is set.

 int do_mirror = dmlc::GetEnv("MXNET_BACKWARD_DO_MIRROR", 0);
 auto need_mirror = [do_mirror](const nnvm::Node& node) -> int {
   // Variables (weights, inputs) are never mirrored.
   if (node.is_variable()) return 0;
   const std::string& type = node.attrs.op->name;
   // Dropout is stochastic: recomputing it would produce a different mask.
   if (type == "Dropout") return false;
   // A per-node attribute can force mirroring regardless of the env var.
   if (get_node_attr(node, "__force_mirroring__", false)) return true;
   if (do_mirror == 0) return false;
   // Ops that are expensive to recompute are excluded from mirroring.
   if (type == "Convolution") return false;
   if (type == "FullyConnected") return false;
   if (type == "Concat") return false;
   if (type == "SoftmaxOutput") return false;
   if (type == "BatchNorm") return false;
   if (type == "CuDNNBatchNorm") return false;
   return true;
 };
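As a side note on the __force_mirroring__ branch above: it fires regardless of the environment variable. Here is a hypothetical sketch of setting it from Python, assuming MXNet's double-underscore wrapping of user symbol attributes applies to this code path (I have not verified that it does):

    import mxnet as mx

    data = mx.sym.Variable('data')
    # Symbols created inside this scope carry a force_mirroring attribute.
    # Whether it reaches the graph as __force_mirroring__ (the key that
    # need_mirror() checks) is an assumption about MXNet's attribute wrapping.
    with mx.AttrScope(force_mirroring='True'):
        act = mx.sym.Activation(data=data, act_type='relu', name='relu0')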

I can't stress enough how important this feature is. Please fix it and bring the memory cost back to the level of version 0.7. https://github.com/dmlc/mxnet-memonger https://arxiv.org/abs/1604.06174
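For reference, the memonger workflow linked above looked roughly like the sketch below. It is based on my reading of the dmlc/mxnet-memonger README; the toy network and input shape are placeholders of mine, and search_plan's exact signature should be checked against that repo:

    import mxnet as mx
    import memonger  # https://github.com/dmlc/mxnet-memonger

    # A placeholder network; substitute your real symbol here.
    net = mx.sym.Variable('data')
    net = mx.sym.FullyConnected(net, num_hidden=512, name='fc1')
    net = mx.sym.Activation(net, act_type='relu', name='relu1')
    net = mx.sym.FullyConnected(net, num_hidden=10, name='fc2')
    net = mx.sym.SoftmaxOutput(net, name='softmax')

    # search_plan annotates nodes for mirroring and searches for a plan that
    # trades recomputation for memory (the technique from arXiv:1604.06174).
    net_planned = memonger.search_plan(net, data=(64, 1024))

    # Use the planned symbol as usual, e.g. with the Module API.
    mod = mx.mod.Module(net_planned, context=mx.gpu(0))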

@antinucleon Below are the outputs that I got from the LSTM-based NMT model (from the Sockeye toolkit):

Baseline

[INFO:root] Global Step[50] Epoch[0] Batch [50] Speed: 261.10 samples/sec   perplexity=921.880075   Memory Usage (MB): pid_21652=6179,  PE Usage (W, J): dev_0=78.96,1935.41, 
[INFO:root] Global Step[100] Epoch[0] Batch [100]   Speed: 285.67 samples/sec   perplexity=616.976020   Memory Usage (MB): pid_21652=6179,  PE Usage (W, J): dev_0=82.94,3793.57, 
[INFO:root] Global Step[150] Epoch[0] Batch [150]   Speed: 294.01 samples/sec   perplexity=496.731365   Memory Usage (MB): pid_21652=6601,  PE Usage (W, J): dev_0=95.77,5878.26, 
[INFO:root] Global Step[200] Epoch[0] Batch [200]   Speed: 310.77 samples/sec   perplexity=424.378748   Memory Usage (MB): pid_21652=6601,  PE Usage (W, J): dev_0=77.22,7468.53, 
[INFO:root] Global Step[250] Epoch[0] Batch [250]   Speed: 282.17 samples/sec   perplexity=369.385264   Memory Usage (MB): pid_21652=6601,  PE Usage (W, J): dev_0=87.60,9455.43, 
[INFO:root] Global Step[300] Epoch[0] Batch [300]   Speed: 294.64 samples/sec   perplexity=321.364135   Memory Usage (MB): pid_21652=6601,  PE Usage (W, J): dev_0=82.16,11240.07, 

MXNET_BACKWARD_DO_MIRROR=1

[INFO:root] Global Step[50] Epoch[0] Batch [50] Speed: 151.09 samples/sec   perplexity=949.961463   Memory Usage (MB): pid_20928=2425,  PE Usage (W, J): dev_0=84.69,3587.42, 
[INFO:root] Global Step[100] Epoch[0] Batch [100]   Speed: 170.88 samples/sec   perplexity=625.173421   Memory Usage (MB): pid_20928=2425,  PE Usage (W, J): dev_0=76.74,6461.51, 
[INFO:root] Global Step[150] Epoch[0] Batch [150]   Speed: 178.00 samples/sec   perplexity=499.439886   Memory Usage (MB): pid_20928=2475,  PE Usage (W, J): dev_0=84.37,9494.95, 
[INFO:root] Global Step[200] Epoch[0] Batch [200]   Speed: 195.40 samples/sec   perplexity=426.799941   Memory Usage (MB): pid_20928=2475,  PE Usage (W, J): dev_0=79.16,12087.66, 
[INFO:root] Global Step[250] Epoch[0] Batch [250]   Speed: 169.05 samples/sec   perplexity=371.365061   Memory Usage (MB): pid_20928=2475,  PE Usage (W, J): dev_0=81.68,15179.92, 
[INFO:root] Global Step[300] Epoch[0] Batch [300]   Speed: 180.27 samples/sec   perplexity=323.268620   Memory Usage (MB): pid_20928=2475,  PE Usage (W, J): dev_0=73.94,17805.00, 

We can see from the output logs that, for the same number of global steps, both runs reach roughly the same training quality. The memory footprint with backward mirroring is around 1/3 of the baseline (2475 MB vs. 6601 MB), but this comes with a roughly 40% drop in throughput (180.27 vs. 294.64 samples/sec at step 300).

I am still investigating the cause of this large performance drop. In the meantime, if you have any specific benchmark of interest, please kindly let me know (preferably a small benchmark like the one you commented above, because we are still in the debugging phase). Thanks.

My implementation is available here: https://github.com/UofT-EcoSystem/nnvm/blob/cce19c328427de4eacd51178d233ce23a3e3d79d/src/pass/gradient.cc#L144

Any feedback or comment is much appreciated. Thanks.

