Incubator-mxnet: is the BlockGrad symbol equivalent to setting propagate_down to false in Caffe?

Created on 19 Aug 2016 · 5 comments · Source: apache/incubator-mxnet

In Caffe, if I set propagate_down to false on some middle layer of my net, I can speed up the backward computation time. I think this is useful when fine-tuning.
I tried the BlockGrad symbol in MXNet like this:
block = mx.sym.BlockGrad(top_layer, name=top_layer.name+"_block")
myfc = mx.sym.FullyConnected(block, num_hidden=args.num_classes, name='myfc')
but I did not see any speedup.
Is there a way to speed up fine-tuning, similar to setting propagate_down in Caffe?

Most helpful comment

No, NEVER USE BLOCKGRAD IF YOU HAVE WEIGHT DECAY!!! It only sets the gradient coming from the higher layers to 0, but weight decay will still drive the weights to 0 gradually.

All 5 comments

It only sets the gradient to 0. If you don't want to compute the gradient, don't bind the corresponding gradient array.
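A minimal sketch (editor's illustration, not from this thread) of what "don't bind the corresponding gradient array" can look like with the symbolic API: simple_bind accepts a per-argument grad_req dict, and arguments marked 'null' get no gradient array and no backward computation for them, which is where a speedup, if any, would come from. The toy network and argument names below are assumptions made up for illustration.

```python
import mxnet as mx

# Toy stand-in for a pretrained backbone plus a new 'myfc' head.
data = mx.sym.Variable('data')
conv1 = mx.sym.Convolution(data, kernel=(3, 3), num_filter=16, name='conv1')
flat = mx.sym.Flatten(conv1)
myfc = mx.sym.FullyConnected(flat, num_hidden=10, name='myfc')
net = mx.sym.SoftmaxOutput(myfc, name='softmax')

# 'null' = do not allocate a gradient array (and skip backward) for this
# argument; only the new head keeps 'write'.
grad_req = {name: 'null' for name in net.list_arguments()}
grad_req['myfc_weight'] = 'write'
grad_req['myfc_bias'] = 'write'

exe = net.simple_bind(ctx=mx.cpu(), data=(32, 3, 28, 28), grad_req=grad_req)
```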

No, NEVER USE BLOCKGRAD IF YOU HAVE WEIGHT DECAY!!! It only sets the gradient coming from the higher layers to 0, but weight decay will still drive the weights to 0 gradually.
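A tiny numeric illustration of the point above (editor's example, not from the thread): a standard SGD update with weight decay is w ← w − lr·(grad + wd·w), so even when BlockGrad makes grad exactly 0, the wd·w term keeps shrinking the weight.

```python
lr, wd = 0.1, 1e-4
w = 1.0
grad = 0.0                       # gradient reaching a layer below BlockGrad
for _ in range(100000):
    w -= lr * (grad + wd * w)    # the weight decay term alone drives w toward 0
print(w)                         # roughly 0.37 after 100k updates
```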

@piiswrong @winstywang thank you for the quick replies. I will try not binding the corresponding gradient arrays.

@piiswrong not binding the gradient arrays seems impossible with the FeedForward model's APIs. Can you tell me how to avoid binding the gradient arrays of a FeedForward model? Thank you.

@winstywang So, if I have a pretrained model and I want to freeze the first 90% of the layers while training the remaining layers with gradients and weight decay, how can I do that in MXNet?
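The thread leaves this last question unanswered. One possible approach (editor's assumption, not confirmed by the commenters) is to use the Module API instead of FeedForward: its constructor takes fixed_param_names, and the listed parameters are excluded from the optimizer update, so neither the gradient nor weight decay changes them. The toy symbol and layer names below are placeholders.

```python
import mxnet as mx

# Toy stand-in for a pretrained backbone plus a new 'myfc' head.
data = mx.sym.Variable('data')
conv1 = mx.sym.Convolution(data, kernel=(3, 3), num_filter=16, name='conv1')
myfc = mx.sym.FullyConnected(mx.sym.Flatten(conv1), num_hidden=10, name='myfc')
net = mx.sym.SoftmaxOutput(myfc, name='softmax')

# Freeze every argument except the new head and the data/label inputs;
# for a real network, take the names from net.list_arguments().
fixed = [name for name in net.list_arguments()
         if not name.startswith('myfc') and name not in ('data', 'softmax_label')]

mod = mx.mod.Module(symbol=net, context=mx.cpu(), fixed_param_names=fixed)
```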
