Incubator-mxnet: is the BlockGrad symbol equivalent to setting propagate_down to false in Caffe?

Created on 19 Aug 2016 · 5 comments · Source: apache/incubator-mxnet

In Caffe, if I set propagate_down to false on some middle layer of my net, I can speed up the backward computation time. I think this is useful when fine-tuning.
I tried the BlockGrad symbol in MXNet like this:
block = mx.sym.BlockGrad(top_layer, name=top_layer.name+"_block")
myfc = mx.sym.FullyConnected(block, num_hidden=args.num_classes, name='myfc')
but I did not see any speedup.
Is there a way to speed up fine-tuning, similar to setting propagate_down in Caffe?

Most helpful comment

No, NEVER USE BLOCKGRAD IF YOU HAVE WEIGHT DECAY!!! It only sets the gradient coming from the higher layers to 0, but weight decay will still drive the weights to 0 gradually.

All 5 comments

It only sets the gradient to 0. If you don't want to compute the gradient, don't bind the corresponding gradient array.
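A minimal sketch (editor's illustration, not from this thread) of what "don't bind the corresponding gradient array" can look like with the symbolic API: simple_bind accepts a per-argument grad_req dict, and arguments marked 'null' get no gradient array and no backward computation for them, which is where a speedup, if any, would come from. The toy network and argument names below are assumptions made up for illustration.

```python
import mxnet as mx

# Toy stand-in for a pretrained backbone plus a new 'myfc' head.
data = mx.sym.Variable('data')
conv1 = mx.sym.Convolution(data, kernel=(3, 3), num_filter=16, name='conv1')
flat = mx.sym.Flatten(conv1)
myfc = mx.sym.FullyConnected(flat, num_hidden=10, name='myfc')
net = mx.sym.SoftmaxOutput(myfc, name='softmax')

# 'null' = do not allocate a gradient array (and skip backward) for this
# argument; only the new head keeps 'write'.
grad_req = {name: 'null' for name in net.list_arguments()}
grad_req['myfc_weight'] = 'write'
grad_req['myfc_bias'] = 'write'

exe = net.simple_bind(ctx=mx.cpu(), data=(32, 3, 28, 28), grad_req=grad_req)
```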

No, NEVER USE BLOCKGRAD IF YOU HAVE WEIGHT DECAY!!! It only sets the gradient coming from the higher layers to 0, but weight decay will still drive the weights to 0 gradually.
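A tiny numeric illustration of the point above (editor's example, not from the thread): a standard SGD update with weight decay is w ← w − lr·(grad + wd·w), so even when BlockGrad makes grad exactly 0, the wd·w term keeps shrinking the weight.

```python
lr, wd = 0.1, 1e-4
w = 1.0
grad = 0.0                       # gradient reaching a layer below BlockGrad
for _ in range(100000):
    w -= lr * (grad + wd * w)    # the weight decay term alone drives w toward 0
print(w)                         # roughly 0.37 after 100k updates
```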

@piiswrong @winstywang thank you for the quick replies. I will try not binding the corresponding gradient arrays.

@piiswrong not binding the gradient arrays seems impossible with the FeedForward model's APIs. Can you tell me how to avoid binding the gradient arrays of a FeedForward model? Thank you.

@winstywang So, if I have a pretrained model and I want to freeze the first 90% of the layers while training the remaining layers with gradients and weight decay, how can I do that in MXNet?
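The thread leaves this last question unanswered. One possible approach (editor's assumption, not confirmed by the commenters) is to use the Module API instead of FeedForward: its constructor takes fixed_param_names, and the listed parameters are excluded from the optimizer update, so neither the gradient nor weight decay changes them. The toy symbol and layer names below are placeholders.

```python
import mxnet as mx

# Toy stand-in for a pretrained backbone plus a new 'myfc' head.
data = mx.sym.Variable('data')
conv1 = mx.sym.Convolution(data, kernel=(3, 3), num_filter=16, name='conv1')
myfc = mx.sym.FullyConnected(mx.sym.Flatten(conv1), num_hidden=10, name='myfc')
net = mx.sym.SoftmaxOutput(myfc, name='softmax')

# Freeze every argument except the new head and the data/label inputs;
# for a real network, take the names from net.list_arguments().
fixed = [name for name in net.list_arguments()
         if not name.startswith('myfc') and name not in ('data', 'softmax_label')]

mod = mx.mod.Module(symbol=net, context=mx.cpu(), fixed_param_names=fixed)
```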
