From the discussion with @antinucleon, we want to fine-tune a pretrained model, for example VGG, where we want to set the first two layers' learning rate to 0. @antinucleon showed me an example in RCNN https://github.com/dmlc/mxnet/blob/master/example/rcnn/rcnn/solver.py#L41 I am wondering if there is an easier way to control this from the interface, e.g. an optimizer with a function to control the learning rate per layer? Thanks.
In the symbol, pass attr={'lr_mult': '0.01'} to set the learning rate multiplier for that layer.
Thanks. Is it like this for setting the learning rate of conv1_1 and conv1_2 (the first two layers of VGG) to 0?
import mxnet as mx

# Input placeholder for the network.
data = mx.symbol.Variable('data')
# attr={'lr_mult': '0.00'} freezes these two layers during fine-tuning.
conv1_1 = mx.symbol.Convolution(name='conv1_1', data=data, num_filter=64, pad=(1,1), kernel=(3,3), stride=(1,1), no_bias=False, workspace=1024, attr={'lr_mult': '0.00'})
relu1_1 = mx.symbol.Activation(name='relu1_1', data=conv1_1, act_type='relu')
conv1_2 = mx.symbol.Convolution(name='conv1_2', data=relu1_1, num_filter=64, pad=(1,1), kernel=(3,3), stride=(1,1), no_bias=False, workspace=1024, attr={'lr_mult': '0.00'})
relu1_2 = mx.symbol.Activation(name='relu1_2', data=conv1_2, act_type='relu')
pool1 = mx.symbol.Pooling(name='pool1', data=relu1_2, pad=(0,0), kernel=(2,2), stride=(2,2), pool_type='avg')
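As a quick sanity check, the attribute can be read back from the symbol; this minimal sketch only assumes Symbol.attr, which returns the attribute string stored on that node:

# Confirm the multiplier was recorded on the symbol; prints '0.00'.
print(conv1_1.attr('lr_mult'))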
@phunterlau I think it will work
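For the optimizer-side control asked about in the original question, here is a hedged sketch assuming an MXNet version that provides Optimizer.set_lr_mult (a dict mapping parameter names to per-parameter multipliers); the names conv1_1_weight etc. are the defaults MXNet derives from the layer names and may need adjusting:

import mxnet as mx

# Zero the learning-rate multipliers for the first two layers; every other
# parameter keeps the optimizer's base learning rate.
opt = mx.optimizer.SGD(learning_rate=0.01, momentum=0.9)
opt.set_lr_mult({
    'conv1_1_weight': 0.0, 'conv1_1_bias': 0.0,
    'conv1_2_weight': 0.0, 'conv1_2_bias': 0.0,
})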