Incubator-mxnet: Per-layer learning rate for fine tuning a pretrained network

Created on 26 May 2016 · 3 comments · Source: apache/incubator-mxnet

From a discussion with @antinucleon: we want to fine-tune a pretrained model, for example VGG, while keeping the first two layers' learning rate at 0. @antinucleon showed me an example in the RCNN solver https://github.com/dmlc/mxnet/blob/master/example/rcnn/rcnn/solver.py#L41. I am wondering if we have an easier way to control this from the interface, e.g. an optimizer with a function to set the learning rate per layer? Thanks.
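For reference, later MXNet releases grew exactly this kind of optimizer-level interface: mx.optimizer.Optimizer.set_lr_mult takes a dict of per-parameter multipliers. A minimal sketch of what that looks like (the parameter names here are assumptions, following the usual '<layer>_weight'/'<layer>_bias' naming convention):

    import mxnet as mx

    # set_lr_mult was added to the Optimizer interface in later MXNet releases.
    opt = mx.optimizer.SGD(learning_rate=0.01, momentum=0.9)

    # The effective learning rate per parameter is base_lr * multiplier,
    # so a multiplier of 0.0 freezes those weights during fine-tuning.
    opt.set_lr_mult({
        'conv1_1_weight': 0.0, 'conv1_1_bias': 0.0,  # assumed names: first conv layer
        'conv1_2_weight': 0.0, 'conv1_2_bias': 0.0,  # assumed names: second conv layer
    })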

All 3 comments

In the symbol, pass attr={'lr_mult': '0.01'} to set the learning rate multiplier for that layer.

Thanks. Would it look like this to set the learning rate of conv1_1 and conv1_2 (the first two layers of VGG) to 0?

    # attr={'lr_mult': '0.00'} scales this layer's learning rate to 0,
    # freezing its weights during fine-tuning.
    conv1_1 = mx.symbol.Convolution(name='conv1_1', data=data, num_filter=64, pad=(1, 1), kernel=(3, 3), stride=(1, 1), no_bias=False, workspace=1024, attr={'lr_mult': '0.00'})
    relu1_1 = mx.symbol.Activation(name='relu1_1', data=conv1_1, act_type='relu')
    conv1_2 = mx.symbol.Convolution(name='conv1_2', data=relu1_1, num_filter=64, pad=(1, 1), kernel=(3, 3), stride=(1, 1), no_bias=False, workspace=1024, attr={'lr_mult': '0.00'})
    relu1_2 = mx.symbol.Activation(name='relu1_2', data=conv1_2, act_type='relu')
    pool1 = mx.symbol.Pooling(name='pool1', data=relu1_2, pad=(0, 0), kernel=(2, 2), stride=(2, 2), pool_type='avg')

@phunterlau I think that will work.
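
If it helps, you can read the attribute back from the symbol before training to confirm it was attached; since the effective learning rate is the optimizer's base rate times lr_mult, a value of '0.00' freezes those weights. A minimal sketch using Symbol.attr():

    import mxnet as mx

    data = mx.symbol.Variable('data')
    conv1_1 = mx.symbol.Convolution(name='conv1_1', data=data, num_filter=64,
                                    pad=(1, 1), kernel=(3, 3), stride=(1, 1),
                                    attr={'lr_mult': '0.00'})

    # Symbol.attr() returns the attribute string attached to this node,
    # so we can sanity-check the multiplier before starting training.
    print(conv1_1.attr('lr_mult'))  # -> '0.00'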
