Incubator-mxnet: can anyone give an example of using set_lr_mult

Created on 17 Oct 2016 · 12 comments · Source: apache/incubator-mxnet

I can't find documentation for set_lr_mult or set_wd_mult on the website. Can anyone give an example?

All 12 comments

Do you mean a per-layer learning rate? Just search the existing issues with that keyword.

Thank you for pointing me to setting attributes to control the learning rate. But I would still like to know how to use set_lr_mult, since it makes the code more concise.
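For reference, a minimal sketch of the attribute-based approach mentioned above, assuming a small made-up two-layer network; newer MXNet versions let mx.sym.Variable take lr_mult directly, while older versions attached the same information through the symbol's attr dict:

import mxnet as mx

# Hypothetical two-layer network; 'fc1_weight' is updated 10x slower.
data = mx.sym.Variable('data')
w1 = mx.sym.Variable('fc1_weight', lr_mult=0.1)
fc1 = mx.sym.FullyConnected(data=data, weight=w1, num_hidden=128, name='fc1')
fc2 = mx.sym.FullyConnected(data=fc1, num_hidden=10, name='fc2')
net = mx.sym.SoftmaxOutput(data=fc2, name='softmax')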

import mxnet as mx

def init_lr_mul(multiplier=1.0):
    # sym_gen is the user's own symbol factory (called here with a dummy
    # bucket key); we only need it to enumerate the parameter names.
    tmp_symbol = sym_gen((2, 3))
    # get the full parameter list
    internals = tmp_symbol.get_internals()
    arg_names = internals.list_arguments()

    # map every parameter name to the same learning-rate multiplier
    lr_dict = dict()
    for arg_name in arg_names:
        lr_dict[arg_name] = multiplier
    return lr_dict

lr_dict = init_lr_mul(1.0)

# Train an LSTM network as simply as a feedforward network
optimizer = mx.optimizer.AdaDelta(clip_gradient=10.0)
optimizer.set_lr_mult(lr_dict)
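One caveat: the multipliers only have an effect when they differ across parameters; setting every entry to 1.0 as above leaves the effective learning rate unchanged. A minimal sketch of a more typical use, assuming hypothetical layer names with a 'pretrained_' prefix, freezes those parameters while training the rest normally:

# Hypothetical: freeze parameters whose name starts with 'pretrained_',
# keep the normal learning rate for everything else.
arg_names = sym_gen((2, 3)).get_internals().list_arguments()
lr_dict = {name: (0.0 if name.startswith('pretrained_') else 1.0)
           for name in arg_names}
optimizer.set_lr_mult(lr_dict)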

Thank you magic282, your script is exactly what I need.

hi @magic282, one more question: why would a multiplier equal to 1 cause the learning rate to become 0?

@umichyiwan That was a misspelling; it's corrected now.

@magic282 Btw, I am curious how you got this to work. The training loss is exactly the same whether or not I call init_lr_mul(0).
I guess it's because _get_lr() doesn't take the layer name as input; it takes an index instead, so the learning rate does not change.

@umichyiwan I met the same problem too. It's weird; why did the training loss not change after I set a different lr_mult?

@umichyiwan @happygds
You need to pass param_idx2name to the optimizer.

# Build both the name -> multiplier map and the index -> name map over the
# same argument list, so the optimizer can resolve indices back to names.
lr_dict = dict()
param_idx2name = dict()
for idx, arg_name in enumerate(arg_names):
    lr_dict[arg_name] = 1.0
    param_idx2name[idx] = arg_name

optimizer = mx.optimizer.Adam(args.lr, sym=sym, param_idx2name=param_idx2name,
                              rescale_grad=1.0 / args.batch_size)
optimizer.set_lr_mult(lr_dict)

Then _get_lr() can translate the parameter index into a name and look up the multiplier correctly, as @umichyiwan mentioned.
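Roughly, the lookup works like the sketch below (a simplified illustration of the idea, not the exact library source): the integer index handed to the optimizer is translated to a parameter name through param_idx2name, and that name selects the multiplier, defaulting to 1.0:

def get_lr(base_lr, index, idx2name, lr_mult):
    # simplified illustration of how a per-parameter multiplier is resolved
    lr = base_lr
    if index in lr_mult:                        # multiplier registered by index
        lr *= lr_mult[index]
    elif index in idx2name:                     # index -> name -> multiplier
        lr *= lr_mult.get(idx2name[index], 1.0)
    return lr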

This issue is closed due to lack of activity in the last 90 days. Feel free to reopen if this is still an active issue. Thanks!
