Incubator-mxnet: can anyone give an example of using set_lr_mult

Created on 17 Oct 2016 · 12 comments · Source: apache/incubator-mxnet

I can't find documentation for set_lr_mult or set_wd_mult on the website. Can anyone give an example?

All 12 comments

Do you mean a per-layer learning rate? Just search the existing issues with that keyword.

Thank you for pointing me to setting attributes to control the learning rate. But I would still like to know how to use set_lr_mult, since it makes the code more concise.
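For reference, a minimal sketch of the attribute-based approach mentioned above, assuming a small made-up two-layer network; newer MXNet versions let mx.sym.Variable take lr_mult directly, while older versions attached the same information through the symbol's attr dict:

import mxnet as mx

# Hypothetical two-layer network; 'fc1_weight' is updated 10x slower.
data = mx.sym.Variable('data')
w1 = mx.sym.Variable('fc1_weight', lr_mult=0.1)
fc1 = mx.sym.FullyConnected(data=data, weight=w1, num_hidden=128, name='fc1')
fc2 = mx.sym.FullyConnected(data=fc1, num_hidden=10, name='fc2')
net = mx.sym.SoftmaxOutput(data=fc2, name='softmax')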

import mxnet as mx

def init_lr_mul(multiplier=1.0):
    # sym_gen is the user's own symbol factory (called here with a dummy
    # bucket key); we only need it to enumerate the parameter names.
    tmp_symbol = sym_gen((2, 3))
    # get the full parameter list
    internals = tmp_symbol.get_internals()
    arg_names = internals.list_arguments()

    # map every parameter name to the same learning-rate multiplier
    lr_dict = dict()
    for arg_name in arg_names:
        lr_dict[arg_name] = multiplier
    return lr_dict

lr_dict = init_lr_mul(1.0)

# Train an LSTM network as simply as a feedforward network
optimizer = mx.optimizer.AdaDelta(clip_gradient=10.0)
optimizer.set_lr_mult(lr_dict)
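One caveat: the multipliers only have an effect when they differ across parameters; setting every entry to 1.0 as above leaves the effective learning rate unchanged. A minimal sketch of a more typical use, assuming hypothetical layer names with a 'pretrained_' prefix, freezes those parameters while training the rest normally:

# Hypothetical: freeze parameters whose name starts with 'pretrained_',
# keep the normal learning rate for everything else.
arg_names = sym_gen((2, 3)).get_internals().list_arguments()
lr_dict = {name: (0.0 if name.startswith('pretrained_') else 1.0)
           for name in arg_names}
optimizer.set_lr_mult(lr_dict)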

Thank you magic282, your script is exactly what I need.

hi @magic282, one more question: why would a multiplier equal to 1 cause the learning rate to become 0?

@umichyiwan That was a misspelling; it's corrected now.

@magic282 Btw, I am curious how you got this to work. The training loss is exactly the same whether or not I call init_lr_mul(0).
I guess it's because _get_lr() doesn't take the layer name as input; it takes an index instead, so the learning rate does not change.

@umichyiwan I met the same problem too. It's weird; why did the training loss not change after I set a different lr_mult?

@umichyiwan @happygds
You need to pass param_idx2name to the optimizer.

# Build both the name -> multiplier map and the index -> name map over the
# same argument list, so the optimizer can resolve indices back to names.
lr_dict = dict()
param_idx2name = dict()
for idx, arg_name in enumerate(arg_names):
    lr_dict[arg_name] = 1.0
    param_idx2name[idx] = arg_name

optimizer = mx.optimizer.Adam(args.lr, sym=sym, param_idx2name=param_idx2name,
                              rescale_grad=1.0 / args.batch_size)
optimizer.set_lr_mult(lr_dict)

Then _get_lr() can translate the parameter index into a name and look up the multiplier correctly, as @umichyiwan mentioned.
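Roughly, the lookup works like the sketch below (a simplified illustration of the idea, not the exact library source): the integer index handed to the optimizer is translated to a parameter name through param_idx2name, and that name selects the multiplier, defaulting to 1.0:

def get_lr(base_lr, index, idx2name, lr_mult):
    # simplified illustration of how a per-parameter multiplier is resolved
    lr = base_lr
    if index in lr_mult:                        # multiplier registered by index
        lr *= lr_mult[index]
    elif index in idx2name:                     # index -> name -> multiplier
        lr *= lr_mult.get(idx2name[index], 1.0)
    return lr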

This issue is closed due to lack of activity in the last 90 days. Feel free to reopen if this is still an active issue. Thanks!
