Incubator-mxnet: How do I make a siamese network with pretrained models (esp. keeping the weights the same?)

Created on 8 Nov 2017 · 10 comments · Source: apache/incubator-mxnet

Description

How do I ensure the weights are kept the same? Can I unpack the internal layers somehow and set the weights of each to the same variable? My apologies, I'm new to MXNet. Would really appreciate the help, thanks!

sym1, arg_params1, aux_params1 = mx.model.load_checkpoint('resnet-152', 0)
sym2, arg_params2, aux_params2 = mx.model.load_checkpoint('resnet-152', 0)
layer1 = sym1.get_internals()
layer2 = sym2.get_internals()
for i in range(len(layer1)): # will something like this work?
    arg_params1[i] = arg_params2[i]
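
(I realize arg_params is a dict keyed by parameter name rather than a list, so a by-key copy would be more like the snippet below, but I suspect that only makes the starting values identical and doesn't keep the weights tied during training:)

import mxnet as mx

sym1, arg_params1, aux_params1 = mx.model.load_checkpoint('resnet-152', 0)
sym2, arg_params2, aux_params2 = mx.model.load_checkpoint('resnet-152', 0)
for name in arg_params2:  # arg_params maps parameter name -> NDArray
    arg_params1[name] = arg_params2[name]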

Relevant answers, but not specific enough to my particular problem:
https://github.com/apache/incubator-mxnet/issues/772 siamese networks
https://github.com/apache/incubator-mxnet/issues/6791 extract layers as variables
https://github.com/apache/incubator-mxnet/issues/557 set weights to be same

needs triage

All 10 comments

I had a similar problem and managed to find several solutions: https://github.com/apache/incubator-mxnet/issues/7530
With the Gluon API it's easy and straightforward; with the Module API it's something else :(
I put my tests here: https://github.com/edmBernard/mxnet_example_shared_weight
The readme describes whether each approach works or not.
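
For example, with Gluon you can build one block from another block's parameters, so the two branches are tied by construction. A minimal sketch (the layer sizes are just placeholders):

    from mxnet import nd
    from mxnet.gluon import nn

    # branch2 is built from branch1's ParameterDict, so both layers
    # always use (and update) exactly the same weight and bias
    branch1 = nn.Dense(64, in_units=128)
    branch2 = nn.Dense(64, in_units=128, params=branch1.params)
    branch1.initialize()   # initializes the shared parameters once

    x1 = nd.random.uniform(shape=(1, 128))
    x2 = nd.random.uniform(shape=(1, 128))
    out = nd.concat(branch1(x1), branch2(x2), dim=1)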

Wow, I didn't know that API existed. I had a lot of trouble trying to make it work with the module API but the Gluon API looks super promising, thanks for sharing :)

However, though I'll definitely test Gluon out, do you know how I would do this with the Module API?

Can I extract each layer's functionality somehow and set its weights to the same variable as the identical layer in the other network? If it's too big a hassle, I guess I would use Gluon, though all the other code I have uses the Module API.

If you have exactly the same network twice, it might be possible to use shared_module in the bind function; it's used in RNNs to duplicate a network. I was not able to use it myself, as my two networks were not exactly the same. here
In my opinion, it will be easier to switch to Gluon, and you can be sure it will work.
Moreover, in Gluon you can use a network defined with the symbol API. here (I haven't tested it)
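
A rough sketch of the shared_module idea with the Module API (untested, as I said), using the flatten0_output sub-graph from your first post so the classification head and its label are out of the way:

    import mxnet as mx

    sym, arg_params, aux_params = mx.model.load_checkpoint('resnet-152', 0)
    feat = sym.get_internals()['flatten0_output']   # drop the classification head

    # keep only the parameters the sub-graph actually uses
    feat_args = {k: v for k, v in arg_params.items() if k in feat.list_arguments()}

    mod1 = mx.mod.Module(symbol=feat, context=mx.cpu(), label_names=None)
    mod1.bind(for_training=True, data_shapes=[('data', (1, 3, 224, 224))])
    mod1.set_params(feat_args, aux_params)

    mod2 = mx.mod.Module(symbol=feat, context=mx.cpu(), label_names=None)
    # shared_module should make mod2 reuse mod1's parameter arrays,
    # so no separate set_params call is needed here
    mod2.bind(for_training=True, data_shapes=[('data', (1, 3, 224, 224))],
              shared_module=mod1)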

Hey again,

I tried something like this, but I still have a lot of questions:

    sym1, arg_params, aux_params = get_model()
    sym2, arg_params, aux_params = get_model()

    mod1 = mx.mod.Module(symbol=sym1, context=mx.cpu(), label_names=None)
    mod2 = mx.mod.Module(symbol=sym2, context=mx.cpu(), label_names=None)
    mod1.bind(for_training=True, shared_module=mod2, data_shapes=[('data', (1,3,224,224))], # true to train
             label_shapes=mod1._label_shapes)
    mod2.bind(for_training=True, shared_module=mod1, data_shapes=[('data', (1,3,224,224))], # true to train
             label_shapes=mod2._label_shapes)
    mod1.set_params(arg_params, aux_params, allow_missing=True)
    mod2.set_params(arg_params, aux_params, allow_missing=True)

    out1 = sym1.get_internals()['flatten0_output']
    out2 = sym2.get_internals()['flatten0_output']
    siamese_out = mx.sym.Concat(out1, out2, dim=0)

    # Example stacked network after it
    fc1  = mx.symbol.FullyConnected(data = siamese_out, name='fc1', num_hidden=128)
    act1 = mx.symbol.Activation(data = fc1, name='relu1', act_type="relu")
    fc2  = mx.symbol.FullyConnected(data = act1, name = 'fc2', num_hidden = 64)
    act2 = mx.symbol.Activation(data = fc2, name='relu2', act_type="relu")
    fc3  = mx.symbol.FullyConnected(data = act2, name='fc3', num_hidden=num_classes)
    mlp  = mx.symbol.SoftmaxOutput(data = fc3, name = 'softmax')
    # new_args = dict()

    mod3 = mx.mod.Module(fc1, context=mx.cpu(), label_names=None)
    mod3.bind(for_training=False, data_shapes=[('data', (1,3,224,224))])
    mod3.set_params(arg_params, aux_params)

I only want the first part of this network (the layers attached to mod1 and mod2) to be shared. Would something like this work and still backpropagate errors appropriately when fitted?

Having to run mod.fit on each part of the network could be inconvenient. Is there a way around this?

I haven't tested shared_module in something similar to your application. (Are you sure you don't want to use Gluon?) :)

I haven't tested your code, but here are some corrections:

# you don't need `shared_module=mod2` here
mod1.bind(for_training=True, data_shapes=[('data', (1,3,224,224))], label_shapes=mod1._label_shapes)

If you want to train everything as one network, you need to define a new data iterator that can pass two different images into your network.
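
Something like this, assuming your merged symbol has two data inputs (the names 'data1', 'data2' and 'softmax_label' below are placeholders and must match the variable names in your symbol):

    import mxnet as mx
    import numpy as np

    imgs1 = np.zeros((100, 3, 224, 224), dtype='float32')   # placeholder data
    imgs2 = np.zeros((100, 3, 224, 224), dtype='float32')
    labels = np.zeros((100,), dtype='float32')

    train_iter = mx.io.NDArrayIter(
        data={'data1': imgs1, 'data2': imgs2},   # one image per branch
        label={'softmax_label': labels},
        batch_size=8, shuffle=True)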

Maybe it's easier to try this example of a triplet-loss network (I haven't tested whether it works)

Here is an example using Gluon

Wow. Thank you so much. Alright, this gives me a lot to think about.
I'm really grateful for your help, thanks a ton.

If you want to share weights across the network, why not just use one copy of the network and run it twice with the inputs?

final_net(nd.concat(shared_net(x1), shared_net(x2)))

Also, I definitely recommend using Gluon instead of pure MXNet
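
Something along these lines in Gluon (a rough sketch; the layers are placeholders, not your actual networks):

    from mxnet import nd
    from mxnet.gluon import nn

    shared_net = nn.HybridSequential()
    shared_net.add(nn.Conv2D(16, kernel_size=3, activation='relu'),
                   nn.GlobalAvgPool2D())

    final_net = nn.HybridSequential()
    final_net.add(nn.Dense(64, activation='relu'),
                  nn.Dense(2))

    shared_net.initialize()
    final_net.initialize()

    x1 = nd.random.uniform(shape=(1, 3, 224, 224))
    x2 = nd.random.uniform(shape=(1, 3, 224, 224))

    # the same shared_net instance handles both inputs, so both branches
    # use exactly the same weights and gradients from both flow into them
    out = final_net(nd.concat(shared_net(x1), shared_net(x2), dim=1))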

@apache/mxnet-committers: This issue has been inactive for the past 90 days. It has no label and needs triage.

For general "how-to" questions, our user forum (and Chinese version) is a good place to get help.

@tz-hmc, Hope your question has been answered.
For general "how-to" questions, our user forum (and Chinese version) is a good place to get help.
