Hello,
I have two datasets with different kinds of labels, so I have two different networks, but I want to share some layers (the first convolutional ones) between them during training.
Example:
Input 1 (dataset 1) -> | VGG16 convolutional layers, shared | -> | VGG16 layers 1, unshared | -> output 1
Input 2 (dataset 2) -> | VGG16 convolutional layers, shared | -> | VGG16 layers 2, unshared | -> output 2
For the training step, I was planning to train the shared layers batch by batch, alternating between dataset 1 and dataset 2.
Something like this:
# one iteration
net1.forward(batch1.next())
net1.backward()
net1.update()
net2.forward(batch2.next())
net2.backward()
net2.update()
How can I share the weights between two networks? (Not just one network, because I have two different datasets.)
@Shiro-mx you can refer to #6909
@Ldpe2G thank you for your help
I may be wrong, but it seems there is only one output instead of one output per input. Can the feedforward function handle multiple outputs? My goal is to train the shared weights of the different networks, but also to train the unshared weights (the last 2 or 3 layers).
@Shiro-mx I suggest you use the module API, which is more flexible. The triplet loss example shows how to share weights between nets by defining the weight variables with specific names and using them in both nets.
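For illustration, a minimal sketch of that idea (the variable and layer names below are just examples, not taken from the triplet loss example itself): a single weight Variable is created once and passed to two FullyConnected layers, so both symbols refer to the same named parameter.

import mxnet as mx

# One set of named parameter Variables, reused by both branches.
shared_w = mx.sym.Variable('shared_fc_weight')
shared_b = mx.sym.Variable('shared_fc_bias')

# Two branches fed by different data inputs, but built from the same weights.
branch_a = mx.sym.FullyConnected(data=mx.sym.Variable('data_a'),
                                 weight=shared_w, bias=shared_b, num_hidden=64)
branch_b = mx.sym.FullyConnected(data=mx.sym.Variable('data_b'),
                                 weight=shared_w, bias=shared_b, num_hidden=64)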
I created a small network, but I have a problem with binding. I also want to use the SGD optimizer, but I don't know where to put it. I read some topics but I still don't really understand what I have to do.
import mxnet as mx

def get_shared_network(data, fc1_weight, fc1_bias, fc2_weight, fc2_bias):
    fc1 = mx.symbol.FullyConnected(data=data, name='fc1', num_hidden=128, weight=fc1_weight, bias=fc1_bias)
    act1 = mx.symbol.Activation(data=fc1, name='relu1', act_type="relu")
    fc2 = mx.symbol.FullyConnected(data=act1, name='fc2', num_hidden=64, weight=fc2_weight, bias=fc2_bias)
    act2 = mx.symbol.Activation(data=fc2, name='relu2', act_type="relu")
    return act2

def get_two_network():
    data1 = mx.sym.Variable('data1')
    data2 = mx.symbol.Variable('data2')
    fc1_w = mx.sym.Variable('fc1_w', init=mx.init.Constant(0.02))
    fc2_w = mx.symbol.Variable('fc2_w', init=mx.init.Constant(0.01))
    fc1_b = mx.sym.Variable('fc1_b', init=mx.init.Constant(0.08))
    fc2_b = mx.symbol.Variable('fc2_b', init=mx.init.Constant(0.09))
    act2_1 = get_shared_network(data1, fc1_w, fc1_b, fc2_w, fc2_b)
    act2_2 = get_shared_network(data2, fc1_w, fc1_b, fc2_w, fc2_b)
    fc3_1 = mx.symbol.FullyConnected(data=act2_1, name='fc3_1', num_hidden=5)
    fc3_2 = mx.symbol.FullyConnected(data=act2_2, name='fc3_2', num_hidden=5)
    softmax1 = mx.sym.SoftmaxOutput(data=fc3_1, name='softmax1')
    softmax2 = mx.sym.SoftmaxOutput(data=fc3_2, name='softmax2')
    return [softmax1, softmax2]
# Data
imgrec_train1 = 'mnistasjpg/train_data1' + '.rec'
imglist_train1 = 'mnistasjpg/train_data1' + '.lst'
imgrec_test1 = 'mnistasjpg/test_data1' + '.rec'
imglist_test1 = 'mnistasjpg/test_data1' + '.lst'
imgrec_train2 = 'mnistasjpg/train_data2' + '.rec'
imglist_train2 = 'mnistasjpg/train_data2' + '.lst'
imgrec_test2 = 'mnistasjpg/test_data2' + '.rec'
imglist_test2 = 'mnistasjpg/test_data2' + '.lst'
# # load dataset
train_dataiter1 = mx.io.ImageRecordIter(
    path_imgrec=imgrec_train1,
    data_shape=(3, x, y),
    batch_size=batch_size,
    path_imglist=imglist_train1,
    preprocess_threads=1,
    label_width=1,
    data_name='data1',
    label_name='softmax1_label',
    # shuffle=True,
    # shuffle_chunk_seed=100,
    # seed=100,
    # rand_mirror=True,
    # rand_mirror_prob=0.5,
    # random_crop=True
)
print('testing Dataset ...')
# Data validation
test_dataiter1 = mx.io.ImageRecordIter(
    path_imgrec=imgrec_test1,
    data_shape=(3, x, y),
    batch_size=batch_size,
    path_imglist=imglist_test1,
    preprocess_threads=1,
    label_width=1,
    data_name='data1',
    label_name='softmax1_label'
)
train_dataiter2 = mx.io.ImageRecordIter(
    path_imgrec=imgrec_train2,
    data_shape=(3, x, y),
    batch_size=batch_size,
    path_imglist=imglist_train2,
    preprocess_threads=1,
    label_width=1,
    data_name='data2',
    label_name='softmax2_label',
    # shuffle=True,
    # shuffle_chunk_seed=100,
    # seed=100,
    # rand_mirror=True,
    # rand_mirror_prob=0.5,
    # random_crop=True
)
print('testing Dataset ...')
# Data validation
test_dataiter2 = mx.io.ImageRecordIter(
    path_imgrec=imgrec_test2,
    data_shape=(3, x, y),
    batch_size=batch_size,
    path_imglist=imglist_test2,
    preprocess_threads=1,
    label_width=1,
    data_name='data2',
    label_name='softmax2_label'
)
sgd = mx.optimizer.create('sgd', learning_rate=.001, momentum=0.9, wd=0.0005)
name_model = get_two_network()
# load network
mod1 = mx.mod.Module(name_model[0], context=mx.cpu(), data_names=['data1'], label_names=[output_name])
mod1.bind(data_shapes=[('data1', (1, 3, x, y))], label_shapes=[(output_name, (1,))] )
mod1.init_params()
mod2 = mx.mod.Module(name_model[1], context=mx.cpu(), data_names=['data2'], label_names=[output_name2])
mod2.bind(data_shapes=[('data2', (1, 3, x, y))], label_shapes=[(output_name2, (1,))] )
mod2.init_params()
# train
batch = train_dataiter1.next()
mod1.forward(batch, is_train= True)
result = mod1.get_outputs()[0]
print(result.asnumpy())
#mod1.backward(sgd)
mod1.update()
The error:
Traceback (most recent call last):
File "mxnet_shared.py", line 164, in <module>
mod1.update()
File "/usr/local/lib/python3.5/dist-packages/mxnet-0.10.1-py3.5.egg/mxnet/module/module.py", line 610, in update
assert self.binded and self.params_initialized and self.optimizer_initialized
AssertionError
EDIT: I may have solved the problem by adding this:
mod1.init_optimizer(kvstore='local', optimizer='sgd', optimizer_params=(('learning_rate', 0.001),('momentum', 0.9),('wd', 0.0005) ))
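For reference, a minimal sketch of the intended call order (assuming the same mod1, train_dataiter1, x, y, and the label name 'softmax1_label' from the iterator above):

# Sketch of the sequence that avoids the AssertionError:
# bind -> init_params -> init_optimizer -> forward/backward/update.
mod1.bind(data_shapes=[('data1', (1, 3, x, y))],
          label_shapes=[('softmax1_label', (1,))])
mod1.init_params()
mod1.init_optimizer(kvstore='local', optimizer='sgd',
                    optimizer_params=(('learning_rate', 0.001),
                                      ('momentum', 0.9), ('wd', 0.0005)))

batch = train_dataiter1.next()
mod1.forward(batch, is_train=True)
mod1.backward()   # compute gradients before updating
mod1.update()     # apply one SGD step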
Adding init_optimizer solved the binding error, but it seems that the weights are not shared.
print('AFTER TRAIN')
print(mod1.get_params()[0]['fc2_b'].asnumpy())
print(mod2.get_params()[0]['fc2_b'].asnumpy())
It gives me:
BEFORE TRAIN
[ 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093
0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093
0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093
0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093
0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093
0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093
0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093
0.0093]
[ 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093
0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093
0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093
0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093
0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093
0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093
0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093
0.0093]
[[ 9.96710181e-01 4.97425392e-15 6.05048997e-11 3.28975217e-03
9.74552383e-12]]
[[ 1. 0. 0. 0. 0.]]
[[ 1. 0. 0. 0. 0.]]
AFTER TRAIN
[ 0.00929993 0.00930007 0.00930124 0.00930121 0.0092996 0.00930087
0.00930096 0.00930125 0.00929955 0.0093001 0.00930067 0.00929943
0.00929947 0.00930002 0.00929987 0.00929933 0.00930128 0.00929914
0.00929966 0.00930015 0.00930036 0.00929985 0.00929992 0.00929972
0.00930061 0.00929911 0.00929995 0.00930028 0.00929933 0.0092998
0.00929865 0.00929941 0.00929879 0.00930045 0.00930095 0.00930034
0.00929994 0.00929952 0.00930084 0.00930035 0.00930017 0.0093004
0.00930039 0.00930087 0.00929925 0.00929921 0.00930016 0.00930016
0.00929892 0.00930011 0.00930025 0.00930009 0.00929855 0.00930055
0.00930054 0.00930065 0.00930017 0.00929909 0.00929966 0.0093
0.00930044 0.00929949 0.00930084 0.00930004]
[ 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093
0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093
0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093
0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093
0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093
0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093
0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093 0.0093
0.0093]
Why didn't the weight sharing work?
Remove the asnumpy() and you might get its object id, which can help you debug. It should be consistent if your code is correct.
print('AFTER TRAIN')
print(mod1.get_params()[0]['fc2_b'])
print(mod2.get_params()[0]['fc2_b'])
@zihaolucky Thank you for the help.
How can I obtain the id? This is what I get after removing asnumpy():
AFTER TRAIN
<NDArray 64 @cpu(0)>
<NDArray 64 @cpu(0)>
Oops, my bad. Could you try comparing the two objects with the Python operator is?
BTW, did you just use two different mods? I suppose these modules are different, although they have the same names for the weights.
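For example, a quick identity check along those lines (a sketch, assuming the mod1 and mod2 from the code above):

b1 = mod1.get_params()[0]['fc2_b']
b2 = mod2.get_params()[0]['fc2_b']
# If the two modules truly shared storage, this would print True;
# two independently bound modules will print False.
print(b1 is b2)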
@zihaolucky
They don't seem to be the same. Yes, I use two mods, one for each input and output.
I think creating two mods and then binding their inputs/outputs separately creates two different networks, even if they are built from the "same" symbols. Is there a way to correct this? (Maybe I am doing something wrong.)
If not, is it possible to train only one input and one output of a network which has two inputs/outputs? I did not find a topic about it.
I am having the same issue - I would really like to be able to share parameters across two modules, because each module will be reading from a distinct data stream and training a distinct network with only some shared components. A somewhat minimal example appears below:
import logging
import mxnet as mx
mnist = mx.test_utils.get_mnist()
batch_size = 100
train_iter = mx.io.NDArrayIter(mnist['train_data'], mnist['train_label'], batch_size, shuffle=True)
val_iter = mx.io.NDArrayIter(mnist['test_data'], mnist['test_label'], batch_size)
data = mx.sym.var('data')
data = mx.sym.flatten(data=data)
# Set up variables to share parameters for all layers
w1 = mx.sym.Variable('1_weight')
w2 = mx.sym.Variable('2_weight')
w3 = mx.sym.Variable('3_weight')
b1 = mx.sym.Variable('1_bias')
b2 = mx.sym.Variable('2_bias')
b3 = mx.sym.Variable('3_bias')
# Build network 1 with explicit weight pointers
fc1 = mx.sym.FullyConnected(data=data, num_hidden=128, weight=w1, bias=b1)
act1 = mx.sym.Activation(data=fc1, act_type="relu")
fc2 = mx.sym.FullyConnected(data=act1, num_hidden=64, weight=w2, bias=b2)
act2 = mx.sym.Activation(data=fc2, act_type="relu")
fc3 = mx.sym.FullyConnected(data=act2, num_hidden=10, weight=w3, bias=b3)
mlp = mx.sym.SoftmaxOutput(data=fc3, name='softmax')
# Build module 1
mlp_model = mx.mod.Module(symbol=mlp, context=mx.cpu())
# Now build a second, identical graph that shares the same weight pointers
fc1s = mx.sym.FullyConnected(data=data, num_hidden=128, weight=w1, bias=b1)
act1s = mx.sym.Activation(data=fc1s, act_type="relu")
fc2s = mx.sym.FullyConnected(data=act1s, num_hidden=64, weight=w2, bias=b2)
act2s = mx.sym.Activation(data=fc2s, act_type="relu")
fc3s = mx.sym.FullyConnected(data=act2s, num_hidden=10, weight=w3, bias=b3)
mlps = mx.sym.SoftmaxOutput(data=fc3s, name='softmax')
# Build module 2
mlp_models = mx.mod.Module(symbol=mlps, context=mx.cpu())
# Train module 1
logging.getLogger().setLevel(logging.DEBUG) # logging to stdout
print("\n===Training module1===\n")
mlp_model.fit(train_iter,                      # train data
              eval_data=val_iter,              # validation data
              optimizer='sgd',                 # use SGD to train
              optimizer_params={'learning_rate': 0.1},  # use fixed learning rate
              eval_metric='acc',               # report accuracy during training
              batch_end_callback=mx.callback.Speedometer(batch_size, 100),
              num_epoch=5)                     # train for at most 5 dataset passes
# Train module 2
# We expect the shared module to start where the first module finished
print("\n===Training module2===\n")
mlp_models.shared_group = mlp_model._exec_group
mlp_models.fit(train_iter,                     # train data
               eval_data=val_iter,             # validation data
               optimizer='sgd',                # use SGD to train
               optimizer_params={'learning_rate': 0.1},  # use fixed learning rate
               eval_metric='acc',              # report accuracy during training
               batch_end_callback=mx.callback.Speedometer(batch_size, 100),
               num_epoch=5)                    # train for at most 5 dataset passes
# Making sure that fit doesn't always overwrite parameters by returning to module 1
print("\n===Training module1===\n")
mlp_model.fit(train_iter,                      # train data
              eval_data=val_iter,              # validation data
              optimizer='sgd',                 # use SGD to train
              optimizer_params={'learning_rate': 0.1},  # use fixed learning rate
              eval_metric='acc',               # report accuracy during training
              batch_end_callback=mx.callback.Speedometer(batch_size, 100),
              num_epoch=5)                     # train for at most 5 dataset passes
@cherryc Great, maybe you can write a tutorial or example to help more people with this feature.
No, sorry - I wasn't clear - the above code does not work as intended! Where I say, "We expect the shared module to start where the first module finished," instead the shared module starts over from random initialization, indicating that the two modules do not share parameters. I haven't yet been able to find a way to make them.
Does it work if we recreate the module from the symbol before each fit?
mlp_model = mx.mod.Module(symbol=mlp, context=mx.cpu())
mlp_model.fit(...)
mlp_models = mx.mod.Module(symbol=mlps, context=mx.cpu())
mlp_models.fit(...)
mlp_model = mx.mod.Module(symbol=mlp, context=mx.cpu())
mlp_model.fit(...)
@piiswrong how can we achieve this? If we give a param the same name in a different module, does it use the same NDArray?
I tested using get_params() to transfer weights between the models, and it seems to work.
But I'm not sure the weight transfer will work correctly if the two networks have different layers after the shared layers (the fit function will overwrite the other layers with their default initialization).
This method doesn't use the shared-weight functionality of MXNet.
EDIT1: if both networks have independent layers after the shared layers, it seems to work too.
# Train module 1
mlp_model = mx.mod.Module(symbol=mlp, context=mx.cpu())
logging.getLogger().setLevel(logging.DEBUG) # logging to stdout
print("\n===Training module1===\n")
mlp_model.fit(train_iter,                      # train data
              eval_data=val_iter,              # validation data
              optimizer='sgd',                 # use SGD to train
              optimizer_params={'learning_rate': 0.1},  # use fixed learning rate
              eval_metric='acc',               # report accuracy during training
              batch_end_callback=mx.callback.Speedometer(batch_size, 100),
              num_epoch=1)                     # train for 1 dataset pass
# Train module 2
# We expect the shared module to start where the first module finished
print("\n===Training module2===\n")
arg_param, aux_param = mlp_model.get_params()
mlp_models = mx.mod.Module(symbol=mlps, context=mx.cpu())
#mlp_models.shared_group = mlp_model._exec_group
mlp_models.fit(train_iter,                     # train data
               eval_data=val_iter,             # validation data
               optimizer='sgd',                # use SGD to train
               optimizer_params={'learning_rate': 0.1},  # use fixed learning rate
               eval_metric='acc',              # report accuracy during training
               batch_end_callback=mx.callback.Speedometer(batch_size, 100),
               num_epoch=1,                    # train for 1 dataset pass
               arg_params=arg_param)           # start from module 1's trained params
arg_param, aux_param = mlp_models.get_params()
# Making sure that fit doesn't always overwrite parameters by returning to module 1
print("\n===Training module1===\n")
mlp_model.fit(train_iter,                      # train data
              eval_data=val_iter,              # validation data
              optimizer='sgd',                 # use SGD to train
              optimizer_params={'learning_rate': 0.1},  # use fixed learning rate
              eval_metric='acc',               # report accuracy during training
              batch_end_callback=mx.callback.Speedometer(batch_size, 100),
              num_epoch=1,                     # train for 1 dataset pass
              arg_params=arg_param)            # continue from module 2's trained params
For param sharing I think you just need to use the shared_module option during mod.bind() when you have 2 modules sharing the same parameters.
One detail is that you need a "master module" which has all the symbols that you need, and pass it as the shared_module to mod1.bind() and mod2.bind(), where mod1 and mod2 have a subset of the params/symbols.
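For illustration, a rough, untested sketch of that setup, reusing get_two_network(), the data iterators, and the batch_size/x/y placeholders from earlier in this thread (the exact shapes and label names are assumptions):

import mxnet as mx

softmax1, softmax2 = get_two_network()

# "Master" module that owns every parameter: group both outputs into one symbol.
master_sym = mx.sym.Group([softmax1, softmax2])
master_mod = mx.mod.Module(master_sym, context=mx.cpu(),
                           data_names=['data1', 'data2'],
                           label_names=['softmax1_label', 'softmax2_label'])
master_mod.bind(data_shapes=[('data1', (batch_size, 3, x, y)),
                             ('data2', (batch_size, 3, x, y))],
                label_shapes=[('softmax1_label', (batch_size,)),
                              ('softmax2_label', (batch_size,))])
master_mod.init_params()
master_mod.init_optimizer(optimizer='sgd',
                          optimizer_params=(('learning_rate', 0.001),))

# Each sub-module uses a subset of the symbols/params and is bound with
# shared_module=master_mod, so it reuses the master's parameter arrays
# (and, since the master's optimizer is initialized, its optimizer state).
mod1 = mx.mod.Module(softmax1, context=mx.cpu(),
                     data_names=['data1'], label_names=['softmax1_label'])
mod1.bind(data_shapes=[('data1', (batch_size, 3, x, y))],
          label_shapes=[('softmax1_label', (batch_size,))],
          shared_module=master_mod)

mod2 = mx.mod.Module(softmax2, context=mx.cpu(),
                     data_names=['data2'], label_names=['softmax2_label'])
mod2.bind(data_shapes=[('data2', (batch_size, 3, x, y))],
          label_shapes=[('softmax2_label', (batch_size,))],
          shared_module=master_mod)

# Alternate batches from the two iterators.
batch1 = train_dataiter1.next()
mod1.forward(batch1, is_train=True)
mod1.backward()
mod1.update()

batch2 = train_dataiter2.next()
mod2.forward(batch2, is_train=True)
mod2.backward()
mod2.update()

Because both sub-modules are bound against the master's executor group, an update from either module should modify the same underlying parameter NDArrays, so the shared fc1/fc2 layers get trained on both datasets while fc3_1 and fc3_2 are only touched by their own module.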
This issue is closed due to lack of activity in the last 90 days. Feel free to ping me to reopen if this is still an active issue. Thanks!
Also, do please check out our forum (and Chinese version) for general "how-to" questions.