Hello all,
Consider a 2-input model that makes use of a shared submodel:
import numpy as np
from keras.layers import Input, Convolution2D, Flatten, Dense, merge
from keras.models import Model

def get_sub_net(input_shape):
    img_input = Input(shape=input_shape, name='original input')
    x = Convolution2D(16, 3, 3, activation='relu', name='conv')(img_input)
    out = Flatten()(x)
    return Model(img_input, out)

input_shape = (3, 32, 28)  # Theano dim ordering
sub_net = get_sub_net(input_shape)

input_left = Input(shape=input_shape)
input_right = Input(shape=input_shape)
processed_left = sub_net(input_left)
processed_right = sub_net(input_right)

x = merge([processed_left, processed_right], mode='concat')
x = Dense(16, activation='relu', name='fc')(x)
x = Dense(1, activation='sigmoid')(x)

model = Model(input=[input_left, input_right], output=x)
model.compile(loss='binary_crossentropy', optimizer='sgd')
This works:
batch_size = 5
X = [np.random.random((batch_size,) + input_shape) for _ in range(2)]

intermediate_layer_model = Model(input=model.input, output=model.get_layer('fc').output)
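For example, one can then call something like:

fc_activations = intermediate_layer_model.predict(X)  # expected shape: (batch_size, 16)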
Now I want to get the activation of the convolution layer in the shared sub-net with respect to the main model's input:
intermediate_layer_model = Model(input=model.input,
                                 output=sub_net.get_layer('conv').output)
This fails with:
RuntimeError: Graph disconnected: cannot obtain value for tensor original input at layer "original input". The following previous layers were accessed without issue: []
Hello,
I don't know how to do this directly, but I usually have the sub-model return additional outputs; this way it respects the model encapsulation:
import numpy as np
from keras.layers import Input, Convolution2D, Flatten, Dense, Merge
from keras.models import Model

def get_sub_net(input_shape):
    img_input = Input(shape=input_shape, name='original input')
    x = Convolution2D(16, 3, 3, activation='relu', name='conv')(img_input)
    out = Flatten()(x)
    return Model(img_input, [out, x])  # also expose the conv activation as an output

input_shape = (3, 32, 28)  # Theano dim ordering
sub_net = get_sub_net(input_shape)

input_left = Input(shape=input_shape)
input_right = Input(shape=input_shape)
processed_left, ll = sub_net(input_left)
processed_right, rr = sub_net(input_right)

x = Merge(mode='concat')([processed_left, processed_right])
x = Dense(16, activation='relu', name='fc')(x)
x = Dense(1, activation='sigmoid')(x)

model = Model(input=[input_left, input_right], output=x)
model.compile(loss='binary_crossentropy', optimizer='sgd')

batch_size = 5
X = [np.random.random((batch_size,) + input_shape) for _ in range(2)]

# intermediate_layer_model = Model(input=model.input, output=model.get_layer('fc').output)
intermediate_layer_model = Model(input=model.input,
                                 output=rr)
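The conv activations of, say, the right branch can then be read out the usual way (a sketch; note that model.input is still the pair of inputs, so both arrays in X are fed):

conv_activations = intermediate_layer_model.predict(X)
print(conv_activations.shape)  # expected: (batch_size, 16, 30, 26) with the Theano ordering above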
Regarding your second question.
Models with multiple outputs don't compute a Jacobian; their losses are combined (usually summed), so that we still have a single loss function, as usual, which yields one real-valued loss. That value is then propagated back through the graph to compute the gradients. The inputs (i.e. the shared variables) receive adjoint contributions from the multiple paths (in the case of shared models), which they simply sum to obtain their gradient, just as in the standard back-propagation algorithm.
Once the gradient has been computed for all shared variables, the weights are updated.
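To make the summing concrete, here is a minimal toy sketch with the Keras backend (hypothetical variables, unrelated to the model above): a single shared variable w is used on two paths, and its gradient is just the sum of the contributions from both paths.

from keras import backend as K

w = K.variable(2.0)              # one shared variable used on two paths
x1 = K.placeholder(ndim=0)
x2 = K.placeholder(ndim=0)
loss = w * x1 + w * x2           # two paths through the same shared variable
grad = K.gradients(loss, [w])    # adjoints from both paths are summed: x1 + x2
f = K.function([x1, x2], grad)
print(f([3.0, 4.0]))             # -> [7.0], i.e. 3.0 + 4.0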
Thank you for your answer. Your solution works; nevertheless, I need a general approach to implement custom initialization. For an arbitrary model, I traverse all of its layers and initialize their weights as a function of the layer's activation, given the model input. When a sub-model (as opposed to a plain layer) is found in the graph, I need to expose all of its trainable layers as outputs. This gets further complicated if the sub-model is shared. So far I haven't been able to find a reasonably clean solution.
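For illustration, a rough sketch of the traversal meant here (iter_layers is a hypothetical helper name); it recursively descends into nested Model instances, but it does not address the tensor rewiring needed for shared sub-models, which is exactly the hard part:

from keras.models import Model

def iter_layers(model):
    # Recursively yield layers, descending into nested Model instances.
    for layer in model.layers:
        if isinstance(layer, Model):
            for sub_layer in iter_layers(layer):
                yield sub_layer
        else:
            yield layer

for layer in iter_layers(model):
    print(layer.name, [w.shape for w in layer.get_weights()])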
It seems you are trying to do some meta-optimization of your model. I don't think that's supported yet; maybe in future versions, or maybe someone else knows how to do it. Alternatively, you could create a custom Keras fork that provides the missing capabilities, but that would be quite low-level.
Instead of custom initialization, what I usually do is add some regularization on the weights or activations, based on the various properties I want my weights to have. Then I do a standard fit, giving a high importance to these regularization losses (this initially drives the weights as if the initialization were custom), and I can also reduce the importance of the regularization over time.
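A sketch of the idea with the Keras 1.x regularizer API (the 0.01 coefficients are arbitrary and would be tuned per problem, then reduced over time):

from keras.regularizers import l2, activity_l2

x = Dense(16, activation='relu', name='fc',
          W_regularizer=l2(0.01),                     # penalty on the weights
          activity_regularizer=activity_l2(0.01))(x)  # penalty on the activations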
This process of adding regularization is quite problem dependent rather than layer dependent, but maybe it could be automated through meta-optimization.
I am implementing LSUV initialization, which has shown state-of-the-art results. The author implemented it in Keras for a single-level topology; I am trying to extend it to arbitrarily deep encapsulation and weight sharing.
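For context, the core of LSUV (after the orthonormal pre-initialization) is a per-layer variance-scaling step. Below is a rough sketch for a flat, single-input model with no nested sub-models, assuming the layer's first weight tensor is its kernel; extending exactly this step to nested and shared sub-models is the open question here.

import numpy as np
from keras import backend as K

def lsuv_scale_layer(model, layer_name, X_batch, tol=0.05, max_iter=10):
    # Rescale the layer's kernel until its output variance on X_batch is ~1.
    layer = model.get_layer(layer_name)
    get_output = K.function([model.input], [layer.output])
    for _ in range(max_iter):
        variance = np.var(get_output([X_batch])[0])
        if abs(variance - 1.0) < tol:
            break
        weights = layer.get_weights()
        weights[0] /= np.sqrt(variance)
        layer.set_weights(weights)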