Hello all,
Consider a 2-input model that makes use of a shared submodel:
import numpy as np
from keras.layers import Input, Convolution2D, Flatten, Dense, merge
from keras.models import Model

def get_sub_net(input_shape):
    img_input = Input(shape=input_shape, name='original input')
    x = Convolution2D(16, 3, 3, activation='relu', name='conv')(img_input)
    out = Flatten()(x)
    return Model(img_input, out)

input_shape = (3, 32, 28)  # Theano dim ordering
sub_net = get_sub_net(input_shape)

input_left = Input(shape=input_shape)
input_right = Input(shape=input_shape)
processed_left = sub_net(input_left)
processed_right = sub_net(input_right)

x = merge([processed_left, processed_right], mode='concat')
x = Dense(16, activation='relu', name='fc')(x)
x = Dense(1, activation='sigmoid')(x)

model = Model(input=[input_left, input_right], output=x)
model.compile(loss='binary_crossentropy', optimizer='sgd')
This works:
batch_size = 5
X = [np.random.random((batch_size,) + input_shape) for _ in range(2)]

intermediate_layer_model = Model(input=model.input, output=model.get_layer('fc').output)
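For example, one can then call something like:

fc_activations = intermediate_layer_model.predict(X)  # expected shape: (batch_size, 16)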
Now I want to get the activation of the convolution layer in the shared sub-net with respect to the main model's input:
intermediate_layer_model = Model(input=model.input,
                                 output=sub_net.get_layer('conv').output)
This fails with:
RuntimeError: Graph disconnected: cannot obtain value for tensor original input at layer "original input". The following previous layers were accessed without issue: []
Hello,
I don't know how to do this directly, but I usually have the sub-model return additional outputs; this way it respects the model encapsulation:
import numpy as np
from keras.layers import Input, Convolution2D, Flatten, Dense, Merge
from keras.models import Model

def get_sub_net(input_shape):
    img_input = Input(shape=input_shape, name='original input')
    x = Convolution2D(16, 3, 3, activation='relu', name='conv')(img_input)
    out = Flatten()(x)
    return Model(img_input, [out, x])  # also expose the conv activation as an output

input_shape = (3, 32, 28)  # Theano dim ordering
sub_net = get_sub_net(input_shape)

input_left = Input(shape=input_shape)
input_right = Input(shape=input_shape)
processed_left, ll = sub_net(input_left)
processed_right, rr = sub_net(input_right)

x = Merge(mode='concat')([processed_left, processed_right])
x = Dense(16, activation='relu', name='fc')(x)
x = Dense(1, activation='sigmoid')(x)

model = Model(input=[input_left, input_right], output=x)
model.compile(loss='binary_crossentropy', optimizer='sgd')

batch_size = 5
X = [np.random.random((batch_size,) + input_shape) for _ in range(2)]

# intermediate_layer_model = Model(input=model.input, output=model.get_layer('fc').output)
intermediate_layer_model = Model(input=model.input,
                                 output=rr)
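The conv activations of, say, the right branch can then be read out the usual way (a sketch; note that model.input is still the pair of inputs, so both arrays in X are fed):

conv_activations = intermediate_layer_model.predict(X)
print(conv_activations.shape)  # expected: (batch_size, 16, 30, 26) with the Theano ordering above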
Regarding your second question.
Models with multiple outputs don't compute a Jacobian; their losses are combined (usually summed), so that we still have a single loss function, as usual, which yields one real-valued loss. That value is then propagated back through the graph to compute the gradients. The inputs (i.e. the shared variables) receive adjoint contributions from the multiple paths (in the case of shared models), which they simply sum to obtain their gradient, just as in the standard back-propagation algorithm.
Once the gradient has been computed for all shared variables, the weights are updated.
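To make the summing concrete, here is a minimal toy sketch with the Keras backend (hypothetical variables, unrelated to the model above): a single shared variable w is used on two paths, and its gradient is just the sum of the contributions from both paths.

from keras import backend as K

w = K.variable(2.0)              # one shared variable used on two paths
x1 = K.placeholder(ndim=0)
x2 = K.placeholder(ndim=0)
loss = w * x1 + w * x2           # two paths through the same shared variable
grad = K.gradients(loss, [w])    # adjoints from both paths are summed: x1 + x2
f = K.function([x1, x2], grad)
print(f([3.0, 4.0]))             # -> [7.0], i.e. 3.0 + 4.0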
Thank you for your answer. Your solution works; nevertheless, I need a general approach to implement custom initialization. For an arbitrary model, I traverse all of its layers and initialize their weights as a function of the layer's activation, given the model input. When a sub-model (as opposed to a plain layer) is found in the graph, I need to expose all of its trainable layers as outputs. This gets further complicated if the sub-model is shared. So far I haven't been able to find a reasonably clean solution.
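For illustration, a rough sketch of the traversal meant here (iter_layers is a hypothetical helper name); it recursively descends into nested Model instances, but it does not address the tensor rewiring needed for shared sub-models, which is exactly the hard part:

from keras.models import Model

def iter_layers(model):
    # Recursively yield layers, descending into nested Model instances.
    for layer in model.layers:
        if isinstance(layer, Model):
            for sub_layer in iter_layers(layer):
                yield sub_layer
        else:
            yield layer

for layer in iter_layers(model):
    print(layer.name, [w.shape for w in layer.get_weights()])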
It seems you are trying to do some meta-optimization of your model. I don't think that's supported yet; maybe in future versions, or maybe someone else knows how to do it. Alternatively, you could create a custom Keras fork that provides the missing capabilities, but that would be quite low-level.
Instead of custom initialization, what I usually do is add some regularization on the weights or activations, based on the various properties I want my weights to have. Then I do a standard fit, giving a high importance to these regularization losses (this initially drives the weights as if the initialization were custom), and I can also reduce the importance of the regularization over time.
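A sketch of the idea with the Keras 1.x regularizer API (the 0.01 coefficients are arbitrary and would be tuned per problem, then reduced over time):

from keras.regularizers import l2, activity_l2

x = Dense(16, activation='relu', name='fc',
          W_regularizer=l2(0.01),                     # penalty on the weights
          activity_regularizer=activity_l2(0.01))(x)  # penalty on the activations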
This process of adding regularization is quite problem dependent rather than layer dependent, but maybe it could be automated through meta-optimization.
I am implementing LSUV initialization, which has shown state-of-the-art results. The author implemented it in Keras for a single-level topology; I am trying to extend it to arbitrarily deep encapsulation and weight sharing.
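For context, the core of LSUV (after the orthonormal pre-initialization) is a per-layer variance-scaling step. Below is a rough sketch for a flat, single-input model with no nested sub-models, assuming the layer's first weight tensor is its kernel; extending exactly this step to nested and shared sub-models is the open question here.

import numpy as np
from keras import backend as K

def lsuv_scale_layer(model, layer_name, X_batch, tol=0.05, max_iter=10):
    # Rescale the layer's kernel until its output variance on X_batch is ~1.
    layer = model.get_layer(layer_name)
    get_output = K.function([model.input], [layer.output])
    for _ in range(max_iter):
        variance = np.var(get_output([X_batch])[0])
        if abs(variance - 1.0) < tol:
            break
        weights = layer.get_weights()
        weights[0] /= np.sqrt(variance)
        layer.set_weights(weights)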