I'm trying to implement a convolutional LSTM.
It's a recurrent layer that accepts an image as input and uses a convolution to calculate the various gates in the LSTM.
So I'm trying to subclass `Recurrent` and change the input dimension.
To do that, I read the documentation on writing a custom layer and followed its suggestion to read the source code to understand what's happening under the hood.
I read the code for `recurrent.py` and I think the structure is clear: you inherit from `Recurrent` but you don't override `call`. Instead, you provide a custom `step` function, and `Recurrent` takes care of applying the step to each entry in a sequence.
As a starting point I took the code for the GRU and tried to adapt it to my needs.
I want to combine a 2D convolution and a GRU (usually it's an LSTM, but that doesn't really matter; I decided to implement a C-GRU).
The idea is to have a usual 2D convolution in the model that outputs 3 features. Those 3 features will be used as the r, z and h activations in the GRU. In the custom layer I only have to keep track of the state. My layer doesn't even have trainable weights; they are contained in the convolution.
Notable changes to the original GRU code are:
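```python
def step(self, x, states):
    # the previous state is a 2D map per sample: (samples, rows, cols)
    h_tm1 = states[0]  # previous memory
    z = self.inner_activation(x[:, 0, :, :])  # update gate
    r = self.inner_activation(x[:, 1, :, :])  # reset gate (not used below)
    hh = self.activation(x[:, 2, :, :])       # candidate state
    h = z * h_tm1 + (1 - z) * hh
    return h, [h]
```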
As you can see, I'm simply reusing the features from the convolution. The multiplications should be performed element-wise. I'll debug this to make sure it has the intended behaviour.
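To check the element-wise behaviour outside Keras, here is a minimal NumPy sketch of the same step. It assumes the `th` layout, so each timestep slice `x` has shape `(samples, 3, rows, cols)`, and it re-implements Keras's `hard_sigmoid` (`clip(0.2 * x + 0.5, 0, 1)`) and `tanh` by hand; `cgru_step` is a hypothetical name. Like the layer above, it computes `r` but never uses it, because there is no recurrent kernel for the reset gate to act on.

```python
import numpy as np

def hard_sigmoid(x):
    # piecewise-linear approximation of the logistic sigmoid, as in Keras
    return np.clip(0.2 * x + 0.5, 0.0, 1.0)

def cgru_step(x, states):
    """One C-GRU step; x has shape (samples, 3, rows, cols) in 'th' layout."""
    h_tm1 = states[0]                 # previous state, shape (samples, rows, cols)
    z = hard_sigmoid(x[:, 0, :, :])   # update gate
    r = hard_sigmoid(x[:, 1, :, :])   # reset gate (unused: no recurrent kernel to gate)
    hh = np.tanh(x[:, 2, :, :])       # candidate state
    h = z * h_tm1 + (1 - z) * hh      # element-wise blend, no matrix products
    return h, [h]

rng = np.random.default_rng(0)
x_t = rng.standard_normal((2, 3, 4, 4))
h0 = np.zeros((2, 4, 4))
h1, new_states = cgru_step(x_t, [h0])
print(h1.shape)  # (2, 4, 4)
```

Because every operation is element-wise, each pixel of the state evolves independently; spatial context enters only through the convolution that produced `x`.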
Since the state becomes 2D, I'm changing `get_initial_states`, too:
```python
def get_initial_states(self, x):
    initial_state = K.zeros_like(x)  # (samples, timesteps, input_dim)
    # input_dim = (3, x_dim, y_dim)
    initial_state = K.sum(initial_state, axis=(1, 2))  # (samples, x_dim, y_dim)
    return initial_state
```
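The `zeros_like` + `sum` trick can be checked with NumPy, which uses the same axis semantics here: summing a zero tensor of shape `(samples, timesteps, 3, rows, cols)` over axes 1 and 2 leaves a zero map of shape `(samples, rows, cols)`. As it turns out later in this thread, `Recurrent` expects `get_initial_states` to return a *list* of states, so this sketch wraps the result in one; `initial_cgru_states` is a hypothetical stand-in for the method.

```python
import numpy as np

def initial_cgru_states(x):
    # x: (samples, timesteps, 3, rows, cols), 'th' layout
    initial_state = np.zeros_like(x)
    initial_state = initial_state.sum(axis=(1, 2))  # -> (samples, rows, cols)
    return [initial_state]  # one entry per state tensor

x = np.ones((2, 40, 3, 40, 40), dtype="float32")
states = initial_cgru_states(x)
print(len(states), states[0].shape)  # 1 (2, 40, 40)
```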
The output shape seems to be hardcoded for recurrent networks, so I'm overriding it:
```python
def get_output_shape_for(self, input_shape):
    # TODO: this is hardcoded for the th layout
    return (input_shape[0], 1, input_shape[2], input_shape[3])
```
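With a 5D input of the form `(samples, timesteps, features, rows, cols)`, the indices in this override do not point at the spatial axes: `input_shape[2]` is the feature count and `input_shape[3]` is the row count. The pure-Python sketch below (hypothetical helper `cgru_output_shape`) computes the shape this thread ends up with in the model summary, one channel per timestep when `return_sequences=True`; the non-sequence branch, which drops the time axis, is an assumption. It also evaluates the formula above for comparison.

```python
def cgru_output_shape(input_shape, return_sequences):
    # input_shape: (samples, timesteps, features, rows, cols), 'th' layout
    samples, timesteps, _features, rows, cols = input_shape
    if return_sequences:
        return (samples, timesteps, 1, rows, cols)
    # assumption: without return_sequences only the last state map is returned
    return (samples, 1, rows, cols)

def original_formula(input_shape):
    # the override above, applied to a 5D shape
    return (input_shape[0], 1, input_shape[2], input_shape[3])

shape = (None, 40, 3, 40, 40)
print(cgru_output_shape(shape, True))  # (None, 40, 1, 40, 40)
print(original_formula(shape))         # (None, 1, 3, 40)
```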
Another thing that's hardcoded is the `input_spec`.
In the constructor, after the call to `super`, I'm overriding it with my input dimension:
```python
from keras import activations, initializations
from keras.engine import InputSpec
from keras.layers.recurrent import Recurrent


class CGRU(Recurrent):
    def __init__(self,
                 init='glorot_uniform', inner_init='orthogonal',
                 activation='tanh', inner_activation='hard_sigmoid', **kwargs):
        self.init = initializations.get(init)
        self.inner_init = initializations.get(inner_init)
        self.activation = activations.get(activation)
        self.inner_activation = activations.get(inner_activation)
        # removing the regularizers and the dropout
        super(CGRU, self).__init__(**kwargs)
        # this seems necessary in order to accept 5 input dimensions
        # (samples, timesteps, features, x, y)
        self.input_spec = [InputSpec(ndim=5)]
```
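As far as I can tell, Keras 1.x's `Recurrent.__init__` sets `self.input_spec = [InputSpec(ndim=3)]`, and the layer compares the incoming tensor's rank against that spec when it is called, so a 5D sequence of images is rejected unless the spec is replaced. The check itself is just a rank comparison; `assert_ndim` below is a hypothetical illustration, not the Keras function.

```python
def assert_ndim(shape, expected_ndim, layer_name="cgru"):
    # Compare the rank of an incoming shape with the layer's declared ndim.
    if len(shape) != expected_ndim:
        raise ValueError(
            f"layer {layer_name} expects ndim={expected_ndim}, got ndim={len(shape)}"
        )

sequence_of_images = (None, 40, 3, 40, 40)
assert_ndim(sequence_of_images, 5)      # passes with the overridden spec
try:
    assert_ndim(sequence_of_images, 3)  # the default rank for recurrent layers
except ValueError as err:
    print(err)
```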
There are other small changes.
You can find the whole code here: http://pastebin.com/60ztPis3
When run, this produces the following error message:
```
theano.tensor.var.AsTensorError: ('Cannot convert [None] to TensorType',
)
```
The whole error message on pastebin: http://pastebin.com/Cdmr20Yn
I'm trying to debug the code, but that's rather hard because it goes deep into the Keras source code.
One thing: execution never reaches my custom `step` function, so apparently something in the configuration is going wrong. In the `call` function of `Recurrent`, `input_shape` is a tuple with the entries `(None, 40, 1, 40, 40)`.
This is correct: my sequence has 40 elements, and each one is an image with 1 feature and 40x40 resolution. I'm using the "th" layout.
Here is the `call` function of `Recurrent`.
My code reaches the call to `K.rnn`, and the setup looks fine to me; `input_spec` seems correct.
But it crashes inside `K.rnn`, without reaching my step function.
```python
def call(self, x, mask=None):
    # input shape: (nb_samples, time (padded with zeros), input_dim)
    # note that the .build() method of subclasses MUST define
    # self.input_spec with a complete input shape.
    input_shape = self.input_spec[0].shape
    if self.stateful:
        initial_states = self.states
    else:
        initial_states = self.get_initial_states(x)
    constants = self.get_constants(x)
    preprocessed_input = self.preprocess_input(x)

    last_output, outputs, states = K.rnn(self.step, preprocessed_input,
                                         initial_states,
                                         go_backwards=self.go_backwards,
                                         mask=mask,
                                         constants=constants,
                                         unroll=self.unroll,
                                         input_length=input_shape[1])
```
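Stripped of masking, unrolling and constants, `K.rnn` walks over the time axis (axis 1), calls `step(x_t, states)` once per timestep, and threads the returned states into the next call. The NumPy loop below is a simplified stand-in (hypothetical `simple_rnn`), not the backend code, and it makes one contract explicit: `initial_states` must be a list. In Keras 1.x's Theano backend, I believe `K.rnn` passes `outputs_info=[None] + initial_states` to `theano.scan`; if `initial_states` is a single tensor instead of a list, Theano would try to turn `[None]` into a tensor, which would match the `Cannot convert [None] to TensorType` error and explain why the step function is never reached.

```python
import numpy as np

def simple_rnn(step, inputs, initial_states):
    """Apply step to each timestep of inputs shaped (samples, timesteps, ...)."""
    if not isinstance(initial_states, list):
        raise TypeError("initial_states must be a list of state tensors")
    states = initial_states
    outputs = []
    for t in range(inputs.shape[1]):
        output, states = step(inputs[:, t], states)
        outputs.append(output)
    outputs = np.stack(outputs, axis=1)  # (samples, timesteps, ...)
    return outputs[:, -1], outputs, states

def running_sum_step(x_t, states):
    # toy step: the state is the running sum of the inputs
    h = states[0] + x_t
    return h, [h]

x = np.ones((2, 5, 4, 4))
last, outs, states = simple_rnn(running_sum_step, x, [np.zeros((2, 4, 4))])
print(outs.shape, last[0, 0, 0])  # (2, 5, 4, 4) 5.0
```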
At this point I'm lost.
Could you help me?
Am I missing something, or do I need to configure something else?
I think I fixed the problem.
`get_initial_states` now returns a list of states, and I fixed the output size.
I don't know whether it runs yet, but at least the model can be plugged together.
Hm, now I'm having a strange problem:
My code is now:
```python
from keras.layers import Input, Reshape, Convolution2D, TimeDistributed
from keras.models import Model

from cgru import CGRU

# this is the actual input, fed to the network
inputs = Input((1, 40, 40, 40))
# now reshape to a sequence
reshaped = Reshape((40, 1, 40, 40))(inputs)

conv_inputs = Input((1, 40, 40))
conv1 = Convolution2D(3, 3, 3, activation='relu', border_mode='same')(conv_inputs)
convmodel = Model(input=conv_inputs, output=conv1)
convmodel.summary()

# apply the segmentation to each layer
time_dist = TimeDistributed(convmodel)(reshaped)

up = CGRU(go_backwards=False, return_sequences=True, name="up")
up = up(time_dist)

output = Reshape([1, 40, 40, 40])(up)
model = Model(input=inputs, output=output)
model.summary()  # summary() prints itself and returns None
```
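Because the volume has a single channel, `Reshape((40, 1, 40, 40))` does not scramble any voxels: it only reinterprets the first spatial axis as time, which is the same as swapping the channel and depth axes. `TimeDistributed(convmodel)` then runs the same 2D model on each of those 40 slices. The NumPy sketch below checks the first claim and mimics the second with a hypothetical per-slice function `fake_conv` that stacks three copies of the slice, standing in for the 3-filter convolution.

```python
import numpy as np

samples = 2
volume = np.arange(samples * 40 * 40 * 40, dtype="float32").reshape(samples, 1, 40, 40, 40)

# Reshape((40, 1, 40, 40)) leaves the batch axis alone, so in NumPy terms:
sequence = volume.reshape(samples, 40, 1, 40, 40)
# With a size-1 channel axis this equals swapping the channel and depth axes.
assert np.array_equal(sequence, volume.transpose(0, 2, 1, 3, 4))

def fake_conv(frame):
    # stand-in for convmodel: (samples, 1, 40, 40) -> (samples, 3, 40, 40)
    return np.concatenate([frame, frame, frame], axis=1)

def time_distributed(fn, x):
    # apply fn to every timestep x[:, t] and restack along axis 1
    return np.stack([fn(x[:, t]) for t in range(x.shape[1])], axis=1)

features = time_distributed(fake_conv, sequence)
print(features.shape)  # (2, 40, 3, 40, 40)
```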
On a computer with Theano as the backend, this works.
The model summary is:
```
Layer (type)                     Output Shape          Param #     Connected to
====================================================================================================
input_1 (InputLayer)             (None, 1, 40, 40, 40) 0
____________________________________________________________________________________________________
reshape_1 (Reshape)              (None, 40, 1, 40, 40) 0           input_1[0][0]
____________________________________________________________________________________________________
timedistributed_1 (TimeDistribute(None, 40, 3, 40, 40) 30          reshape_1[0][0]
____________________________________________________________________________________________________
up (CGRU)                        (None, 40, 1, 40, 40) 0           timedistributed_1[0][0]
____________________________________________________________________________________________________
reshape_2 (Reshape)              (None, 1, 40, 40, 40) 0           up[0][0]
====================================================================================================
Total params: 30
____________________________________________________________________________________________________
```
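The parameter count matches the layer definitions: a `Convolution2D` with 3 filters of size 3×3 over 1 input channel has 3·(1·3·3) = 27 weights plus 3 biases, and the CGRU contributes nothing because its gates come from the convolution. A tiny helper for the arithmetic (the name `conv2d_params` is hypothetical):

```python
def conv2d_params(nb_filter, input_channels, nb_row, nb_col, bias=True):
    # one nb_row x nb_col kernel per (input channel, filter) pair, plus one bias per filter
    weights = nb_filter * input_channels * nb_row * nb_col
    return weights + (nb_filter if bias else 0)

print(conv2d_params(3, 1, 3, 3))  # 30
```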
But on a computer with TensorFlow as the backend, the code fails.
I've added a `convmodel.summary()` call. Up to that point it works:
```
Layer (type)                     Output Shape          Param #     Connected to
====================================================================================================
input_4 (InputLayer)             (None, 1, 40, 40)     0
____________________________________________________________________________________________________
convolution2d_2 (Convolution2D)  (None, 3, 40, 40)     30          input_4[0][0]
====================================================================================================
Total params: 30
```
But then the program crashes:
```
ValueError: Shapes (?, ?, 40, 40) and (40, ?, 40) are not compatible
```
Do Theano and TensorFlow use different (and incompatible) placeholders for the batch size?
Please note that I configured Keras to use the "th" image layout in both cases.
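One detail in the message: the two shapes differ in rank (4 vs. 3), not only in the unknown batch size. TensorFlow checks static shapes while it builds the graph, and two shapes of known rank are compatible only when the ranks are equal and every pair of known dimensions agrees; an unknown dimension (`?`) matches anything. Theano variables carry only their number of dimensions and broadcast pattern at graph-construction time, which may be why the same model builds there. A pure-Python version of the TensorFlow rule, with `None` for `?` (the helper name `shapes_compatible` is hypothetical):

```python
def shapes_compatible(a, b):
    # Rule used by TensorShape.is_compatible_with when both ranks are known:
    # equal rank, and each dimension pair is equal or contains an unknown (None).
    if len(a) != len(b):
        return False
    return all(x is None or y is None or x == y for x, y in zip(a, b))

print(shapes_compatible((None, None, 40, 40), (40, None, 40)))      # False: rank 4 vs 3
print(shapes_compatible((None, None, 40, 40), (None, 40, 40, 40)))  # True
```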
Apparently this is caused by shape inference: it can't determine the input shape of the new CGRU layer, which definitely makes sense.
But I wonder why the code runs without problems with the Theano backend. I'll debug some more.
I think the layer works: it can be used in a Theano model, the model learns, and removing the layer reduces performance.
The question about how to extend Keras is basically solved.
The problem with the Theano/TensorFlow incompatibility seems like a different issue.
I'll close this one.