Keras: Easy way to combine CNN + LSTM? (e.g. LRCN network)

Created on 16 Jul 2015 · 26 comments · Source: keras-team/keras

I was wondering if there was a straightforward way in Keras (or would I have to write my own layer?) to combine a convolutional network which extracts features and then feeds them to an LSTM (or GRU, MUT1, etc.) network (similar to Figure 1 of this paper: http://arxiv.org/pdf/1411.4389v3.pdf)?

Specifically, I want the input i_t to the convolutional network at a given timestep t to consist of n frames (in the case of Figure 1, n = 1), so i_t would be of dimension (num_rows, num_cols, n), from which the features of i_t are extracted and fed into an LSTM network, which produces a prediction y_t and a hidden state h_t. Then the next input i_{t+1} of dimension (num_rows, num_cols, n) is fed into the same convolutional network which outputs the features of i_{t+1} to the LSTM layer at timestep t+1, and h_t is fed to the LSTM layer at timestep t+1 from the LSTM layer at timestep t, from which the prediction y_{t+1} and hidden state h_{t+1} are produced, and so on.
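To make the wiring concrete, here is a rough sketch of what I have in mind (untested, and assuming Keras had some TimeDistributed-style wrapper so the same conv weights are applied at every timestep):

from keras.models import Sequential
from keras.layers import TimeDistributed, Conv2D, MaxPooling2D, Flatten, LSTM, Dense

T, rows, cols, n = 16, 64, 64, 1      # n frames stacked as channels at each timestep
model = Sequential()
# the same Conv2D weights are applied to each of the T inputs i_t
model.add(TimeDistributed(Conv2D(32, (3, 3), activation='relu'),
                          input_shape=(T, rows, cols, n)))
model.add(TimeDistributed(MaxPooling2D((2, 2))))
model.add(TimeDistributed(Flatten()))
# the LSTM carries h_t across timesteps and emits a prediction y_t per step
model.add(LSTM(128, return_sequences=True))
model.add(TimeDistributed(Dense(10, activation='softmax')))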

I'm aware of https://github.com/fchollet/keras/issues/129; however, in this case, I believe the original poster wanted it so that the convolutional layer does not accept new inputs across timesteps (so something like Figure 3, pg. 4 of this paper: http://arxiv.org/pdf/1411.4555v2.pdf), which is not what I want.

Thanks in advance!

Most helpful comment

Hi Afsaneh,

In order to have the CNN layers interact with the LSTM layer, they need to be distributed across time. I have made time-distributed versions of Convolution2D, MaxPooling2D, and Flatten so that they can work with the LSTM layer. They can be found in my GitHub repo here: https://github.com/anayebi/keras-extra

So, as an example, you could do what you propose above as follows (untested code):

from keras.models import Sequential
from keras.layers.core import Activation, TimeDistributedDense
from keras.layers.recurrent import LSTM
from keras.layers.extra import TimeDistributedConvolution2D, TimeDistributedMaxPooling2D, TimeDistributedFlatten

n_hidden = 256
n_samples = 100
n_timesteps = 16
nb_pool = 2
nb_classes = 10   # set to the number of labels in your task

model = Sequential()
# input is 5D: (n_samples, n_timesteps, channels, rows, cols)
model.add(TimeDistributedConvolution2D(32, 5, 5, border_mode='same', input_shape=(n_timesteps, 1, 28, 28)))
model.add(TimeDistributedMaxPooling2D(pool_size=(nb_pool, nb_pool)))
model.add(Activation('relu'))
model.add(TimeDistributedFlatten())
model.add(LSTM(n_hidden, return_sequences=True))
model.add(TimeDistributedDense(nb_classes))
model.add(Activation('softmax'))

All 26 comments

@anayebi read the documentation on the built-in Reshape layer. That should give you everything you need. Don't flatten the CNN outputs; use Reshape instead.

Add these two methods to the Sequential object:

def Conv2LSTM(self, num_filters, input_width):
    # (batch, filters, height, width) -> (batch, width, height, filters),
    # then reshape to (batch, input_width, num_filters) so width acts as time
    self.add(Permute((0, 3, 2, 1)))
    self.add(Reshape(input_width, num_filters))

def LSTM2ConvLayer(self):
    # (batch, time, features) -> (batch, features, 1, time); 'x' inserts a broadcastable axis
    self.add(Permute((0, 2, 'x', 1)))

Add this class to layers:

class Permute(Layer):
    '''
    Permute the dimensions of the data according to the given tuple
    '''
    def __init__(self, dims):
        super(Permute, self).__init__()
        self.dims = dims

    def get_output(self, train):
        X = self.get_input(train)
        return X.dimshuffle(self.dims)

    def get_config(self):
        return {"name": self.__class__.__name__,
                "dims": self.dims}

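For example, with the patched Sequential (untested sketch; the numbers are made up, and it assumes the pooling has already collapsed the height dimension to 1 so that the width axis can serve as the time axis):

from keras.models import Sequential
from keras.layers.convolutional import Convolution2D, MaxPooling2D
from keras.layers.recurrent import LSTM

model = Sequential()
model.add(Convolution2D(32, 5, 5, border_mode='same', input_shape=(1, 8, 100)))
model.add(MaxPooling2D(pool_size=(8, 1)))          # -> (batch, 32, 1, 100)
model.Conv2LSTM(num_filters=32, input_width=100)   # -> (batch, 100, 32)
model.add(LSTM(128, return_sequences=True))        # one step per image column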
Hi,
I am going to train a CNN + LSTM, but I was unable to determine the exact input of the LSTM.
I would appreciate it if you could help.
I have a sequence of frames and I want to map them to a sequence of predefined labels (seq2seq mapping).
My network input is 100 sequences of 16 consecutive frames, and each frame is 28x28.
My problem is how to define Permute and Reshape to connect the output of the convolutional layer to the LSTM.

n_hidden = 256
n_samples = 100
n_timesteps = 16

model = Sequential()
model.add(Convolution2D(32, 5, 5, border_mode='same', input_shape=(1, 28, 28)))
model.add(MaxPooling2D(pool_size=(nb_pool, nb_pool)))
model.add(Activation('relu'))
model.add(Permute((0, 3, 2, 1)))
model.add(Reshape(?))
model.add(LSTM(256))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))
rmsprop = RMSprop(lr=learning_rate)
model.compile(loss='categorical_crossentropy', optimizer=rmsprop)

Hi Afsaneh,

In order to have the CNN layers interact with the LSTM layer, they need to be distributed across time. I have made time-distributed versions of Convolution2D, MaxPooling2D, and Flatten so that they can work with the LSTM layer. They can be found in my GitHub repo here: https://github.com/anayebi/keras-extra

So, as an example, you could do what you propose above as follows (untested code):

from keras.models import Sequential
from keras.layers.core import Activation, TimeDistributedDense
from keras.layers.recurrent import LSTM
from keras.layers.extra import TimeDistributedConvolution2D, TimeDistributedMaxPooling2D, TimeDistributedFlatten

n_hidden = 256
n_samples = 100
n_timesteps = 16
nb_pool = 2
nb_classes = 10   # set to the number of labels in your task

model = Sequential()
# input is 5D: (n_samples, n_timesteps, channels, rows, cols)
model.add(TimeDistributedConvolution2D(32, 5, 5, border_mode='same', input_shape=(n_timesteps, 1, 28, 28)))
model.add(TimeDistributedMaxPooling2D(pool_size=(nb_pool, nb_pool)))
model.add(Activation('relu'))
model.add(TimeDistributedFlatten())
model.add(LSTM(n_hidden, return_sequences=True))
model.add(TimeDistributedDense(nb_classes))
model.add(Activation('softmax'))
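To sanity-check the shapes: the model expects 5D input, (n_samples, n_timesteps, channels, rows, cols), and with return_sequences=True it wants one one-hot label per timestep. Untested dummy-data sketch:

import numpy as np
X = np.random.rand(n_samples, n_timesteps, 1, 28, 28).astype('float32')
labels = np.random.randint(0, nb_classes, (n_samples, n_timesteps))
Y = np.eye(nb_classes)[labels]   # one-hot: (n_samples, n_timesteps, nb_classes)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
model.fit(X, Y, batch_size=10, nb_epoch=1)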

@anayebi some bug here, in theano_backend.py:

Hi,
Here is the error I got in theano_backend.py:
File "/usr/local/lib/python2.7/dist-packages/keras/backend/theano_backend.py", line 55, in placeholder
raise Exception('ndim too large: ' + str(ndim))

Hello there, I get this error:
Exception: Invalid input shape - Layer expects input ndim=5, was provided with input shape (None, 30, 256, 256)

when I use this architecture:

model = Sequential()
model.add(TimeDistributedConvolution2D(64, 3, 3, border_mode='same', input_shape=(30, 256, 256)))
model.add(Activation('relu'))
model.add(TimeDistributedConvolution2D(64, 3, 3, border_mode='same'))
model.add(Activation('relu'))
model.add(TimeDistributedMaxPooling2D(pool_size=(2, 2)))
model.add(Activation('relu'))
model.add(TimeDistributedConvolution2D(128, 3, 3, border_mode='same'))
model.add(Activation('relu'))
model.add(TimeDistributedConvolution2D(128, 3, 3))
model.add(Activation('relu'))
model.add(TimeDistributedMaxPooling2D(pool_size=(2, 2)))
model.add(Activation('relu'))
model.add(TimeDistributedConvolution2D(128, 3, 3, border_mode='same'))
model.add(Activation('relu'))
model.add(TimeDistributedConvolution2D(128, 3, 3))
model.add(Activation('relu'))
model.add(TimeDistributedMaxPooling2D(pool_size=(2, 2)))
model.add(Activation('relu'))
model.add(TimeDistributedConvolution2D(256, 3, 3, border_mode='same'))
model.add(Activation('relu'))
model.add(TimeDistributedConvolution2D(256, 3, 3))
model.add(Activation('relu'))
model.add(TimeDistributedMaxPooling2D(pool_size=(2, 2)))
model.add(Activation('relu'))
model.add(TimeDistributedFlatten())
model.add(LSTM(512, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(512, return_sequences=True))
model.add(TimeDistributedDense(600))
model.add(Activation('softmax'))
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')

Am I using the package wrong, or is there something I need to implement somewhere? Thanks for taking the time to read this.

@AntreasAntoniou Yes, the package expects an input that is 5D: (num_samples, num_timesteps, channels, rows, cols). Your input is missing the extra num_timesteps dimension. If you experience any more issues, feel free to post to the issues page on my repo: https://github.com/anayebi/keras-extra
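If the frames are sitting in one big 4D array, a reshape along these lines (untested sketch, assuming consecutive frames belong to the same clip) produces the 5D layout:

import numpy as np
num_timesteps = 30
frames = np.random.rand(900, 1, 256, 256).astype('float32')   # made-up stand-in data
num_samples = frames.shape[0] // num_timesteps
X = frames[:num_samples * num_timesteps].reshape(
        (num_samples, num_timesteps) + frames.shape[1:])
# X.shape == (30, 30, 1, 256, 256): (num_samples, num_timesteps, channels, rows, cols)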

@OnlySang @afsanehghasemi At the moment, these layers don't work with the current version of Keras (which uses multiple backends). The errors you point out have to do with theano_backend.py, which does not allow input dimensions larger than 4. If you use a version of Keras from September or October (before the new update), then the layers as they currently are should work. On the other hand, if you don't want to use a slightly earlier Keras version, I am planning on releasing an updated version of the layers soon that should work with the newest Keras version. I will update this thread when I do release the update.

Yes, I did realise that; I changed the input to (30, 1, 256, 256). Thanks for your prompt support here. I will be waiting for your update.

I have pushed an updated version of my code that works with the newest version of Keras (this includes modifying theano_backend.py to include support for 5D tensors). For any issues/bugs, feel free to let me know (preferably on the issues page on my repo, https://github.com/anayebi/keras-extra, rather than on this thread :)

@fchollet On a related note, could these layers be considered for inclusion in Keras?

It's somewhat impractical to keep up with the changes in Keras as a separate repo, since Keras is constantly changing. I know @fchollet had plans for a general time-distributed layer, though I think it's been on the backlog for a while. However, if making a general time-distributed layer is too much work or is taking too much time, and if TimeDistributedConvolution2D, TimeDistributedMaxPooling2D, and TimeDistributedFlatten seem to be something that could be useful to Keras users (especially those training CNN-RNN nets), then they (or a subset thereof) may be worth considering for inclusion (in fact, TimeDistributedDense and TimeDistributedMerge are already specific time-distributed layers).

It could be best to put all the time-distributed layers in one place, to be used in conjunction with RNNs. Or, better yet, for Flatten, Convolution2D, and Pooling2D, we could have a flag (say, td) such that if td is set to True, the layers do the appropriate operations to be time-distributed. But I'll leave the API decisions to others.


Thanks so much man. I will go try it out now.

Hello,

I am using "keras-extra" to have the CNN layers interact with the LSTM layer, but when I run the code this error occurs: "TypeError: rnn() got an unexpected keyword argument 'mask'". Would you please help me with this issue?
Thank you very much in advance.

@Shadi94 Could you create a GitHub gist with your code?

#1623 provides a Convolution3D layer that treats the time series as a dimension. I guess this combination of CNN+LSTM can then be implemented directly by stacking an LSTM layer on top of the CNN.
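A rough sketch of that idea (untested; Keras 2 names and made-up numbers assumed): convolve over (time, height, width), then pool the spatial axes away so the LSTM sees one feature vector per timestep.

from keras.models import Sequential
from keras.layers import Conv3D, MaxPooling3D, Reshape, LSTM, Dense

model = Sequential()
model.add(Conv3D(32, (3, 3, 3), padding='same', input_shape=(16, 28, 28, 1)))
model.add(MaxPooling3D(pool_size=(1, 28, 28)))   # pool away H and W: -> (16, 1, 1, 32)
model.add(Reshape((16, 32)))                     # (timesteps, features)
model.add(LSTM(64))
model.add(Dense(10, activation='softmax'))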

Hi @afsanehghasemi, were you able to solve the issue? If yes, can you explain how?

@Shadi94 I am also getting the same error, "TypeError: rnn() got an unexpected keyword argument 'mask'". Would you please share your fix? Thanks very much.

I've got a similar task, and I think there is a layer called Permute or Reshape that can help with it, but are there any sample codes? @afsanehghasemi, have you figured it out? Thanks.

Is there a possibility of using a stateful LSTM here?
I am having trouble using it.
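For reference, the usual stateful pattern looks roughly like this (untested sketch; it assumes a fixed batch size, and states must be reset manually at sequence boundaries):

from keras.models import Sequential
from keras.layers import LSTM, Dense

batch_size, timesteps, features = 8, 16, 128    # made-up numbers
model = Sequential()
model.add(LSTM(64, stateful=True,
               batch_input_shape=(batch_size, timesteps, features)))
model.add(Dense(1))
model.compile(loss='mse', optimizer='rmsprop')
# ...train on consecutive chunks of each sequence, then:
model.reset_states()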

Hi,

I am trying to work with a CNN+LSTM and am facing a problem using the LSTM after the CNN. From the last conv layer of the network I am getting a feature map of shape 32x8x26; how can I use an LSTM after this?

Thanks.
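One common option (untested sketch, Keras 2 API): treat the 26-wide axis as the time axis, so the LSTM sees a sequence of 26 steps with 32*8 = 256 features each. This assumes `model` currently ends with the (32, 8, 26) feature map laid out as (channels, height, width):

from keras.layers import Permute, Reshape, LSTM

model.add(Permute((3, 1, 2)))      # (32, 8, 26) -> (26, 32, 8)
model.add(Reshape((26, 32 * 8)))   # 26 timesteps x 256 features per step
model.add(LSTM(128))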

@anirudhgupta22
I have the same problem; did you get an answer?

I recently opened a new issue about it (see #8268) but got no answer. Is it the same as what you want?

@anayebi hello, I followed #4172 and #421 and tried every possible way to integrate a CNN (VGG16) with an LSTM in Keras, but I am continuously getting errors.

_below is my code_

from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
from keras.applications.vgg16 import VGG16
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input
from keras.models import Sequential
from keras.layers import Flatten, LSTM, Dense, TimeDistributed, InputLayer
from keras.layers.core import Dropout
from keras.layers.convolutional import Convolution2D, MaxPooling2D
import numpy as np

#%%     Image Generator for datasets
datagen = ImageDataGenerator(
        rotation_range = 0,
        width_shift_range = 0,
        height_shift_range = 0,
        shear_range = 0,
        zoom_range = 0,
        rescale = None,
        horizontal_flip = False,
        fill_mode = 'nearest')

train_generator = datagen.flow_from_directory(
        r'D:\SpyderDeepLearning\PersonReidentification\Dataset_Light\Train',
        target_size = (224,224),
        batch_size = 1, 
        class_mode = 'categorical')

test_generator = datagen.flow_from_directory(
        r'D:\SpyderDeepLearning\PersonReidentification\Dataset_Light\Test',
        target_size = (224,224),
        batch_size = 1, 
        class_mode = 'categorical')


#%%     Convolutional Model (Basically Its VGG) + LSTM model (TimeDistributed)

model = Sequential()

model.add(InputLayer(input_shape=(5, 224, 224, 3)))
model.add(TimeDistributed(Convolution2D(64, (3, 3))))
model.add(TimeDistributed(MaxPooling2D((2,2), strides=(2,2))))
model.add(LSTM(10))
model.add(Dense(3))

model.compile(loss='mean_squared_error', optimizer='rmsprop', metrics=['accuracy'])
model.fit_generator(train_generator, epochs=1, steps_per_epoch=len(train_generator.filenames))

I am getting this error (here 5 is the number of timesteps, 224x224 is the image dimension, and 3 is the number of channels):
Error when checking input: expected input_2 to have 5 dimensions, but got array with shape (5, 224, 224, 3)

But when I supply the samples dimension as well,
model.add(InputLayer(input_shape=(1, 5, 224, 224, 3)))
I get this error:
number of input channels does not match corresponding dimension of filter, 224 != 3

@usmanatif I also ran into this error, and I can't figure out how to feed data into the CNN+LSTM with a generator. Did you solve this problem?

@usmanatif @HTLife Did you try to permute the input dimensions to (1, 3, 224, 224, 5)?
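One workaround (untested sketch) is to wrap the frame generator so it emits 5D batches; note that the model above also needs a TimeDistributed(Flatten()) between the pooling layer and the LSTM, since the LSTM expects 3D input. The helper below is hypothetical and assumes batch_size=1 upstream:

import numpy as np

def sequence_generator(frame_gen, timesteps=5):
    # Group consecutive frames into (1, timesteps, H, W, C) batches,
    # labelling each sequence by its last frame.
    while True:
        frames, label = [], None
        for _ in range(timesteps):
            x, y = next(frame_gen)     # x: (1, 224, 224, 3)
            frames.append(x[0])
            label = y
        yield np.stack(frames)[np.newaxis], label   # (1, 5, 224, 224, 3)

model.fit_generator(sequence_generator(train_generator),
                    steps_per_epoch=len(train_generator.filenames) // 5,
                    epochs=1)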

> _[quoting @anayebi's comment above in full]_

> _[quoting @usmanatif's comment above in full]_

Have you solved the issue of supplying the input using the Keras image generator? My model compiles fine, but there is a dimension issue in fit_generator.

How did you solve your problem?

Any help would be appreciated.
Thank you!
