Keras: Stacking Convolutions and LSTM

Created on 24 Oct 2016 · 18 comments · Source: keras-team/keras

I would like to stack 2D convolutions and LSTM layers, exactly the same problem as in #129.

The proposed solution in #129 is a custom reshape layer.
These days, Keras ships a built-in Reshape layer.
Searching for the problem on Stack Overflow brings up a similar question, where the accepted answer suggests using the built-in layer.

As a toy example, I would like to classify MNIST with a combination of Conv layers and an LSTM.
I've sliced the images into four parts, arranged those parts into sequences, and then stacked the sequences.
My training data is a numpy array of shape (60000, 4, 1, 56, 14), where

  • 60000 is the number of samples
  • 4 is the number of timesteps
  • 1 is the number of channels (I'm using the Theano image layout)
  • 56 and 14 are the height and width

Please note: each image slice is 14x14, since I've cut the 28x28 image into four parts. The 56 in the shape comes from creating 4 different sequences and stacking them along this axis.

Here is my code so far:

from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, Activation, Reshape, Dropout, LSTM, Dense

nb_filters = 32
kernel_size = (3, 3)
pool_size = (2, 2)
nb_classes = 10
batch_size = 64

model = Sequential()

model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1],
                        border_mode="valid", input_shape=(1, 56, 14)))
model.add(Activation("relu"))
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1]))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=pool_size))

model.add(Reshape((56*14,)))
model.add(Dropout(0.25))
model.add(LSTM(5))
model.add(Dense(50))
model.add(Dense(nb_classes))
model.add(Activation("softmax"))

When run, the Reshape layer raises a ValueError:

ValueError: total size of new array must be unchanged

I also tried passing the number of timesteps to the Reshape layer: model.add(Reshape((4, 56*14))).
But that doesn't solve the problem either.

What is the correct dimension to give to the Reshape layer?
Is a Reshape layer the correct solution at all?

I've posted the same question on Stack Overflow.


All 18 comments

You can view the output shapes of your model's layers with model.summary(). The Reshape layer must conserve the number of elements in a tensor. The input to an LSTM layer must have samples of shape (nb_timesteps, nb_features).

Oh, I completely forgot that the convolutions and the pooling would change the dimensions.
Thanks for the tip with model.summary().

I printed the model.summary() just before adding the reshape layer and just after.
The pooling layer before the reshape provides this output shape:

(None, 32, 26, 5)

So I changed the reshape layer to model.add(Reshape((32*26*5,)))

model.summary() shows the output of the reshape layer as:

(None, 4160)

There is no timestep dimension; I expected the model to deal with the timesteps under the hood.
But the LSTM complains:

Exception: Input 0 is incompatible with layer lstm_5: expected ndim=3, found ndim=2

At least the error in the reshape layer is gone. Thank you very much.

How do I account for the timestep dimension? Do I pass it through the entire network, adding (4, ...) to the input dimensions of the convolution?

If I understand you correctly, you want to feed the output of the Convolutional layer, with no time sequence information, to an LSTM layer. The way to do this is to divide it into time steps, either in batches or one element at a time. Since your feature map has a total dimensionality of 32*26*5=4160, you could do this, for example, by treating every element of it as a time step in a sequence. To do this, you should reshape it with Reshape((4160,1)). To sequence N elements at a time, use Reshape((4160/N, N)), where N is an integer divisor of 4160.
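
For instance, a minimal sketch of that idea with N=10 (an arbitrary divisor of 4160), picking up right after the pooling layer with output shape (None, 32, 26, 5):

# Treat the flattened 32*26*5 = 4160 feature map as a sequence of
# 416 timesteps with 10 features each; any integer divisor of 4160 works.
model.add(Reshape((4160 // 10, 10)))
model.add(LSTM(5))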

Either:

  1. Look into TimeDistributed() and how to wrap your convolution and pooling operations in it, so that they are applied per time step.
  2. Look into the functional API, specifically: "All models are callable, just like layers" (see the sketch below).
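
As a minimal sketch of option 2 (all layer sizes here are illustrative assumptions, and wrapping a whole Model in TimeDistributed requires a reasonably recent Keras version):

from keras.models import Model
from keras.layers import Input, Convolution2D, MaxPooling2D, Flatten, TimeDistributed, LSTM, Dense

# Per-frame feature extractor, built as a standalone model (Theano ordering).
frame = Input(shape=(1, 14, 14))
x = Convolution2D(32, 3, 3, activation='relu')(frame)
x = MaxPooling2D(pool_size=(2, 2))(x)
x = Flatten()(x)
cnn = Model(input=frame, output=x)

# Models are callable just like layers, so the CNN can be applied per timestep.
seq = Input(shape=(4, 1, 14, 14))          # 4 timesteps of image slices
features = TimeDistributed(cnn)(seq)       # (None, 4, nb_features)
out = Dense(10, activation='softmax')(LSTM(5)(features))
model = Model(input=seq, output=out)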

@kgrm I've got a sequence ready. The current dimension after the reshape layer, 4160, is the result of applying the various convolutions etc. to one of the images in the sequence.

I experimented with LSTMs and Dense layers and found that you can order them basically at random. Therefore I thought Keras would look for any LSTM in the network, keep in mind that the input needs an additional dimension for the timesteps, and make this transparent to the rest of the network.

@carlthome TimeDistributed looks interesting. I'll take a look at it.

So, if you've already got a sequence of images and you want to apply convolutions to each image and then feed the sequence to an LSTM, a reshape is not the correct solution?

Thank you both very much for the quick and friendly help.

Amazing, this seems to do the trick.

I wrapped everything up to the LSTM in TimeDistributed layers and provided the number of timesteps as an additional input dimension.
This runs without problems.

Am I doing this the right way?

from keras.models import Sequential
from keras.layers import (Convolution2D, MaxPooling2D, Activation, Flatten,
                          Dropout, TimeDistributed, LSTM, Dense)

nb_filters = 32
kernel_size = (3, 3)
pool_size = (2, 2)
nb_classes = 10
batch_size = 64

model = Sequential()

model.add(TimeDistributed(
    Convolution2D(nb_filters, kernel_size[0], kernel_size[1], border_mode="valid"),
    input_shape=[4, 1, 56, 14]))
model.add(TimeDistributed(Activation("relu")))
model.add(TimeDistributed(Convolution2D(nb_filters, kernel_size[0], kernel_size[1])))
model.add(TimeDistributed(Activation("relu")))
model.add(TimeDistributed(MaxPooling2D(pool_size=pool_size)))
model.add(TimeDistributed(Flatten()))
model.add(TimeDistributed(Dropout(0.25)))
model.add(LSTM(5))
model.add(Dense(50))
model.add(Dense(nb_classes))
model.add(Activation("softmax"))

Reshaping could be a correct solution, but only if you want to apply convolution + pooling to your entire sequence at once instead of to every "video frame" separately.

It seems to learn nicely. I'm at 40% accuracy and the first epoch isn't even over.

Thank you both very much.

If you'd like to post this as an answer on Stack Overflow too, I'll accept it.
Otherwise, I'll write the answer myself.

52% accuracy

Hi,
I wrote this code, but I got the following error in the training step:
Exception: Error when checking model input: expected timedistributed_input_32 to have 5 dimensions, but got array with shape (60000, 56, 14, 1)

Can you help me resolve it? And how did you train your model?

thank you

from __future__ import print_function
import numpy as np
np.random.seed(1337)  # for reproducibility

from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import (Dense, Dropout, Activation, Flatten, Input, Convolution2D,
                          MaxPooling2D, UpSampling2D, LSTM, TimeDistributed)
from keras.utils import np_utils
from keras import backend as K

batch_size = 128
nb_classes = 10
nb_epoch = 10

# input image dimensions
img_rows, img_cols = 56, 14

# number of convolutional filters to use
nb_filters = 32

# size of pooling area for max pooling
pool_size = (2, 2)

# convolution kernel size
kernel_size = (3, 3)

(X_train, y_train), (X_test, y_test) = mnist.load_data()

if K.image_dim_ordering() == 'th':
    X_train = X_train.reshape(X_train.shape[0], img_rows, img_cols)
    X_test = X_test.reshape(X_test.shape[0], img_rows, img_cols)
    input_shape = (4, 1, img_rows, img_cols)
else:
    X_train = X_train.reshape(X_train.shape[0], img_rows, img_cols, 1)
    X_test = X_test.reshape(X_test.shape[0], img_rows, img_cols, 1)
    input_shape = (4, img_rows, img_cols, 1)

X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
Y_trainB = np_utils.to_categorical(y_train, nb_classes)
Y_testB = np_utils.to_categorical(y_test, nb_classes)

model = Sequential()
model.add(TimeDistributed(Convolution2D(32, 3, 3, border_mode='valid'),
                          input_shape=input_shape))
model.add(Activation('relu'))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(Dropout(0.25))
print(model.output_shape)
model.add(TimeDistributed(Convolution2D(32, 3, 3, border_mode='valid')))
model.add(Activation('relu'))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(Dropout(0.25))
model.add(TimeDistributed(Flatten()))
print(model.output_shape)
model.add(LSTM(output_dim=64, return_sequences=True))
print(model.output_shape)
model.add(Dropout(0.5))
model.add(TimeDistributed(Dense(2)))
print(model.output_shape)
model.add(Activation('sigmoid'))
print(model.output_shape)

model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

print('Train...')
model.fit(X_train, Y_trainB, batch_size=batch_size, nb_epoch=nb_epoch,
          validation_data=(X_test, Y_testB))

Did you already find a solution to the problem with 'expected timedistributed_input_32 to have 5 dimensions, but got array with shape (60000, 56, 14, 1)'?
I am facing the same issue. Thanks
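
In case it helps: that error means the training array itself is missing the timestep axis. Here is a minimal sketch of one way to build it, assuming you split each 28x28 MNIST image into four 14x14 quadrants and use the TensorFlow (channels_last) ordering; adapt the slicing and input_shape to however you actually build your sequences:

import numpy as np
from keras.datasets import mnist

(X_train, y_train), (X_test, y_test) = mnist.load_data()   # (60000, 28, 28)

def to_sequences(X):
    # Four 14x14 quadrants per image, stacked along a new timestep axis.
    quadrants = [X[:, :14, :14], X[:, :14, 14:], X[:, 14:, :14], X[:, 14:, 14:]]
    X_seq = np.stack(quadrants, axis=1)                     # (n, 4, 14, 14)
    return X_seq[..., np.newaxis].astype('float32') / 255   # (n, 4, 14, 14, 1)

X_train, X_test = to_sequences(X_train), to_sequences(X_test)
print(X_train.shape)   # (60000, 4, 14, 14, 1) -- matches input_shape=(4, 14, 14, 1)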

@lhk hello, I followed #4172 and #421 and tried every possible way to integrate a CNN (VGG16) with an LSTM in Keras, but I keep getting errors.

_Below is my code:_

from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
from keras.applications.vgg16 import VGG16
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input
from keras.models import Sequential
from keras.layers import Flatten, LSTM, Dense, TimeDistributed, InputLayer
from keras.layers.core import Dropout
from keras.layers.convolutional import Convolution2D, MaxPooling2D
import numpy as np

#%%     Image Generator for datasets
datagen = ImageDataGenerator(
        rotation_range = 0,
        width_shift_range = 0,
        height_shift_range = 0,
        shear_range = 0,
        zoom_range = 0,
        rescale = None,
        horizontal_flip = False,
        fill_mode = 'nearest')

# raw strings avoid backslash-escape issues in Windows paths
train_generator = datagen.flow_from_directory(
        r'D:\SpyderDeepLearning\PersonReidentification\Dataset_Light\Train',
        target_size = (224,224),
        batch_size = 1,
        class_mode = 'categorical')

test_generator = datagen.flow_from_directory(
        r'D:\SpyderDeepLearning\PersonReidentification\Dataset_Light\Test',
        target_size = (224,224),
        batch_size = 1,
        class_mode = 'categorical')


#%%     Convolutional Model (Basically Its VGG) + LSTM model (TimeDistributed)

model = Sequential()

model.add(InputLayer(input_shape=(5, 224, 224, 3)))
model.add(TimeDistributed(Convolution2D(64, (3, 3))))
model.add(TimeDistributed(MaxPooling2D((2,2), strides=(2,2))))
model.add(LSTM(10))
model.add(Dense(3))

model.compile(loss='mean_squared_error', optimizer='rmsprop', metrics=['accuracy'])
model.fit_generator(train_generator, epochs=1, steps_per_epoch=len(train_generator.filenames))

I am getting this error (here 5 is the time step, 224x224 the image dimensions, and 3 the channels):
Error when checking input: expected input_2 to have 5 dimensions, but got array with shape (5, 224, 224, 3)

But when I also supply the samples dimension,
model.add(InputLayer(input_shape=(1, 5, 224, 224, 3)))
I get this error:
number of input channels does not match corresponding dimension of filter, 224 != 3
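
Two things seem to be going on here. First, input_shape must not include the samples axis, so (1, 5, 224, 224, 3) is wrong and (5, 224, 224, 3) is right. Second, flow_from_directory yields single images of shape (1, 224, 224, 3), not sequences, so the data never matches the 5D input. Below is a minimal sketch of a wrapper (a hypothetical helper, not a Keras API) that buffers 5 consecutive images into one sequence; note you will likely also need a TimeDistributed(Flatten()) before the LSTM:

import numpy as np

def sequence_generator(image_generator, timesteps=5):
    # Buffer `timesteps` consecutive images and add a time axis, yielding
    # batches of shape (1, timesteps, 224, 224, 3) to match
    # input_shape=(5, 224, 224, 3).
    while True:
        frames = []
        for _ in range(timesteps):
            x, y = next(image_generator)   # x has shape (1, 224, 224, 3)
            frames.append(x[0])
        yield np.stack(frames)[np.newaxis, ...], y   # label of the last frame

model.fit_generator(sequence_generator(train_generator),
                    epochs=1,
                    steps_per_epoch=len(train_generator.filenames) // 5)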

Hi all,

Is there any way to feed the output of a Conv2D layer as input to an LSTM layer?
Please let me know.

Thanks!!

Yes, you can: remove the last layer of the conv net and feed the output of the conv layer into the LSTM as its input.
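
A hedged sketch of that idea with VGG16 (Keras 2 API; the 5-step sequence length and output sizes are illustrative):

from keras.applications.vgg16 import VGG16
from keras.models import Sequential
from keras.layers import TimeDistributed, Flatten, LSTM, Dense

# Convolutional base only; include_top=False drops the classifier head.
conv_base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
conv_base.trainable = False   # optionally freeze the pretrained weights

model = Sequential()
model.add(TimeDistributed(conv_base, input_shape=(5, 224, 224, 3)))
model.add(TimeDistributed(Flatten()))   # (None, 5, nb_features) for the LSTM
model.add(LSTM(10))
model.add(Dense(3, activation='softmax'))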


Hi All,

I'm getting this error:

env (4, 10, 10)
Traceback (most recent call last):
  File "train.py", line 140, in
    main()
  File "train.py", line 123, in main
    model = create_dqn_model(env, num_last_frames=4)
  File "train.py", line 107, in create_dqn_model
    model.summary()
  File "/anaconda3/lib/python3.6/site-packages/keras/engine/network.py", line 1247, in summary
    'This model has never been called, thus its weights '
ValueError: This model has never been called, thus its weights have not yet been created, so no summary can be displayed. Build the model first (e.g. by calling it on some test data).

Here is my code...

from keras.models import Sequential
from keras.layers import (TimeDistributed, Conv2D, Activation, Flatten,
                          Dropout, LSTM, Dense)
from keras.optimizers import RMSprop

model = Sequential()

# Printing the model params
print('env', (num_last_frames,) + env.observation_shape)
# Output => env (4,) ObservationShape (10, 10)

# Convolutions
model.add(TimeDistributed(Conv2D(
    16,
    kernel_size=(3, 3),
    strides=(1, 1),
    data_format='channels_first',
    input_shape=(num_last_frames,) + env.observation_shape
)))
model.add(TimeDistributed(Activation('relu')))
model.add(TimeDistributed(Conv2D(
    32,
    kernel_size=(3, 3),
    strides=(1, 1),
    data_format='channels_first'
)))
model.add(TimeDistributed(Activation('relu')))

# Dense layers.
model.add(TimeDistributed(Flatten()))

# TimeDistributed changes and the one-liner below added by me.
model.add(Dropout(0.25))

# Adding LSTM layers to the model (my change, 24/12/2018).
# regressor.add(LSTM(units=50, return_sequences=True, input_shape=(X_train.shape[1], 1)))
# regressor.add(Dropout(0.2))
model.add(LSTM(units=200, return_sequences=True))
model.add(Dropout(0.25))
model.add(LSTM(units=200, return_sequences=True))
model.add(Dropout(0.25))
model.add(LSTM(units=200))
# My code change END

model.add(Dense(256))
model.add(Activation('relu'))
model.add(Dense(env.num_actions))

model.summary()
model.compile(RMSprop(), 'MSE')

Please let me know how I can fix this.

Thanks in advance!!
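
One likely cause: in a Sequential model, input_shape has to go on the outermost layer you add, i.e. the TimeDistributed wrapper, not the wrapped Conv2D; otherwise the model is never built and summary() fails. A minimal sketch of that fix (the singleton channel axis is an assumption, since with channels_first Conv2D expects per-timestep inputs of shape (channels, rows, cols); the training data then needs a matching axis):

model.add(TimeDistributed(
    Conv2D(16, kernel_size=(3, 3), strides=(1, 1), data_format='channels_first'),
    input_shape=(num_last_frames, 1) + env.observation_shape))   # (4, 1, 10, 10)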

@lhk Hello, can you please explain how I should interpret the input size in your code above?

model.add(TimeDistributed(
    Convolution2D(nb_filters, kernel_size[0], kernel_size[1], border_mode="valid"),
    input_shape=[4, 1, 56, 14]))

Please explain. My input images are of size 64x128; how can I pass images of this size into the network?
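
The list [4, 1, 56, 14] reads as (timesteps, channels, rows, cols) in Theano ordering: 4 slices per sequence, 1 grayscale channel, and a 56x14 slice. For grayscale 64x128 images, a sketch (the sequence length of 4 is an assumption; use however many frames you have per sample):

model.add(TimeDistributed(
    Convolution2D(nb_filters, kernel_size[0], kernel_size[1], border_mode="valid"),
    input_shape=[4, 1, 64, 128]))   # (timesteps, channels, rows, cols)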

@lhk hello, I followed #4172 and #421 and tried every possible way to integrate a CNN (VGG16) with an LSTM in Keras, but I keep getting errors. […]

@usmanatif I am running into the same problem, did you solve it?

Amazing, this seems to do the trick. I wrapped everything up to the LSTM in TimeDistributed layers and provided the number of timesteps as an additional input dimension. […]

How did you do that?
Can you give me a picture of it?

@lhk hello, I followed #4172 and #421 and tried every possible way to integrate a CNN (VGG16) with an LSTM in Keras, but I keep getting errors. […]

@usmanatif I am running into the same problem, did you solve it?

DID YOU GET THE SOLUTION??? I am also getting the same problem!!!!

