I have made a dataset with these dimensions:
X_train: (2000, 100, 32, 32, 3)
y_train: (2000, 1)
Here, 2000 is the number of instances (sequences of data), 100 is the number of samples (frames) in each sequence, 32 is the number of image rows and columns, and 3 is the number of channels (RGB).
I have written the code below, which applies an LSTM after a CNN; however, I get this error:
ValueError: Input 0 is incompatible with layer lstm_layer: expected ndim=3, found ndim=2
This is my code:
import keras
from keras.layers import Input ,Dense, Dropout, Activation, LSTM
from keras.layers import Convolution2D, MaxPooling2D, Flatten, Reshape
from keras.models import Sequential
from keras.layers.wrappers import TimeDistributed
from keras.layers.pooling import GlobalAveragePooling1D
from keras.optimizers import SGD
from keras.utils import np_utils
from keras.models import Model
import numpy as np
timesteps = 100
number_of_samples = 2500
nb_samples = number_of_samples
frame_row = 32
frame_col = 32
channels = 3
nb_epoch = 1
batch_size = timesteps
data = np.random.random((2500, timesteps, frame_row, frame_col, channels))
label = np.random.random((2500, timesteps, 1))
X_train = data[0:2000, :]
y_train = label[0:2000]
X_test = data[2000:, :]
y_test = label[2000:, :]
#%%
model = Sequential()
model.add(Convolution2D(32, 3, 3, border_mode='same',
                        input_shape=X_train.shape[2:]))
model.add(Activation('relu'))
model.add(Convolution2D(32, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Convolution2D(64, 3, 3, border_mode='same'))
model.add(Activation('relu'))
model.add(Convolution2D(64, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(35, input_shape=(timesteps, 512), name="first_dense"))
#model.add(Dense(1, name="test_dense"))
model.add(LSTM(20, return_sequences=True, name="lstm_layer"))
#%%
model.add(TimeDistributed(Dense(1), name="time_distr_dense_one"))
model.add(GlobalAveragePooling1D(name="global_avg"))
#%%
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
#%%
model.fit(X_train, y_train,
          batch_size=batch_size,
          nb_epoch=nb_epoch,
          validation_data=(X_test, y_test))
Hello,
as I understand your code, you want to feed your model sequences of images. Your CNN-related layers are currently built for single images, not for sequences. To fix the problem, you have to wrap your CNN layers in the TimeDistributed() wrapper.
(I removed some CNN layers so that the model compiles on my machine.)
import keras
from keras.layers import Input ,Dense, Dropout, Activation, LSTM
from keras.layers import Convolution2D, MaxPooling2D, Flatten, Reshape
from keras.models import Sequential
from keras.layers.wrappers import TimeDistributed
from keras.layers.pooling import GlobalAveragePooling1D
from keras.optimizers import SGD
from keras.utils import np_utils
from keras.models import Model
import numpy as np
timesteps = 100
number_of_samples = 2500
nb_samples = number_of_samples
frame_row = 32
frame_col = 32
channels = 3
nb_epoch = 1
batch_size = timesteps
data = np.random.random((2500, timesteps, frame_row, frame_col, channels))
label = np.random.random((2500, 1))
X_train = data[0:2000, :]
y_train = label[0:2000]
X_test = data[2000:, :]
y_test = label[2000:, :]
#%%
model = Sequential()
model.add(TimeDistributed(Convolution2D(32, 3, 3, border_mode='same'),
                          input_shape=X_train.shape[1:]))
model.add(TimeDistributed(Activation('relu')))
model.add(TimeDistributed(Convolution2D(32, 3, 3)))
model.add(TimeDistributed(Activation('relu')))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Dropout(0.25)))
model.add(TimeDistributed(Flatten()))
model.add(TimeDistributed(Dense(512)))
model.add(TimeDistributed(Dense(35, name="first_dense" )))
model.add(LSTM(20, return_sequences=True, name="lstm_layer"))
#%%
model.add(TimeDistributed(Dense(1), name="time_distr_dense_one"))
model.add(GlobalAveragePooling1D(name="global_avg"))
#%%
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
Your first layer now has an additional dimension for the timesteps:
model.layers[0].output.shape
#TensorShape([Dimension(None), Dimension(100), Dimension(32), Dimension(32), Dimension(32)])
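For reference, you can print every layer's output shape to see how the timesteps dimension is carried through the time-distributed stack (a quick sanity check against the model built above):
for layer in model.layers:
    print(layer.name, layer.output_shape)
# the first line shows the new timesteps dimension: (None, 100, 32, 32, 32)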
I hope this helps.
UPDATE: updated the label data.
Hello,
thank you very much for the example. I have a problem with model.fit:
ValueError: Error when checking target: expected global_avg to have 2 dimensions, but got array with shape (2000, 100, 1)
model.fit(X_train, y_train,
          batch_size=batch_size,
          nb_epoch=nb_epoch,
          validation_data=(X_test, y_test))
Can you help me?
I have the same problem as @llandolfi
Any solution?
GlobalAveragePooling1D outputs a tensor of shape (batch_size, channels), so the training and test labels have to look like this:
data = np.random.random((2500, timesteps, frame_row, frame_col, channels))
label = np.random.random((2500, 1))
X_train = data[0:2000, :]
y_train = label[0:2000, ...]
X_test = data[2000:, :]
y_test = label[2000:, ...]
Note that the label data has only 2 dimensions.
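With the 2-D labels the target now matches the model output, and training runs. A minimal check, assuming the compiled model from above:
print(model.output_shape)  # (None, 1), since GlobalAveragePooling1D reduces (None, 100, 1)
model.fit(X_train, y_train,
          batch_size=batch_size,
          nb_epoch=nb_epoch,
          validation_data=(X_test, y_test))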
@kopytjuk Is there a way to predict an image of the same size instead of a label for every sample (for a video frame prediction application) in your code above?
@shaifugpt I do not have the code, but you should have a look at convolutional autoencoders in the literature. Your latent-space vector would be the output of the LSTM layer; higher deconvolution layers then build the next frame from it.
As an alternative to recurrent structures, you can take a look at 3D convolutions in order to incorporate video data as input.
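A rough sketch of such a decoder, in the same Keras 1.x style as the code above. The latent size of 256, the 8x8 bottleneck, and the two upsampling steps are illustrative assumptions, not a tested architecture:
from keras.models import Sequential
from keras.layers import Dense, Reshape, Activation
from keras.layers import Convolution2D, UpSampling2D
# assume the encoder ends in e.g. LSTM(256, return_sequences=False),
# so each input sequence is summarized by a 256-dim latent vector
decoder = Sequential()
decoder.add(Dense(8 * 8 * 64, input_shape=(256,)))       # expand the latent vector
decoder.add(Reshape((8, 8, 64)))                         # into a small feature map
decoder.add(UpSampling2D(size=(2, 2)))                   # 8x8 -> 16x16
decoder.add(Convolution2D(32, 3, 3, border_mode='same'))
decoder.add(Activation('relu'))
decoder.add(UpSampling2D(size=(2, 2)))                   # 16x16 -> 32x32
decoder.add(Convolution2D(3, 3, 3, border_mode='same'))  # back to 3 (RGB) channels
decoder.add(Activation('sigmoid'))                       # pixel values in [0, 1]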
@kopytjuk
How do I modify your sample code to use fit_generator?
(In particular, how do I handle the timesteps dimension?)
I want to use a CNN-LSTM for a non-image sequence (an eye-tracking dataset). My input_shape=X_train.shape[1:] is (100, 1024, 3) instead of (100, 32, 32, 3). How can I do this when input_shape accepts 4 values? I'd really appreciate your answers.
If I input 1 image into a deep CNN that gives me, say, 512 feature maps of 32x32 pixels each, then my output for 1 image is (32, 32, 512). If I now want to apply an LSTM to this (32, 32, 512) tensor so as to learn from the pixels of these feature maps, how should I do it? And how should I do the same for the feature maps from other images, e.g. 2000 sample images that each become a 32x32x512 map after passing through the CNN?