I have made a dataset with these dimensions:
X_train: (2000, 100, 32, 32, 3)
y_train: (2000, 1)
Here, 2000 is the number of instances (sequences of data), 100 is the number of samples (frames) in each sequence, 32 is the number of image rows and columns, and 3 is the number of channels (RGB).
I have written the code below, which applies an LSTM after a CNN; however, I get this error:
ValueError: Input 0 is incompatible with layer lstm_layer: expected ndim=3, found ndim=2
This is my code:
import keras
from keras.layers import Input ,Dense, Dropout, Activation, LSTM
from keras.layers import Convolution2D, MaxPooling2D, Flatten, Reshape
from keras.models import Sequential
from keras.layers.wrappers import TimeDistributed
from keras.layers.pooling import GlobalAveragePooling1D
from keras.optimizers import SGD
from keras.utils import np_utils
from keras.models import Model
import numpy as np
timesteps = 100
number_of_samples = 2500
nb_samples = number_of_samples
frame_row = 32
frame_col = 32
channels = 3
nb_epoch = 1
batch_size = timesteps
data = np.random.random((2500, timesteps, frame_row, frame_col, channels))
label = np.random.random((2500, timesteps, 1))
X_train = data[0:2000, :]
y_train = label[0:2000]
X_test = data[2000:, :]
y_test = label[2000:, :]
#%%
model = Sequential()
model.add(Convolution2D(32, 3, 3, border_mode='same',
                        input_shape=X_train.shape[2:]))
model.add(Activation('relu'))
model.add(Convolution2D(32, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Convolution2D(64, 3, 3, border_mode='same'))
model.add(Activation('relu'))
model.add(Convolution2D(64, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(35, input_shape=(timesteps, 512), name="first_dense"))
#model.add(Dense(1, name="test_dense"))
model.add(LSTM(20, return_sequences=True, name="lstm_layer"))
#%%
model.add(TimeDistributed(Dense(1), name="time_distr_dense_one"))
model.add(GlobalAveragePooling1D(name="global_avg"))
#%%
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
#%%
model.fit(X_train, y_train,
          batch_size=batch_size,
          nb_epoch=nb_epoch,
          validation_data=(X_test, y_test))
Hello,
as I understand your code, you want to feed your model sequences of images. Your CNN-related layers are currently built for single images, not for sequences. To fix the problem, you have to wrap your CNN layers in the TimeDistributed() wrapper.
(I removed some CNN layers so that the model compiles on my machine.)
import keras
from keras.layers import Input ,Dense, Dropout, Activation, LSTM
from keras.layers import Convolution2D, MaxPooling2D, Flatten, Reshape
from keras.models import Sequential
from keras.layers.wrappers import TimeDistributed
from keras.layers.pooling import GlobalAveragePooling1D
from keras.optimizers import SGD
from keras.utils import np_utils
from keras.models import Model
import numpy as np
timesteps = 100
number_of_samples = 2500
nb_samples = number_of_samples
frame_row = 32
frame_col = 32
channels = 3
nb_epoch = 1
batch_size = timesteps
data = np.random.random((2500, timesteps, frame_row, frame_col, channels))
label = np.random.random((2500, 1))
X_train = data[0:2000, :]
y_train = label[0:2000]
X_test = data[2000:, :]
y_test = label[2000:, :]
#%%
model = Sequential()
model.add(TimeDistributed(Convolution2D(32, 3, 3, border_mode='same'),
                          input_shape=X_train.shape[1:]))
model.add(TimeDistributed(Activation('relu')))
model.add(TimeDistributed(Convolution2D(32, 3, 3)))
model.add(TimeDistributed(Activation('relu')))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Dropout(0.25)))
model.add(TimeDistributed(Flatten()))
model.add(TimeDistributed(Dense(512)))
model.add(TimeDistributed(Dense(35, name="first_dense" )))
model.add(LSTM(20, return_sequences=True, name="lstm_layer"))
#%%
model.add(TimeDistributed(Dense(1), name="time_distr_dense_one"))
model.add(GlobalAveragePooling1D(name="global_avg"))
#%%
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
Your first layer now has an additional dimension for the timesteps:
model.layers[0].output.shape
#TensorShape([Dimension(None), Dimension(100), Dimension(32), Dimension(32), Dimension(32)])
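For reference, you can print every layer's output shape to see how the timesteps dimension is carried through the time-distributed stack (a quick sanity check against the model built above):
for layer in model.layers:
    print(layer.name, layer.output_shape)
# the first line shows the new timesteps dimension: (None, 100, 32, 32, 32)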
I hope this helps.
UPDATE: updated the label data.
Hello,
thank you very much for the example. I have a problem with model.fit:
ValueError: Error when checking target: expected global_avg to have 2 dimensions, but got array with shape (2000, 100, 1)
model.fit(X_train, y_train,
          batch_size=batch_size,
          nb_epoch=nb_epoch,
          validation_data=(X_test, y_test))
Can you help me?
I have the same problem as @llandolfi
Any solution?
GlobalAveragePooling1D outputs a tensor of shape (batch_size, channels), so the training and test labels have to look like this:
data = np.random.random((2500, timesteps, frame_row, frame_col, channels))
label = np.random.random((2500, 1))
X_train = data[0:2000, :]
y_train = label[0:2000, ...]
X_test = data[2000:, :]
y_test = label[2000:, ...]
Note that the label data has only 2 dimensions.
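With the 2-D labels the target now matches the model output, and training runs. A minimal check, assuming the compiled model from above:
print(model.output_shape)  # (None, 1), since GlobalAveragePooling1D reduces (None, 100, 1)
model.fit(X_train, y_train,
          batch_size=batch_size,
          nb_epoch=nb_epoch,
          validation_data=(X_test, y_test))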
@kopytjuk Is there a way to predict an image of the same size instead of a label for every sample (for a video frame prediction application) in your code above?
@shaifugpt I do not have the code, but you should have a look at convolutional autoencoders in the literature. Your latent-space vector would be the output of the LSTM layer; higher deconvolution layers then build the next frame from it.
As an alternative to recurrent structures, you can take a look at 3D convolutions in order to incorporate video data as input.
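A rough sketch of such a decoder, in the same Keras 1.x style as the code above. The latent size of 256, the 8x8 bottleneck, and the two upsampling steps are illustrative assumptions, not a tested architecture:
from keras.models import Sequential
from keras.layers import Dense, Reshape, Activation
from keras.layers import Convolution2D, UpSampling2D
# assume the encoder ends in e.g. LSTM(256, return_sequences=False),
# so each input sequence is summarized by a 256-dim latent vector
decoder = Sequential()
decoder.add(Dense(8 * 8 * 64, input_shape=(256,)))       # expand the latent vector
decoder.add(Reshape((8, 8, 64)))                         # into a small feature map
decoder.add(UpSampling2D(size=(2, 2)))                   # 8x8 -> 16x16
decoder.add(Convolution2D(32, 3, 3, border_mode='same'))
decoder.add(Activation('relu'))
decoder.add(UpSampling2D(size=(2, 2)))                   # 16x16 -> 32x32
decoder.add(Convolution2D(3, 3, 3, border_mode='same'))  # back to 3 (RGB) channels
decoder.add(Activation('sigmoid'))                       # pixel values in [0, 1]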
@kopytjuk
How do I modify your sample code to use fit_generator?
(In particular, how do I handle the timesteps dimension?)
I want to use a CNN-LSTM for a non-image sequence (an eye-tracking dataset). My input_shape=X_train.shape[1:] is (100, 1024, 3) instead of (100, 32, 32, 3). How can I do this when input_shape accepts 4 values? I'd really appreciate your answers.
If I input 1 image into a deep CNN that gives me, say, 512 feature maps of 32x32 pixels each, then my output for 1 image is (32, 32, 512). If I now want to apply an LSTM to this (32, 32, 512) tensor so as to learn from the pixels of these feature maps, how should I do it? And how should I do the same for the feature maps from other images, e.g. 2000 sample images that each become a 32x32x512 map after passing through the CNN?