I would like to stack 2D convolutions and LSTM layers, which is exactly the problem described in #129.
The solution proposed in #129 is a custom reshape layer.
By now, there is a built-in Reshape layer in Keras.
Searching for the problem on Stack Overflow brings up a similar question, and the accepted answer suggests using the built-in layer.
As a toy example, I would like to classify MNIST with a combination of Conv layers and an LSTM.
I've sliced the images into four parts and arranged those parts into sequences, then stacked the sequences.
My training data is a numpy array with the shape [60000, 4, 1, 56, 14], i.e. (samples, timesteps, channels, rows, columns).
Please note: each image slice has the size 14x14, since I've cut the 28x28 image into four parts. The 56 in the shape comes from creating 4 different sequences and stacking them along the row axis.
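The exact slicing code isn't shown here, but for reference, one way to produce arrays of this shape could look like the following sketch (treating the four sequences as four orderings of the quadrants is an assumption, not confirmed by the post):

```python
import numpy as np
from keras.datasets import mnist

(X_train, y_train), _ = mnist.load_data()  # (60000, 28, 28)

def to_sequences(images):
    # Cut each 28x28 image into four 14x14 quadrants.
    q = [images[:, :14, :14], images[:, :14, 14:],
         images[:, 14:, :14], images[:, 14:, 14:]]
    # Build 4 sequences (here: 4 rotations of the quadrant order) and stack
    # them along the row axis, so each timestep is a 56x14 frame.
    frames = []
    for t in range(4):
        rows = [q[(t + s) % 4] for s in range(4)]  # one 14x14 slice per sequence
        frames.append(np.concatenate(rows, axis=1))  # (N, 56, 14)
    seq = np.stack(frames, axis=1)  # (N, 4, 56, 14)
    return seq[:, :, np.newaxis, :, :]  # add channel axis -> (N, 4, 1, 56, 14)

X_seq = to_sequences(X_train.astype('float32') / 255)
print(X_seq.shape)  # (60000, 4, 1, 56, 14)
```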
Here is my code so far:
```python
from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, Activation, Dropout, Reshape, LSTM, Dense

nb_filters = 32
kernel_size = (3, 3)
pool_size = (2, 2)
nb_classes = 10
batch_size = 64

model = Sequential()
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1],
                        border_mode="valid", input_shape=(1, 56, 14)))
model.add(Activation("relu"))
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1]))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=pool_size))
model.add(Reshape((56 * 14,)))  # this is the layer that fails
model.add(Dropout(0.25))
model.add(LSTM(5))
model.add(Dense(50))
model.add(Dense(nb_classes))
model.add(Activation("softmax"))
```
When run, the Reshape layer raises a ValueError:
`ValueError: total size of new array must be unchanged`
I also tried passing the number of timesteps to the Reshape layer: `model.add(Reshape((4, 56*14)))`
But that doesn't solve the problem either.
What is the correct dimension to give to the Reshape layer?
Is a Reshape layer the correct solution at all?
I've posted the same question on Stack Overflow.
You can view the output shapes of your model's layers with `model.summary()`. The Reshape layer must preserve the total number of elements in the tensor. The input to an LSTM layer must consist of samples of shape (nb_timesteps, nb_features).
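For instance, a minimal sketch of that LSTM input constraint:

```python
from keras.models import Sequential
from keras.layers import LSTM

model = Sequential()
# samples of shape (nb_timesteps, nb_features); the batch axis is implicit
model.add(LSTM(5, input_shape=(4, 784)))
print(model.output_shape)  # (None, 5)
```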
Oh, I completely forgot that the convolutions and the pooling would change the dimensions.
Thanks for the tip with model.summary().
I printed the model.summary() just before adding the reshape layer and just after.
The pooling layer before the reshape provides this output shape:
(None, 32, 26, 5)
So I changed the reshape layer to `model.add(Reshape((32*26*5,)))`
model.summary() shows the output of the reshape layer as:
(None, 4160)
There is no timestep dimension; I expected the model to deal with the timesteps under the hood.
But the LSTM complains:
Exception: Input 0 is incompatible with layer lstm_5: expected ndim=3, found ndim=2
At least the error in the reshape layer is gone. Thank you very much.
How do I account for the timestep dimension? Do I pass it through the entire network, adding (4, ...) to the input dimensions of the convolution?
If I understand you correctly, you want to feed the output of the convolutional layer, which carries no time-sequence information, to an LSTM layer. The way to do this is to divide it into timesteps, either in batches or one element at a time. Since your feature map has a total dimensionality of `32*26*5 = 4160`, you could do this, for example, by treating every element of it as a timestep in a sequence. To do that, reshape it with `Reshape((4160, 1))`. To sequence N elements at a time, use `Reshape((4160/N, N))`, where N is an integer divisor of 4160.
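For reference, a minimal sketch of that idea, assuming channels-first ordering and the (None, 32, 26, 5) pooling output discussed above (the split into 260 steps of 16 features is an arbitrary choice):

```python
from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, Reshape, LSTM, Dense

model = Sequential()
model.add(Convolution2D(32, 3, 3, border_mode="valid", input_shape=(1, 56, 14)))
model.add(Convolution2D(32, 3, 3))
model.add(MaxPooling2D(pool_size=(2, 2)))  # -> (None, 32, 26, 5), i.e. 4160 elements
# Treat the 4160 feature-map elements as a sequence: 260 steps of 16 features each
model.add(Reshape((4160 // 16, 16)))
model.add(LSTM(5))
model.add(Dense(10, activation="softmax"))
```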
Either reshape the feature map into a sequence as described above, or wrap the convolutional layers in TimeDistributed so they are applied to each timestep separately.
@kgrm I've already got a sequence. The current dimension after the reshape layer, 4160, is the result of applying the various convolutions etc. to one of the images in the sequence.
I experimented with LSTMs and Dense layers and found that you can order them more or less arbitrarily. I therefore assumed Keras would look for any LSTM in the network and keep in mind that the input needs an additional dimension for the timesteps, and that this would be transparent to the rest of the network.
@carlthome TimeDistributed looks interesting. I'll take a look at it.
So, if you've already got a sequence of images and you want to apply convolutions to each image and then feed the sequence to an LSTM, Reshape is not the correct solution?
Thank you both very much for the quick and friendly help.
Amazing, this seems to do the trick.
I wrapped everything up to the LSTM in TimeDistributed layers and provided the number of timesteps as an additional input dimension.
This runs without problems.
Am I doing this the right way?
```python
from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, Activation, Dropout, Flatten, LSTM, Dense, TimeDistributed

nb_filters = 32
kernel_size = (3, 3)
pool_size = (2, 2)
nb_classes = 10
batch_size = 64

model = Sequential()
# input_shape is (timesteps, channels, rows, cols) and goes on the TimeDistributed wrapper
model.add(TimeDistributed(
    Convolution2D(nb_filters, kernel_size[0], kernel_size[1], border_mode="valid"),
    input_shape=(4, 1, 56, 14)))
model.add(TimeDistributed(Activation("relu")))
model.add(TimeDistributed(Convolution2D(nb_filters, kernel_size[0], kernel_size[1])))
model.add(TimeDistributed(Activation("relu")))
model.add(TimeDistributed(MaxPooling2D(pool_size=pool_size)))
model.add(TimeDistributed(Flatten()))
model.add(TimeDistributed(Dropout(0.25)))
model.add(LSTM(5))
model.add(Dense(50))
model.add(Dense(nb_classes))
model.add(Activation("softmax"))
```
Reshaping could be a correct solution, but only if you want to apply convolution + pooling across your entire sequence at once, instead of to every "video frame" separately.
It seems to learn nicely. I'm at 40% accuracy and the first epoch isn't even over.
Thank you both very much.
If you'd like to post this as an answer on Stack Overflow too, I'll accept it.
Otherwise, I'll write the answer myself.
52% accuracy
Hi,
I wrote the code below, but I got the following error in the training step:
`Exception: Error when checking model input: expected timedistributed_input_32 to have 5 dimensions, but got array with shape (60000, 56, 14, 1)`
Can you help me resolve it? And how did you train your architecture?
```python
from __future__ import print_function
import numpy as np
np.random.seed(1337)  # for reproducibility

from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten, Input, Convolution2D, MaxPooling2D, UpSampling2D, LSTM
from keras.layers import TimeDistributed
from keras.utils import np_utils
from keras import backend as K

batch_size = 128
nb_classes = 10
nb_epoch = 10
img_rows, img_cols = 56, 14
nb_filters = 32
pool_size = (2, 2)
kernel_size = (3, 3)

(X_train, y_train), (X_test, y_test) = mnist.load_data()

if K.image_dim_ordering() == 'th':
    X_train = X_train.reshape(X_train.shape[0], img_rows, img_cols)
    X_test = X_test.reshape(X_test.shape[0], img_rows, img_cols)
    input_shape = (4, 1, img_rows, img_cols)
else:
    # Note: this produces 4D data, while the model below declares 5D input
    X_train = X_train.reshape(X_train.shape[0], img_rows, img_cols, 1)
    X_test = X_test.reshape(X_test.shape[0], img_rows, img_cols, 1)
    input_shape = (4, img_rows, img_cols, 1)

X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

Y_trainB = np_utils.to_categorical(y_train, nb_classes)
Y_testB = np_utils.to_categorical(y_test, nb_classes)

model = Sequential()
model.add(TimeDistributed(Convolution2D(32, 3, 3, border_mode='valid'),
                          input_shape=input_shape))
model.add(Activation('relu'))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(Dropout(0.25))
print(model.output_shape)
model.add(TimeDistributed(Convolution2D(32, 3, 3, border_mode='valid')))
model.add(Activation('relu'))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(Dropout(0.25))
model.add(TimeDistributed(Flatten()))
print(model.output_shape)
model.add(LSTM(output_dim=64, return_sequences=True))
print(model.output_shape)
model.add(Dropout(0.5))
model.add(TimeDistributed(Dense(2)))
print(model.output_shape)
model.add(Activation('sigmoid'))
print(model.output_shape)

model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.fit(X_train, Y_trainB, batch_size=batch_size, nb_epoch=nb_epoch,
          validation_data=(X_test, Y_testB))
```
Did you already find the solution to the problem with 'expected timedistributed_input_32 to have 5 dimensions, but got array with shape (60000, 56, 14, 1)'?
I am facing the same issue. Thanks.
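For reference: the model's first layer expects 5D input of shape (samples, timesteps, rows, cols, channels), but the X_train above is only 4D. A minimal sketch of one way to add the missing timestep axis (assuming the four 14x14 quadrants serve as the timesteps, as in the original post; input_shape would then have to be (4, 14, 14, 1) to match):

```python
import numpy as np
from keras.datasets import mnist

(X_train, y_train), (X_test, y_test) = mnist.load_data()  # (N, 28, 28)

def make_timesteps(images):
    # Split each 28x28 image into four 14x14 quadrants -> (N, 4, 14, 14)
    q = np.stack([images[:, :14, :14], images[:, :14, 14:],
                  images[:, 14:, :14], images[:, 14:, 14:]], axis=1)
    # Add a trailing channel axis -> (N, 4, 14, 14, 1)
    return q[..., np.newaxis].astype('float32') / 255

X_train = make_timesteps(X_train)  # now 5D, matching input_shape=(4, 14, 14, 1)
X_test = make_timesteps(X_test)
```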
@lhk Hello, I followed #4172 and #421 and tried every possible way to integrate a CNN (VGG16) with an LSTM in Keras, but I keep getting errors.
```python
from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
from keras.applications.vgg16 import VGG16
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input
from keras.models import Sequential
from keras.layers import Flatten, LSTM, Dense, TimeDistributed, InputLayer
from keras.layers.core import Dropout
from keras.layers.convolutional import Convolution2D, MaxPooling2D
import numpy as np

#%% Image generator for the datasets
datagen = ImageDataGenerator(
    rotation_range=0,
    width_shift_range=0,
    height_shift_range=0,
    shear_range=0,
    zoom_range=0,
    rescale=None,
    horizontal_flip=False,
    fill_mode='nearest')

train_generator = datagen.flow_from_directory(
    r'D:\SpyderDeepLearning\PersonReidentification\Dataset_Light\Train',  # raw string for the Windows path
    target_size=(224, 224),
    batch_size=1,
    class_mode='categorical')

test_generator = datagen.flow_from_directory(
    r'D:\SpyderDeepLearning\PersonReidentification\Dataset_Light\Test',
    target_size=(224, 224),
    batch_size=1,
    class_mode='categorical')

#%% Convolutional model (basically VGG) + LSTM model (TimeDistributed)
model = Sequential()
model.add(InputLayer(input_shape=(5, 224, 224, 3)))
model.add(TimeDistributed(Convolution2D(64, (3, 3))))
model.add(TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2))))
model.add(LSTM(10))
model.add(Dense(3))
model.compile(loss='mean_squared_error', optimizer='rmsprop', metrics=['accuracy'])
model.fit_generator(train_generator, epochs=1, steps_per_epoch=len(train_generator.filenames))
```
I am getting this error (here 5 is the number of timesteps, 224x224 is the image size and 3 is the number of channels):
`Error when checking input: expected input_2 to have 5 dimensions, but got array with shape (5, 224, 224, 3)`
But when I also include the samples dimension:
`model.add(InputLayer(input_shape=(1, 5, 224, 224, 3)))`
I get this error:
`number of input channels does not match corresponding dimension of filter, 224 != 3`
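For reference, two things stand out in the snippet above, judging from the working example earlier in this thread: `flow_from_directory` yields batches of single images with shape (batch, 224, 224, 3), not 5D sequences, so a custom generator that groups frames into sequences would be needed; and the LSTM receives the 4D per-timestep output of the pooling layer, so a `TimeDistributed(Flatten())` is needed in between. A minimal sketch of the model part (the sequence generator yielding batches of shape (batch, 5, 224, 224, 3) is assumed, not shown):

```python
from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, Flatten, LSTM, Dense, TimeDistributed, InputLayer

model = Sequential()
model.add(InputLayer(input_shape=(5, 224, 224, 3)))  # (timesteps, rows, cols, channels)
model.add(TimeDistributed(Convolution2D(64, (3, 3))))
model.add(TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2))))
model.add(TimeDistributed(Flatten()))  # LSTM needs (timesteps, features)
model.add(LSTM(10))
model.add(Dense(3, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
# model.fit_generator(sequence_generator, ...)  # must yield 5D batches
```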
Hi all,
Is there any way to feed the output of a Conv2D layer as input to an LSTM layer?
Please let me know.
Thanks!
Yes, you can: remove the last layer of the conv stack and feed the output of the conv layer to the LSTM as its input.
Hi all,
I'm getting this error:

```
env (4, 10, 10)
Traceback (most recent call last):
  File "train.py", line 140, in <module>
    main()
  File "train.py", line 123, in main
    model = create_dqn_model(env, num_last_frames=4)
  File "train.py", line 107, in create_dqn_model
    model.summary()
  File "/anaconda3/lib/python3.6/site-packages/keras/engine/network.py", line 1247, in summary
    'This model has never been called, thus its weights '
ValueError: This model has never been called, thus its weights have not yet been created, so no summary can be displayed. Build the model first (e.g. by calling it on some test data).
```
Here is my code:

```python
model = Sequential()
print('env', (num_last_frames,) + env.observation_shape)
model.add(TimeDistributed(Conv2D(
    16,
    kernel_size=(3, 3),
    strides=(1, 1),
    data_format='channels_first',
    input_shape=(num_last_frames,) + env.observation_shape
)))
model.add(TimeDistributed(Activation('relu')))
model.add(TimeDistributed(Conv2D(
    32,
    kernel_size=(3, 3),
    strides=(1, 1),
    data_format='channels_first'
)))
model.add(TimeDistributed(Activation('relu')))
model.add(TimeDistributed(Flatten()))
model.add(Dropout(0.25))
model.add(LSTM(units=200, return_sequences=True))
model.add(Dropout(0.25))
model.add(LSTM(units=200, return_sequences=True))
model.add(Dropout(0.25))
model.add(LSTM(units=200))
model.add(Dense(256))
model.add(Activation('relu'))
model.add(Dense(env.num_actions))
model.summary()
model.compile(RMSprop(), 'MSE')
```
Please let me know how I can fix this.
Thanks in advance!
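For reference, a likely cause, judging from the working example earlier in this thread: the input_shape is given to the inner Conv2D, where the TimeDistributed wrapper does not pick it up, so the model is never built and summary() fails. Passing the shape to the wrapper should let Keras build the model. A minimal sketch using the poster's num_last_frames and env (each timestep would also need a channel axis, which is an assumption here):

```python
from keras.models import Sequential
from keras.layers import Conv2D, TimeDistributed, Activation

model = Sequential()
# The shape goes on the wrapper; each timestep also needs a channel axis,
# so a (4, 10, 10) observation stack becomes (4, 1, 10, 10) here.
model.add(TimeDistributed(
    Conv2D(16, kernel_size=(3, 3), strides=(1, 1), data_format='channels_first'),
    input_shape=(num_last_frames, 1) + env.observation_shape))
model.add(TimeDistributed(Activation('relu')))
# ... rest of the model as above ...
model.summary()  # the model can now be built
```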
@lhk Hello, can you please explain how I should interpret the input shape in your code above?

```python
model.add(TimeDistributed(
    Convolution2D(nb_filters, kernel_size[0], kernel_size[1], border_mode="valid"),
    input_shape=[4, 1, 56, 14]))
```

My input images are of size 64x128. How can I pass images of this size into the network?
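For reference, based on the explanation earlier in this thread, that input_shape is (timesteps, channels, rows, cols): sequences of 4 single-channel 56x14 frames. For grayscale 64x128 images, the analogous shape would look like the following sketch, where T is a placeholder for your sequence length:

```python
# channels-first layout as in the example above: (timesteps, channels, rows, cols)
# T is a hypothetical placeholder for the number of frames per sequence
model.add(TimeDistributed(
    Convolution2D(nb_filters, kernel_size[0], kernel_size[1], border_mode="valid"),
    input_shape=(T, 1, 64, 128)))
```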
@usmanatif I am running into the same problem, did you solve it?
How did you do that? Can you share a picture of it?
Did you find the solution? I am also facing the same problem!