Keras: Stateful LSTMs - error despite using "batch_input_shape"

Created on 22 Mar 2016 · 15 Comments · Source: keras-team/keras

When I add 'stateful' to an LSTM, I get the following exception: "If a RNN is stateful, a complete input_shape must be provided (including batch size)."
Based on other threads (#1125, #1130), I am using the "batch_input_shape" option, yet I still get the error.
I raised the same question in the forum (https://groups.google.com/forum/#!topic/keras-users/nwB3ilYY4ZQ) but got no response.

You can find my complete code here:
https://github.com/anujgupta82/DeepNets/blob/master/LSTM/IMDB_Embedding_w2v_LSTM_3.ipynb

stale

anujgupta82

👍2

All 15 comments

batch_input_shape must be passed to the first layer of the network.

NasenSpray on 22 Mar 2016

👍2

Same problem here:
model = Sequential()
model.add(GRU(100, activation='relu', stateful=True, return_sequences=True,
              batch_input_shape=(batch_size, X_train.shape[-2], X_train.shape[-1])))
...

gibipara92 on 1 Apr 2016

What is your X_train.shape[0]? Do all your batches have the same number of samples? That is a must when using stateful RNNs.

santi-pdp on 1 Apr 2016

X_train.shape[0] is the number of samples.

I have just tried using a batch_size that is a factor of the number of samples (so all batches have exactly the same number of samples) and it works, thanks!

A note about this in the part of the documentation that covers statefulness might be helpful.
Edit: it's already there, my bad.
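
When the sample count is not a multiple of the batch size, one way to meet the equal-batch requirement is to drop the trailing samples before training. A minimal sketch in plain Python follows; `trim_to_full_batches` is a hypothetical helper, not part of Keras, and it assumes that discarding a few trailing samples is acceptable for the task:

```python
def trim_to_full_batches(samples, batch_size):
    """Return the longest prefix of `samples` whose length is a multiple of batch_size.

    Hypothetical helper (not a Keras function). Works on anything that
    supports len() and slicing, such as lists or NumPy arrays.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    usable = (len(samples) // batch_size) * batch_size
    return samples[:usable]


# Example: 10 samples with batch_size=4 leaves 8 usable samples (2 full batches).
X_train = list(range(10))
X_trimmed = trim_to_full_batches(X_train, batch_size=4)
print(len(X_trimmed))  # 8
```

The same trimming has to be applied to the targets so that inputs and labels stay aligned.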

gibipara92 on 1 Apr 2016

Have a look at this: http://philipperemy.github.io/keras-stateful-lstm/

philipperemy on 31 Jul 2016

👍1

If batch_input_shape must be specified in the first layer of a stateful network, how is this done when using the functional API? The Input() layer will not allow it. I have tried everything I can think of but am still receiving this same exception ("complete input_shape must be provided (including batch size)"), even with batch size 1. I am trying to make an LRCN using TimeDistributed CNN layers, followed by a couple dense layers, followed by LSTM:

inputs = Input(shape=(1,3,227,227))

conv_1 = TimeDistributed(Convolution2D(96, 11, 11,subsample=(4,4),activation='relu',
                       name='conv_1'))(inputs)

conv_2 = TimeDistributed(MaxPooling2D((3, 3), strides=(2,2)))(conv_1)
conv_2 = TimeDistributed(LRN(name="convpool_1"))(conv_2)
conv_2 = TimeDistributed(ZeroPadding2D((2,2)))(conv_2)
conv_2 = TimeDistributed(Convolution2D(128,5,5,activation="relu",name="conv_2"))(conv_2)

...skipping similar conv layers...

dense_1 = TimeDistributed(MaxPooling2D((3, 3), strides=(2,2),name="convpool_5"))(conv_5)
dense_1 = TimeDistributed(Flatten(name="flatten"))(dense_1)
dense_1 = TimeDistributed(Dense(4096, activation='relu',name='dense_1'))(dense_1)
dense_2 = Dropout(0.5)(dense_1)
dense_2 = TimeDistributed(Dense(4096, activation='relu',name='dense_2'))(dense_2)

lstm_1 = Dropout(0.5)(dense_2)
lstm_1 = LSTM(100,
              batch_input_shape=(1,1,4096), #(batch size,timesteps,feature shape)
              return_sequences=False,
              stateful=True)(lstm_1)

dense_3 = Dense(6,name='dense_out')(lstm_1)
prediction = Activation("tanh",name="tanh")(dense_3)

(This is a regression problem; I am trying to predict six values at each time step based on an image sequence.)

The same network built with the other (Sequential) API does not produce this exception, but I am hoping to take advantage of the functional API, so I'd like to figure out what I'm doing wrong.

cmgladding on 1 Dec 2016

👍3

Figured it out from topology.py - the error message is misleading. The "Input" function takes argument "batch_shape", not "batch_input_shape".

cmgladding on 2 Dec 2016

👍7

@cmgladding
I tried "batch_shape", but it was not recognized by Keras, and I don't know why. The issue persists for me no matter which keyword I use.

Ryan-fireball on 23 Dec 2016

I also have this issue with a functional model.

wulabs on 14 Jan 2017

Here is my example for those who get stuck. Indeed, the error message is misleading. I had to change Input(shape=()) to Input(batch_shape=()) in order for it to work.

Version that raises the error:

frame_sequence = Input(shape=(TIME_STEPS, HEIGHT, WIDTH, CHANNELS))
...
net = TimeDistributed(self.vision_model)(frame_sequence)
net = LSTM(HIDDEN_UNITS, stateful=True, return_sequences=False)(net)

Correct version:

frame_sequence = Input(batch_shape=(BATCH_SIZE, TIME_STEPS, HEIGHT, WIDTH, CHANNELS))
...
net = TimeDistributed(self.vision_model)(frame_sequence)
net = LSTM(HIDDEN_UNITS, stateful=True, return_sequences=False)(net)
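
The difference between the two versions comes down to whether the batch dimension is known by the time the stateful layer is built. The sketch below mimics that bookkeeping in plain Python. `full_batch_shape` and `batch_size_is_known` are hypothetical helpers written for illustration, not Keras functions, and the constants are made-up values since the thread does not give the real ones:

```python
def full_batch_shape(shape=None, batch_shape=None):
    """Return the shape including the batch dimension.

    `shape` excludes the batch size, so the batch dimension stays unknown (None).
    `batch_shape` states the batch size explicitly as its first entry.
    """
    if (shape is None) == (batch_shape is None):
        raise ValueError("Provide exactly one of shape or batch_shape")
    if batch_shape is not None:
        return tuple(batch_shape)
    return (None,) + tuple(shape)


def batch_size_is_known(full_shape):
    """A stateful RNN needs a concrete batch size (the first dimension)."""
    return full_shape[0] is not None


# Illustrative values only (assumption, not taken from the thread).
BATCH_SIZE, TIME_STEPS, HEIGHT, WIDTH, CHANNELS = 1, 4, 32, 32, 3

failing = full_batch_shape(shape=(TIME_STEPS, HEIGHT, WIDTH, CHANNELS))
working = full_batch_shape(batch_shape=(BATCH_SIZE, TIME_STEPS, HEIGHT, WIDTH, CHANNELS))

print(failing, batch_size_is_known(failing))  # (None, 4, 32, 32, 3) False
print(working, batch_size_is_known(working))  # (1, 4, 32, 32, 3) True
```

This is only a model of the shape check, not the library's actual implementation; it shows why `shape=` cannot satisfy a stateful layer while `batch_shape=` can.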

datlife on 7 Feb 2017

👍1

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs, but feel free to re-open it if needed.

stale[bot] on 23 May 2017

Hi,
I am facing almost the same issue with a shared convnet: two feature vectors are concatenated and fed to a Dense layer and then into an LSTM. The model is like Fig. 1 of https://arxiv.org/pdf/1705.06368.pdf

I have been stuck on this issue for a long time; any help is much appreciated. @farizrahman4u @dat-ai
`# First, define the vision modules
input_dim = (224, 224, 3)
image_input = Input(shape=input_dim)

vision_model = Conv2D(64, (3, 3), activation='relu', padding='same')(image_input)
vision_model = Conv2D(64, (3, 3), activation='relu')(vision_model)
vision_model = MaxPooling2D((2, 2))(vision_model)
vision_model = Conv2D(128, (3, 3), activation='relu', padding='same')(vision_model)
vision_model = Conv2D(128, (3, 3), activation='relu')(vision_model)
vision_model = MaxPooling2D((2, 2))(vision_model)
vision_model = Conv2D(256, (3, 3), activation='relu', padding='same')(vision_model)
vision_model= MaxPooling2D((2, 2))(vision_model)
out = Flatten()(vision_model)

model = Model(image_input,out)

digit_a = Input(shape=input_dim)
digit_b = Input(shape=input_dim)

# The vision model will be shared, weights and all

out_a = model(digit_a)
out_b = model(digit_b)

concatenated = concatenate([out_a, out_b])
out = Dense(2048, activation='relu')(concatenated)

concat_model= Model([digit_a,digit_b],out)

# batch size = 32, number of unrolled steps = 2; I am not sure how to pass a multi-input sequence as input

frame_sequence = Input(batch_shape=(32, 2,224,224,3))
unroll_feature = TimeDistributed(concat_model)(frame_sequence)`

And the error message I got:

Using TensorFlow backend.
Traceback (most recent call last):
File "/home/rajat/Downloads/pycharm-2017.2.3/helpers/pydev/pydevd.py", line 1599, in
globals = debugger.run(setup['file'], None, None, is_module)
File "/home/rajat/Downloads/pycharm-2017.2.3/helpers/pydev/pydevd.py", line 1026, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "/home/rajat/Downloads/re3-tensorflow-master/keras_training/vision_model.py", line 52, in
unroll_feature = TimeDistributed(concat_model)(frame_sequence)
File "/home/rajat/.local/lib/python2.7/site-packages/keras/engine/topology.py", line 602, in __call__
output = self.call(inputs, **kwargs)
File "/home/rajat/.local/lib/python2.7/site-packages/keras/layers/wrappers.py", line 188, in call
unroll=False)
File "/home/rajat/.local/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 2467, in rnn
outputs, _ = step_function(inputs[0], initial_states + constants)
File "/home/rajat/.local/lib/python2.7/site-packages/keras/layers/wrappers.py", line 179, in step
output = self.layer.call(x, **kwargs)
File "/home/rajat/.local/lib/python2.7/site-packages/keras/engine/topology.py", line 2058, in call
output_tensors, _, _ = self.run_internal_graph(inputs, masks)
File "/home/rajat/.local/lib/python2.7/site-packages/keras/engine/topology.py", line 2262, in run_internal_graph
assert str(id(x)) in tensor_map, 'Could not compute output ' + str(x)
AssertionError: Could not compute output Tensor("dense_1/Relu:0", shape=(?, 2048), dtype=float32)

rajatkoner08 on 6 Dec 2017

Hi @rajatkoner08,
Please let me know if you were able to solve this problem and how to fix it. Thanks in advance. Madhu

madhuhegde on 24 May 2019

Needs to be built like so:

self.lstm_custom_1 = keras.layers.LSTM(128, batch_input_shape=batch_input_shape, return_sequences=False, stateful=True)

self.lstm_custom_1.build(batch_input_shape)

mmehedin on 3 Jul 2019

Using Keras for R with the functional API, I am observing a similar problem that I can't resolve with the advice given above, since the cases above refer to Keras for Python and are (for me) not easily transferred to Keras for R.

Since this thread was closed long ago, I have raised this topic anew as issue #13262; hopefully there will be replies with respect to Keras for R.