I want to predict sequences of number vectors based on the previous ones. I have five sequences, and I use a history length of 100. I transformed the data into the following format:
As input X I have an array of n matrices, each with 100 rows and 5 columns (technically, X is a tensor with dimensions n x 100 x 5). The target y is an n x 5 matrix: for each input X_i (a 100 x 5 matrix) I want one corresponding row of y.
So my input data (X) is a numpy array of shape n x 100 x 5 and the output (y) is n x 5. My model is as follows:
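For reference, a minimal sketch of how windows of this shape can be built with numpy; the array name `data` and the toy series length of 500 are assumptions for illustration, not part of the original post:

```python
import numpy as np

def make_windows(data, history=100):
    """Slice a (timesteps, features) array into (n, history, features)
    inputs X and the (n, features) rows y that immediately follow them."""
    X, y = [], []
    for i in range(len(data) - history):
        X.append(data[i:i + history])   # the 100 preceding rows
        y.append(data[i + history])     # the next row, to be predicted
    return np.array(X), np.array(y)

data = np.random.rand(500, 5)           # toy stand-in for the five sequences
X, y = make_windows(data)
print(X.shape, y.shape)                 # (400, 100, 5) (400, 5)
```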
from keras.models import Sequential
from keras.layers.core import Dense, Activation, Dropout
from keras.layers.recurrent import LSTM

in_out_neurons = 5
hidden_neurons = 20

model = Sequential()
model.add(LSTM(hidden_neurons, return_sequences=True, forget_bias_init='one', inner_activation='hard_sigmoid', input_shape=(100, 5)))
model.add(Activation("sigmoid"))
model.add(Dropout(0.2))
model.add(LSTM(hidden_neurons, return_sequences=False, forget_bias_init='one', inner_activation='hard_sigmoid'))
model.add(Activation("sigmoid"))
model.add(Dropout(0.2))
model.add(Dense(in_out_neurons))
model.add(Activation("linear"))
model.compile(loss="mean_squared_error", optimizer="rmsprop")
model.fit(X_train, y_train, batch_size=450, nb_epoch=20, validation_split=0.05)
predicted = model.predict(X_test)
The problem is that it always predicts a constant value for each sequence at all time steps. But when I use the input from the following link, with two sequences, it predicts very well:
http://danielhnyk.cz/predicting-sequences-vectors-keras-using-rnn-lstm/
I have changed the number of epochs and the batch_size, but nothing changed! Can anybody help me find the problem?
Good question, thank you
On Feb 15, 2016 03:18, "mininaNik" [email protected] wrote:
I solved my problem! Just to inform others who have the same problem: I scaled (centered) my input data.
@mininaNik How did you center the data? I have data of dimension 1000 x 2, and I need to produce training samples where, say, every 30 data points predict the 31st. So my training data becomes n x (30 x 2).
Did you center the whole 1000 x 2 dataset so that it has zero mean, or did you center each 30 x 2 window separately, so that every one of the n windows has zero mean?
@coolzai Yes, I centered the data to have zero mean. First I centered the whole dataset, and then I sliced and reshaped it for training and testing.
@mininaNik If you rescale the whole dataset before train_test_split, there is a chance of introducing look-ahead bias. I think you should split the data first, then use the scaler you trained on the training dataset to transform the test dataset.
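A minimal numpy sketch of that split-first approach; the 1000 x 2 toy array and the 80/20 split ratio are assumptions for illustration:

```python
import numpy as np

# Toy stand-in for the raw 1000 x 2 series described above.
data = np.random.rand(1000, 2)

# Split FIRST, then compute the centering statistic on the
# training portion only, so no test information leaks backwards.
split = int(len(data) * 0.8)
train, test = data[:split], data[split:]

mean = train.mean(axis=0)       # statistic from the training data only
train_centered = train - mean
test_centered = test - mean     # reuse the training mean on the test set
```

The test set's mean will generally not be exactly zero after this transform, and that is expected: only training-set statistics are allowed to shape the transform.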