Keras: Help: 'Wrong number of dimensions: expected 3, got 2 with shape (32L, 60L).' in LSTM model

Created on 4 Feb 2016 · 10 comments · Source: keras-team/keras

Hey everyone,

I'm trying to use custom data with the LSTM model, but it keeps giving shape errors. After reading some other issues along the same lines, I even tried reshaping the input data to size (nb_inputs, timesteps, 1), which looks approximately like (4200, 60, 1), but that returns an error saying a shape of (None, 4200, 60, 1) is no good. Any thoughts?

maxlen = 60
batch_size = 32

print('Loading data...')
(X_train, y_train), (X_test, y_test) = t.LoadData()
print(len(X_train), 'train sequences')
print(len(X_test), 'test sequences')

print('X_train shape:', X_train.shape)
print('X_test shape:', X_test.shape)

print('Build model...')
model = Sequential()
model.add(LSTM(128, input_shape=X_train.shape))

model.compile(loss='binary_crossentropy',
              optimizer='sgd',
              class_mode="categorical")

print("Train...")
model.fit(X_train, y_train, batch_size=batch_size, nb_epoch=3,
          validation_data=(X_test, y_test), show_accuracy=True)
score, acc = model.evaluate(X_test, y_test,
                            batch_size=batch_size,
                            show_accuracy=True)
print('Test score:', score)
print('Test accuracy:', acc)

Output:

Using Theano backend.
Loading data...
4130 train sequences
1016 test sequences
X_train shape: (4130L, 60L)
X_test shape: (1016L, 60L)
Build model...
Train...
Train on 4130 samples, validate on 1016 samples
Epoch 1/3
Traceback (most recent call last):
File "main.py", line 52, in
validation_data=(X_test, y_test), show_accuracy=True)
File "C:\Miniconda2\lib\site-packageskeras\models.py", line 507, in fit
shuffle=shuffle, metrics=metrics)
File "C:\Miniconda2\lib\site-packageskeras\models.py", line 226, in _fit
outs = f(ins_batch)
File "C:\Miniconda2\lib\site-packageskeras\backendtheano_backend.py", line 357, in __call__
return self.function(*inputs)
File "C:\Miniconda2\lib\site-packagestheano\compile\function_module.py", line 513, in call
allow_downcast=s.allow_downcast)
File "C:\Miniconda2\lib\site-packagestheanotensortype.py", line 169, in filter
data.shape))
TypeError: ('Bad input argument to theano function with name "C:\Miniconda2\lib\site-packageskeras\backendtheano_backend.py:354" at index 0(0-based)', 'Wrong number of dimensions: expected 3, got 2 with shape (32L, 60L).')

All 10 comments

I even tried reshaping the input data to size (nb_inputs, timesteps, 1)

You will need to do that, but then you shouldn't pass X_train.shape as the input_shape parameter of your LSTM. The model architecture doesn't care about the total number of training points. You should be able to pass in input_shape=X_train.shape[1:] instead.
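A minimal sketch of that fix (editorial addition, assuming X_train and X_test are NumPy arrays of shape (samples, timesteps) as printed above, and the Keras 1.x API used throughout this thread):

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM

# add a trailing feature axis: (samples, timesteps) -> (samples, timesteps, 1)
X_train = np.reshape(X_train, X_train.shape + (1,))
X_test = np.reshape(X_test, X_test.shape + (1,))

model = Sequential()
# input_shape excludes the sample axis: (timesteps, features)
model.add(LSTM(128, input_shape=X_train.shape[1:]))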

Thank you! Reshaping the arrays and adding .shape[1:] lets it run. May I ask why the input shape needs to be .shape[1:]?

Also, (off topic) the output looks like:

Using Theano backend.
Using gpu device 0: GeForce GTX 970
Loading data...
4262 train sequences
1083 test sequences
X_train shape: (4262L, 60L, 1L)
X_test shape: (1083L, 60L, 1L)
Build model...
Train...
Train on 4262 samples, validate on 1083 samples
Epoch 1/3
4262/4262 [==============================] - 48s - loss: -28.0929 - acc: 1.0000 - val_loss: -64.3520 - val_acc: 1.0000
Epoch 2/3
4262/4262 [==============================] - 48s - loss: -64.6251 - acc: 1.0000 - val_loss: -64.3520 - val_acc: 1.0000
Epoch 3/3
4262/4262 [==============================] - 48s - loss: -64.6251 - acc: 1.0000 - val_loss: -64.3520 - val_acc: 1.0000
1083/1083 [==============================] - 1s
Test score: -64.3520374245
Test accuracy: 1.0

Is there a reason the loss is negative?

So X.shape is (samples, timesteps, dimension), but the model architecture doesn't care how many training examples (samples) you have. Once you've built the model you can feed it a hundred million examples; it doesn't matter. So you don't pass that as a parameter when you build your model. So X.shape[1:] is just (timesteps, samples), the two dimensions that matter.

Incidentally, if you're on a Theano backend you _also_ don't need to specify the number of timesteps, but then you need to pass "None" for that dimension. So instead you would pass in (None, X.shape[2]).
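For instance (editorial sketch, reusing the reshaped 3D X_train from above), that variable-length form would look like:

# sequence length left unspecified; only the feature dimension is fixed
model.add(LSTM(128, input_shape=(None, X_train.shape[2])))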

As to why your score is negative: there's still something a bit fishy with your model. Your LSTM has 128 output dimensions and then you're evaluating binary cross-entropy on that? Is your y target also 128 dimensional? If it's not, you probably meant to put a Dense(1) layer, bringing your output down to a single output that is compatible with y. Also if you're using a cross-entropy objective you want your output to be a probability distribution so you probably meant to put some sort of activation on top of it to normalize its output.

Or else you probably meant to use a different objective function.

Without knowing more about your data (for instance the size of your y matrix) it's hard for me to help further.

So X.shape[1:] is just (timesteps, samples) the two dimensions that matter

I'm guessing you meant (timesteps, dimension)?

That makes sense, though. Thank you for the information. As for the output data, yes, a binary_crossentropy loss function doesn't make much sense, considering the data look like:

[
[5.45, 5.42, ..., 5.26],
[5.25, 5.28, ..., 5.30],
...
[5.12, 5.15, ..., 5.65]
],
[5.13, 5.17, ..., 5.05]

Where the first list contains sequences of input (which are themselves lists), and the output is a single float value.

I've changed the model:

batch_size = 32

print('Loading data...')
(X_train, y_train), (X_test, y_test) = t.LoadData()
print(len(X_train), 'train sequences')
print(len(X_test), 'test sequences')

X_train = np.reshape(X_train, X_train.shape + (1,))
X_test = np.reshape(X_test, X_test.shape + (1,))

print('X_train shape:', X_train.shape)
print('X_test shape:', X_test.shape)

print('Build model...')
model = Sequential()
model.add(LSTM(1, input_shape=X_train.shape[1:]))

model.compile(loss='mse',
              optimizer='sgd',
              class_mode="categorical")

print("Train...")
model.fit(X_train, y_train, batch_size=batch_size, nb_epoch=3,
          validation_data=(X_test, y_test), show_accuracy=True)
score, acc = model.evaluate(X_test, y_test,
                            batch_size=batch_size,
                            show_accuracy=True)
print('Test score:', score)
print('Test accuracy:', acc)

And it now produces output closer to the desired result:

Using Theano backend.
Loading data...
4109 train sequences
998 test sequences
X_train shape: (4109L, 60L, 1L)
X_test shape: (998L, 60L, 1L)
Build model...
Train...
Train on 4109 samples, validate on 998 samples
Epoch 1/3
4109/4109 [==============================] - 3s - loss: 26.1860 - acc: 1.0000 - val_loss: 26.4226 - val_acc: 1.0000
Epoch 2/3
4109/4109 [==============================] - 3s - loss: 26.1860 - acc: 1.0000 - val_loss: 26.4226 - val_acc: 1.0000
Epoch 3/3
4109/4109 [==============================] - 3s - loss: 26.1860 - acc: 1.0000 - val_loss: 26.4226 - val_acc: 1.0000
998/998 [==============================] - 0s
Test score: 26.4226496511
Test accuracy: 1.0

I'll keep plugging away. :)

For most applications you would probably want more than 1 hidden unit in your LSTM! You can put a Dense layer (or TimeDistributedDense) with an output dimension of 1 to project the hidden state down to 1 dimension on output, while still retaining more than 1 dimension of state. So something like:

model.add(LSTM(128, input_shape=X_train.shape[1:]))
model.add(Dense(1))
model.add(Activation('sigmoid'))
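A possible way to finish wiring that into the earlier script (editorial sketch only, carrying over the mse loss and the Keras 1.x fit call from the revised code above; note that a sigmoid output is bounded to (0, 1), so targets like the 5.x values shown earlier would need scaling, or the final activation would need changing):

model.compile(loss='mse', optimizer='sgd')
model.fit(X_train, y_train, batch_size=batch_size, nb_epoch=3,
          validation_data=(X_test, y_test))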

from pandas import DataFrame
from pandas import Series
from pandas import concat
from pandas import read_csv
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from math import sqrt
from matplotlib import pyplot
import numpy as np



# frame a sequence as a supervised learning problem
def timeseries_to_supervised(data, lag=1):
    df = DataFrame(data)
    columns = [df.shift(i) for i in range(1, lag+1)]
    columns.append(df)
    df = concat(columns, axis=1)
    df.fillna(0, inplace=True)
    return df

# create a differenced series
def difference(dataset, interval=1):
    diff = list()
    for i in range(interval, len(dataset)):
        value = dataset[i] - dataset[i - interval]
        diff.append(value)
    return Series(diff)

# invert differenced value
def inverse_difference(history, yhat, interval=1):
    return yhat + history[-interval]

# scale train and test data to [-1, 1]
def scale(train):
    # fit scaler
    scaler = MinMaxScaler(feature_range=(-1, 1))
    scaler = scaler.fit(train)
    # transform train
    train = train.reshape(train.shape[0], train.shape[1])
    train_scaled = scaler.transform(train)

    return scaler, train_scaled

# inverse scaling for a forecasted value
def invert_scale(scaler, X, value):
    new_row = [x for x in X] + [value]
    array = np.array(new_row)
    array = array.reshape(1, len(array))
    inverted = scaler.inverse_transform(array)
    return inverted[0, -1]

def generate_features(x, forecast, window):
    """ Concatenates a time series vector x with forecasts from
        the iterated forecasting strategy.

    Arguments:
    ----------
        x:        Numpy array of length T containing the time series.
        forecast: Scalar containing forecast for time T + 1.
        window:   Autoregressive order of the time series model.
    """
    augmented_time_series = np.hstack((x, forecast))

    return augmented_time_series[-window:].reshape(1, -1)

# fit an LSTM network to training data
def fit_lstm(train, batch_size, nb_epoch, neurons):
    X, y = train[:, 0:-1], train[:, -1]
    X = X.reshape(X.shape[0], 1, X.shape[1])
    model = Sequential()
    model.add(LSTM(neurons, batch_input_shape=(batch_size, X.shape[1], X.shape[2]), stateful=True))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam')
    for i in range(nb_epoch):
        model.fit(X, y, epochs=1, batch_size=batch_size, verbose=0, shuffle=False)
        model.reset_states()
    return model

def iterative_forecast(model, x, window, H):
    """ Implements the iterative (recursive) forecasting strategy.

    Arguments:
    ----------
        model:  Fitted model that implements a predict() method
                (here, the LSTM returned by fit_lstm).
        x:      Numpy array containing the time series.
        window: Autoregressive order of the time series model.
        H:      Number of time periods for the H-step-ahead
                forecast.
    """
    forecast = np.zeros(H)    
    forecast[0] = model.predict(x.reshape(1, -1))

    for h in range(1, H):
        features = generate_features(x, forecast[:h], window)

        forecast[h] = model.predict(features)

    return forecast


# load dataset
series = read_csv('shampoosales.csv', header=0, index_col=0, squeeze=True)



# transform data to be stationary
raw_values = series.values
diff_values = difference(raw_values, 1)

# transform data to be supervised learning
supervised = timeseries_to_supervised(diff_values, 1)
supervised_values = supervised.values


train = supervised_values[0:-12]
test = supervised_values[-12:]

# transform the scale of the data
scaler, train_scaled = scale(train)
# fit the model
lstm_model = fit_lstm(train_scaled, 1, 3000, 4)

yhat = iterative_forecast(lstm_model, train, 1, 10)
predictions = list()
predictions.append(yhat)

I am trying to implement an iterative forecast using an LSTM, but there seems to be something wrong with the code. Would you be kind enough to help?

The error I am getting:

'Error when checking : expected lstm_2_input to have 3 dimensions, but got array with shape (1, 46)'
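Editorial note, not a reply from the original thread: fit_lstm reshapes its training input to (samples, 1, features), so the arrays handed to predict() inside iterative_forecast would likely need the same 3D shape. A sketch of that dimensionality fix only (other mismatches, such as the feature count versus the lag-1 training data, may remain):

# hypothetical adjustment inside iterative_forecast: give each input a time axis
x_3d = x.reshape(1, 1, -1)                        # (samples=1, timesteps=1, features)
forecast[0] = model.predict(x_3d, batch_size=1)

features_3d = features.reshape(1, 1, -1)
forecast[h] = model.predict(features_3d, batch_size=1)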

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

Hi,

I have input data with three variables/dimensions and 4080 samples in total. I am trying the RNN script below but getting the error shown underneath.
Any help?

model=Sequential()
model.add(GRU(3,return_sequences=True,input_shape=(4080,3)))
model.add(Dense(1))
model.compile(loss='categorical_crossentropy',optimizer='adam',metrics=['accuracy'])
model.fit(x_train,dummy_y_train,nb_epoch=20,batch_size=20,verbose=1)

ERROR: Error when checking input: expected gru_1_input to have 3 dimensions, but got array with shape (4080, 3)
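Editorial sketch, not a reply from the thread: the GRU expects 3D input of shape (samples, timesteps, features), and input_shape should exclude the sample count. Assuming each of the 4080 samples is a single timestep with 3 features, the reshape might look like:

import numpy as np

x_train = np.reshape(x_train, (4080, 1, 3))   # (samples, timesteps, features)

model = Sequential()
model.add(GRU(3, return_sequences=True, input_shape=(1, 3)))  # (timesteps, features), not (samples, features)
model.add(Dense(1))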

@wxs Don't you think there is something fishy in @DanHenry4's work, since he gets the same loss after each epoch and the accuracy is always 1 (100%)? That is close to impossible in most machine learning predictions, and especially in stock price prediction. I am also getting a loss of 0.0, so I am confused; maybe I did something wrong.

Please reply to me on this. I am using an LSTM for the first time, and seeing this accuracy makes me worry that I am doing something wrong.

Your guidance will be appreciated. Thanks,

If the matrix size is different between the test data and the data the model was trained on, what can I do? (This is Keras in R.) My results are really poor just because I have to add dummy columns to match the matrix size.
