Hey there,
I've been trying to connect a Dense layer to an LSTM that comes after it. The code below shows what I mean more concretely:
from keras.models import Sequential
from keras.layers.core import Dense, Dropout
from keras.layers.recurrent import LSTM

qrnn = Sequential()
qrnn.add(Dense(512, input_dim=X.shape[1]))
qrnn.add(Dropout(0.5))
qrnn.add(LSTM(512, return_sequences=True))
qrnn.add(Dropout(0.2))
qrnn.add(LSTM(512, return_sequences=False))
But I get this error
Incompatible shapes: layer expected input with ndim=3 but previous layer has output_shape (None, 512)
LSTMs expect a sequence of vectors per sample, i.e. a 3D input, whereas a Dense layer outputs a single vector per sample. You need a TimeDistributedDense in front there instead of a Dense, or an Embedding layer if your inputs are sequences of categorical integers.
I just tried adding TimeDistributedDense (Embedding works in other scenarios I've tested, but this one doesn't have categorical integers), and I still get the exact same error. X is a large matrix of roughly 1,000,000 samples, each a sequence of 119 values; they are just plain numbers, e.g. 1.1, 1.2, 1.3, etc., and I want to predict the next one in each sequence as the output. Could you give an example use case for the suggested layer type?
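For reference, X and the targets are shaped roughly like this (random numbers here, just to show the shapes I'm working with):

import numpy as np

# simplified illustration of my data: ~1,000,000 samples,
# each a sequence of 119 values; the target is the next value
X = np.random.random((1000000, 119)).astype('float32')  # 2D: (nb_samples, 119)
y = np.random.random((1000000,)).astype('float32')      # next value for each sequence
print(X.shape, y.shape)                                  # (1000000, 119) (1000000,)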
The following compiles for me:
from keras.models import Sequential
from keras.layers.core import TimeDistributedDense
from keras.layers.recurrent import LSTM
from keras.optimizers import RMSprop

# input shape: (nb_samples, timesteps, 10)
model = Sequential()
model.add(TimeDistributedDense(10, input_dim=10))  # output shape: (nb_samples, timesteps, 10)
model.add(LSTM(10, return_sequences=True))         # output shape: (nb_samples, timesteps, 10)
model.add(TimeDistributedDense(5))                 # output shape: (nb_samples, timesteps, 5)
model.add(LSTM(10, return_sequences=False))        # output shape: (nb_samples, 10)
optimizer = RMSprop(lr=0.001, clipnorm=10)
model.compile(optimizer=optimizer, loss='mse')
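And to show how data would be fed in, here is a sketch with random arrays shaped to match the comments above (the batch size and epoch count are just placeholders):

import numpy as np

# random data shaped to match the model above (illustrative only)
nb_samples, timesteps = 1000, 20
X_train = np.random.random((nb_samples, timesteps, 10))  # 3D input: (nb_samples, timesteps, 10)
y_train = np.random.random((nb_samples, 10))             # the last LSTM outputs (nb_samples, 10)

model.fit(X_train, y_train, batch_size=32, nb_epoch=2)

For your data you would use timesteps=119 with a feature dimension of 1, i.e. reshape X to (nb_samples, 119, 1) and set input_dim=1 on the first layer, adjusting the layer sizes as needed.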
Thanks for your time. I will give it a try and let you know.
Good luck!
I wrote this Transform layer to create input for an LSTM, and it can also unroll LSTM output for a Dense layer.
There is only one issue: you must take into account that the nb_samples of the input is not the same as the nb_samples of the output, i.e. if you create sequences of length 20, then the nb_samples of the output is divided by 20.
I am not clear on how Keras infers the nb_samples of the output, but it would be great if nb_samples were allowed to change after feeding input into the network.
P.S.: This layer can also act as the first input layer and do some reshaping of your input.
import theano.tensor as T

from keras.layers.core import Layer


class Transform(Layer):
    '''This layer can be used as an ``Input Layer`` or a shape-transform layer.

    Example:
        input_shape: (128,)
        transform_shape: (5, 128) or (5, None)
        => (100, 128) -> (20, 5, 128)

        input_shape: (5, 128)
        transform_shape: (128,)
        => (20, 5, 128) -> (100, 128)  <=> Flatten

    Note: dimshuffle(0, 2, 1) != reshape(-1, 2, 1)
    '''
    def __init__(self, transform_shape=None, input_shape=None, **kwargs):
        # normalize transform_shape to a tuple
        if not hasattr(transform_shape, '__len__'):
            transform_shape = (transform_shape,)
        self.transform_shape = transform_shape
        if input_shape:
            # normalize a scalar input_shape to a tuple as well
            if not hasattr(input_shape, '__len__'):
                input_shape = (input_shape,)
                self.input_shape = input_shape
            self.input_ndim = len(input_shape) + 1
            kwargs['input_shape'] = input_shape
        super(Transform, self).__init__(**kwargs)

    @property
    def output_shape(self):
        nb_samples = None
        # nb_samples can only be inferred when the input shape is fully known
        if (hasattr(self.input_shape, '__len__') and None not in self.input_shape) or \
                not hasattr(self.input_shape, '__len__'):
            nb_samples = T.prod(self.input_shape) / T.prod(self.transform_shape)
        # a trailing None in transform_shape means "keep the last input dimension"
        if len(self.transform_shape) == 2 and self.transform_shape[1] is None:
            self.transform_shape = (self.transform_shape[0], self.input_shape[1])
        return (nb_samples,) + self.transform_shape

    def get_output(self, train=False):
        X = self.get_input(train)
        # convert 2D data to 3D (or back); -1 lets Theano infer the new nb_samples
        if len(self.transform_shape) == 2 and self.transform_shape[1] is None:
            self.transform_shape = (self.transform_shape[0], X.shape[1])
        return T.reshape(X, (-1,) + self.transform_shape)

    def get_config(self):
        config = {"name": self.__class__.__name__,
                  "transform_shape": self.transform_shape}
        base_config = super(Transform, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
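Roughly how it can be dropped into a model (a sketch with placeholder layer sizes, not my actual network):

from keras.models import Sequential
from keras.layers.core import Dense
from keras.layers.recurrent import LSTM

model = Sequential()
# reshape flat 2D input (nb_samples, 128) into sequences of length 5,
# e.g. (100, 128) -> (20, 5, 128)
model.add(Transform(transform_shape=(5, 128), input_shape=(128,)))
model.add(LSTM(64, return_sequences=True))
# unroll the LSTM output back to 2D for the Dense layer: (20, 5, 64) -> (100, 64)
model.add(Transform(transform_shape=(64,)))
model.add(Dense(1))
model.compile(optimizer='rmsprop', loss='mse')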