Hi,
I'm currently working on an unusual paragraph-summarization task that requires a 4-D tensor input with shape (nb_samples, n_sentences, timesteps, input_dim); that is, each example is a paragraph of n_sentences sentences. The idea is to run an RNN over each sentence and take its last state as a 'sentence embedding', then pool over the n_sentences embeddings (for example, by averaging them) to get the final embedding for each paragraph.
It looks like the Keras RNNs only support 3-D input, which leaves me two ways to go:
1. Reshape to (nb_samples*n_sentences, timesteps, input_dim), folding the sentences into the batch axis, and reshape the RNN outputs back afterwards.
2. Reshape to (nb_samples, n_sentences*timesteps, input_dim) and still use the 3-D RNN, but find some way to reset the RNN's state every timesteps steps. I think this is a use case that is the opposite of stateful, where I have to reset within a "sentence".
Any suggestions? I think the latter is probably easier to implement, but it is slow, since the RNN has to step through a "sentence" of length n_sentences*timesteps without any parallelism.
It looks like you're looking for TimeDistributed(LSTM(...)).
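For instance, here is a minimal sketch using the modern tf.keras API (all dimensions are placeholders, not values from the question): wrapping the LSTM in TimeDistributed runs it over each sentence independently, and averaging over the sentence axis then gives one embedding per paragraph.

```python
# Sketch of the TimeDistributed(LSTM) approach; the dimensions
# (n_sentences=5, timesteps=10, input_dim=8, units=16) are placeholders.
from tensorflow.keras.layers import Input, LSTM, TimeDistributed, GlobalAveragePooling1D
from tensorflow.keras.models import Model

n_sentences, timesteps, input_dim, units = 5, 10, 8, 16

inp = Input(shape=(n_sentences, timesteps, input_dim))  # 4-D input (batch axis implicit)
sent = TimeDistributed(LSTM(units))(inp)                # -> (batch, n_sentences, units)
para = GlobalAveragePooling1D()(sent)                   # -> (batch, units): mean over sentences
model = Model(inp, para)
```

TimeDistributed applies the inner LSTM to each of the n_sentences slices, so the per-sentence RNNs run in parallel within the batch rather than sequentially over one long sequence.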
I had to write one for a similar situation (I actually started from 5-D and then rolled up twice to 3-D... woo, trees). I found I took some speed hits using TimeDistributed out of the box, so I wrote my own; you can find it here. I don't provide any real guarantees with it. I'm fairly certain it's working, though I've since moved on to another summarization method (for various reasons).
As you can see in the code, it's based on Wrapper (which TimeDistributed also is), but it explicitly expects one less dimension:
def get_output_shape_for(self, input_shape):
    # Treat everything before the last two axes as "batch": hand the child
    # layer a fake (1, timesteps, input_dim) shape, then splice its feature
    # dimension back onto the leading axes.
    child_input_shape = (1,) + input_shape[-2:]
    child_output_shape = self.layer.get_output_shape_for(child_input_shape)
    return input_shape[:-2] + child_output_shape[-1:]
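To see what that shape arithmetic does, here is a stand-alone sketch with plain tuples; the child LSTM's shape rule is stubbed out (an LSTM(units) maps (batch, timesteps, input_dim) to (batch, units)) rather than computed by Keras.

```python
def child_output_shape_for(child_input_shape, units=16):
    # Stub for LSTM(units).get_output_shape_for: keep the batch axis,
    # collapse (timesteps, input_dim) into the units dimension.
    batch, _timesteps, _input_dim = child_input_shape
    return (batch, units)

def wrapper_output_shape_for(input_shape, units=16):
    child_input_shape = (1,) + input_shape[-2:]           # fake batch of 1 over the last two axes
    child_output = child_output_shape_for(child_input_shape, units)
    return input_shape[:-2] + child_output[-1:]           # restore leading axes, keep units

# 4-D paragraph input: (nb_samples, n_sentences, timesteps, input_dim)
print(wrapper_output_shape_for((32, 5, 10, 8)))           # (32, 5, 16)
```

So a (32, 5, 10, 8) batch of paragraphs comes out as (32, 5, 16): one 16-dim embedding per sentence, with all leading axes preserved.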
Thanks! @fchollet @braingineer
I found a simple temporary solution: convert to (nb_samples*n_sentences, timesteps, input_dim) instead, using a Lambda layer to reshape the mini-batch and another to reshape it back. The fact that a Lambda layer can see the full batch is nice and flexible :)
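The reshape trick can be sketched in plain NumPy (all dimensions are hypothetical; in the actual model the two reshapes would be Lambda layers wrapped around the RNN, which is stubbed out here):

```python
import numpy as np

nb_samples, n_sentences, timesteps, input_dim = 4, 5, 10, 8
x = np.random.rand(nb_samples, n_sentences, timesteps, input_dim)

# Fold the sentences into the batch axis so a plain 3-D RNN can process
# every sentence of every paragraph in parallel:
flat = x.reshape(nb_samples * n_sentences, timesteps, input_dim)

# ... run the RNN here; suppose it returns one `units`-dim last state per sentence:
units = 16
states = np.zeros((nb_samples * n_sentences, units))  # placeholder for the RNN output

# Unfold back and average the sentence embeddings within each paragraph:
embeddings = states.reshape(nb_samples, n_sentences, units).mean(axis=1)
print(embeddings.shape)  # (4, 16)
```

The two reshapes are inverses of each other, so no data is reordered; the RNN just sees nb_samples*n_sentences independent sequences of length timesteps.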
I'm also trying to get a 4-D TimeDistributed(LSTM(...)) to work.
batch_size = 1
model = Sequential()
model.add(TimeDistributed(LSTM(7, batch_input_shape=(batch_size, look_back, dataset.shape[1], dataset.shape[2]),
                               stateful=True, return_sequences=True),
                          batch_input_shape=(batch_size, look_back, dataset.shape[1], dataset.shape[2])))
model.add(TimeDistributed(LSTM(7, batch_input_shape=(batch_size, look_back, dataset.shape[1], dataset.shape[2]),
                               stateful=True),
                          batch_input_shape=(batch_size, look_back, dataset.shape[1], dataset.shape[2])))
model.add(TimeDistributed(Dense(7, input_shape=(batch_size, 1, look_back, dataset.shape[1], dataset.shape[2]))))
model.compile(loss='mean_squared_error', optimizer='adam')
for i in range(10):
    model.fit(trainX, trainY, epochs=1, batch_size=batch_size, verbose=2, shuffle=False)
    model.reset_states()
The input shapes for trainX, trainY, and dataset are as follows:
trainX.shape = (63, 3, 34607, 7)
trainY.shape = (63, 34607, 7)
dataset.shape = (100, 34607, 7)
The error I am receiving is as follows:
Error when checking target: expected time_distributed_59 to have shape (1, 3, 7) but got array with shape (63, 34607, 7)
The layer mentioned above is the last TimeDistributed(Dense(...)) layer.
Thank you for any suggestions!
My best,
Michael