Hi,
I'm currently working on an unusual paragraph-summarization task that requires a 4-D tensor input with shape (nb_samples, n_sentences, timesteps, input_dim); that is, each example is a paragraph of n_sentences sentences. The idea is to run an RNN over each sentence and take its last state as a 'sentence embedding', then pool over the n_sentences embeddings (for example, by averaging them) to get the final embedding for each paragraph.
It looks like the Keras RNNs only support 3-D input, which leaves me two ways to go:
1. Reshape to (nb_samples*n_sentences, timesteps, input_dim), folding the sentences into the batch axis, and reshape the RNN outputs back afterwards.
2. Reshape to (nb_samples, n_sentences*timesteps, input_dim) and still use the 3-D RNN, but find some way to reset the RNN's state every timesteps steps. I think this is a use case that is the opposite of stateful, where I have to reset within a "sentence".
Any suggestions? I think the latter is probably easier to implement, but it is slow, since the RNN has to step through a "sentence" of length n_sentences*timesteps without any parallelism.
It looks like you're looking for TimeDistributed(LSTM(...)).
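For instance, here is a minimal sketch using the modern tf.keras API (all dimensions are placeholders, not values from the question): wrapping the LSTM in TimeDistributed runs it over each sentence independently, and averaging over the sentence axis then gives one embedding per paragraph.

```python
# Sketch of the TimeDistributed(LSTM) approach; the dimensions
# (n_sentences=5, timesteps=10, input_dim=8, units=16) are placeholders.
from tensorflow.keras.layers import Input, LSTM, TimeDistributed, GlobalAveragePooling1D
from tensorflow.keras.models import Model

n_sentences, timesteps, input_dim, units = 5, 10, 8, 16

inp = Input(shape=(n_sentences, timesteps, input_dim))  # 4-D input (batch axis implicit)
sent = TimeDistributed(LSTM(units))(inp)                # -> (batch, n_sentences, units)
para = GlobalAveragePooling1D()(sent)                   # -> (batch, units): mean over sentences
model = Model(inp, para)
```

TimeDistributed applies the inner LSTM to each of the n_sentences slices, so the per-sentence RNNs run in parallel within the batch rather than sequentially over one long sequence.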
I had to write one for a similar situation (I actually started from 5-D and then rolled up twice to 3-D... woo, trees). I found I took some speed hits using TimeDistributed out of the box, so I wrote my own; you can find it here. I don't provide any real guarantees with it. I'm fairly certain it's working, though I've since moved on to another summarization method (for various reasons).
As you can see in the code, it's based on Wrapper (which TimeDistributed also is), but it explicitly expects one less dimension:
def get_output_shape_for(self, input_shape):
    # Treat everything before the last two axes as "batch": hand the child
    # layer a fake (1, timesteps, input_dim) shape, then splice its feature
    # dimension back onto the leading axes.
    child_input_shape = (1,) + input_shape[-2:]
    child_output_shape = self.layer.get_output_shape_for(child_input_shape)
    return input_shape[:-2] + child_output_shape[-1:]
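To see what that shape arithmetic does, here is a stand-alone sketch with plain tuples; the child LSTM's shape rule is stubbed out (an LSTM(units) maps (batch, timesteps, input_dim) to (batch, units)) rather than computed by Keras.

```python
def child_output_shape_for(child_input_shape, units=16):
    # Stub for LSTM(units).get_output_shape_for: keep the batch axis,
    # collapse (timesteps, input_dim) into the units dimension.
    batch, _timesteps, _input_dim = child_input_shape
    return (batch, units)

def wrapper_output_shape_for(input_shape, units=16):
    child_input_shape = (1,) + input_shape[-2:]           # fake batch of 1 over the last two axes
    child_output = child_output_shape_for(child_input_shape, units)
    return input_shape[:-2] + child_output[-1:]           # restore leading axes, keep units

# 4-D paragraph input: (nb_samples, n_sentences, timesteps, input_dim)
print(wrapper_output_shape_for((32, 5, 10, 8)))           # (32, 5, 16)
```

So a (32, 5, 10, 8) batch of paragraphs comes out as (32, 5, 16): one 16-dim embedding per sentence, with all leading axes preserved.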
Thanks! @fchollet @braingineer
I found a simple temporary solution: convert to (nb_samples*n_sentences, timesteps, input_dim) instead, using a Lambda layer to reshape the mini-batch and another to reshape it back. The fact that a Lambda layer can see the full batch is nice and flexible :)
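The reshape trick can be sketched in plain NumPy (all dimensions are hypothetical; in the actual model the two reshapes would be Lambda layers wrapped around the RNN, which is stubbed out here):

```python
import numpy as np

nb_samples, n_sentences, timesteps, input_dim = 4, 5, 10, 8
x = np.random.rand(nb_samples, n_sentences, timesteps, input_dim)

# Fold the sentences into the batch axis so a plain 3-D RNN can process
# every sentence of every paragraph in parallel:
flat = x.reshape(nb_samples * n_sentences, timesteps, input_dim)

# ... run the RNN here; suppose it returns one `units`-dim last state per sentence:
units = 16
states = np.zeros((nb_samples * n_sentences, units))  # placeholder for the RNN output

# Unfold back and average the sentence embeddings within each paragraph:
embeddings = states.reshape(nb_samples, n_sentences, units).mean(axis=1)
print(embeddings.shape)  # (4, 16)
```

The two reshapes are inverses of each other, so no data is reordered; the RNN just sees nb_samples*n_sentences independent sequences of length timesteps.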
I'm also trying to get a 4-D TimeDistributed(LSTM(...)) to work.
batch_size = 1
model = Sequential()
model.add(TimeDistributed(LSTM(7, batch_input_shape=(batch_size, look_back, dataset.shape[1], dataset.shape[2]),
                               stateful=True, return_sequences=True),
                          batch_input_shape=(batch_size, look_back, dataset.shape[1], dataset.shape[2])))
model.add(TimeDistributed(LSTM(7, batch_input_shape=(batch_size, look_back, dataset.shape[1], dataset.shape[2]),
                               stateful=True),
                          batch_input_shape=(batch_size, look_back, dataset.shape[1], dataset.shape[2])))
model.add(TimeDistributed(Dense(7, input_shape=(batch_size, 1, look_back, dataset.shape[1], dataset.shape[2]))))
model.compile(loss='mean_squared_error', optimizer='adam')
for i in range(10):
    model.fit(trainX, trainY, epochs=1, batch_size=batch_size, verbose=2, shuffle=False)
    model.reset_states()
The input shapes for trainX, trainY, and dataset are as follows:
trainX.shape = (63, 3, 34607, 7)
trainY.shape = (63, 34607, 7)
dataset.shape = (100, 34607, 7)
The error I am receiving is as follows:
Error when checking target: expected time_distributed_59 to have shape (1, 3, 7) but got array with shape (63, 34607, 7)
The layer mentioned above is the last TimeDistributed(Dense(...)) layer.
Thank you for any suggestions!
My best,
Michael