from keras.models import Sequential
from keras.layers import LSTM, TimeDistributed, Dense, Activation

model = Sequential()
model.add(LSTM(512, input_dim=4, return_sequences=True))
model.add(TimeDistributed(Dense(4)))
model.add(Activation('softmax'))
The input here is the one-hot representation of a string, and the dictionary size is set to 4. In other words, there are four types of chars in this string. The output at each step is the probability distribution over what the next char ought to be.
If the length of the input sequence is 1, the output dimension is 4 by 1. I'm wondering whether I could feed the output back into the input and get an output sequence of arbitrary length (illustrated as follows). It may not be reasonable to plug the probabilities back in, but I just want to know whether this one-to-many structure can be implemented in Keras. Thanks.
Example:
input1 -(LSTM)-> output1
output1 -(LSTM)-> output2
output2 -(LSTM)-> output3
We could get a 4 by 3 output in the end.
You have to do it in an external loop, as in:

import numpy as np

seq = [np.random.rand(1, 1, 4)]            # (batch, timesteps, dict_size)
for i in range(n_iter):
    seq.append(model.predict(seq[-1]))     # feed the last output back in as input
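If plugging the raw probabilities back in feels wrong, one option is to pick a concrete char at each step (argmax or sampling), re-encode it as a one-hot vector, and feed that back instead. A minimal sketch, assuming the 4-class model above (the helper name generate is made up):

import numpy as np

def generate(model, first_char_onehot, n_steps):
    # first_char_onehot: length-4 one-hot vector for the seed char
    x = first_char_onehot.reshape(1, 1, 4)      # (batch, timesteps, dict_size)
    chars = []
    for _ in range(n_steps):
        probs = model.predict(x)[0, -1]         # softmax over the 4 chars
        next_char = int(np.argmax(probs))       # greedy pick (sampling also works)
        chars.append(next_char)
        x = np.eye(4)[next_char].reshape(1, 1, 4)
    # note: the model is stateless here, so each step only sees the previous char
    return chars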
Thanks, kgrm. But what about the training process? If you add an external loop, I don't think model.fit() will work.
It will, but you'll need to add a Masking layer to train it on arbitrary-length (e.g., zero-padded) sequences.
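For concreteness, a rough sketch of that suggestion, assuming the training sequences are zero-padded to a common (hypothetical) length max_len:

from keras.models import Sequential
from keras.layers import Masking, LSTM, TimeDistributed, Dense, Activation

max_len = 20  # hypothetical padded length
model = Sequential()
model.add(Masking(mask_value=0.0, input_shape=(max_len, 4)))  # skip all-zero (padded) timesteps
model.add(LSTM(512, return_sequences=True))
model.add(TimeDistributed(Dense(4)))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')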
Actually, I think he will have to write his own custom layer to do that. See this DreamyRNN for example: https://github.com/commaai/research/blob/master/models/layers.py#L334-L397
It takes n frames as input and outputs n+m frames, where the last m frames are generated by feeding outputs back as input.
That's not the case; you just have to reframe and rearrange your training data accordingly for the (n+1)-th-step prediction task.
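In other words, the training data for the next-step prediction task is just the sequence shifted by one. A minimal sketch, assuming text_onehot is a (seq_len, 4) one-hot array (the name is made up):

import numpy as np

X = text_onehot[np.newaxis, :-1, :]   # chars 0 .. n-2 as input
Y = text_onehot[np.newaxis, 1:, :]    # chars 1 .. n-1 as targets (shifted by one)
model.fit(X, Y, nb_epoch=10)          # nb_epoch in old Keras; epochs in newer versions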
Thanks, EderSantana.
To kgrm: As I described, only the first char is the input. In this scenario, I don't think the external loop will construct the right output, as illustrated below:
1st char -> (LSTM) -> output1 (one set of LSTM parameters applied)
output1 -> (LSTM) -> output2 (two sets of LSTM parameters composed with each other)
It's this composition that makes things complicated. Thanks for your reply.
I have the same problem. Is there any way around this issue?
I switched to TensorFlow and wrote everything from scratch.
@wgmao Can you please share the code you wrote in TensorFlow? Thanks
I referred to https://github.com/LantaoYu/SeqGAN/blob/e2b52fb6309851b14765290e8a972ccac09f1bec/target_lstm.py to write customized recurrent layers.
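That file builds its own recurrent loop rather than using a Keras layer. A much simplified sketch of the same idea in current TensorFlow (not the SeqGAN code itself) is to step an LSTMCell manually and feed each sampled char back in as the next input:

import tensorflow as tf

dict_size = 4
cell = tf.keras.layers.LSTMCell(512)
proj = tf.keras.layers.Dense(dict_size, activation='softmax')

def generate(first_char, n_steps):
    x = tf.one_hot([first_char], dict_size)              # (1, 4) seed char
    state = [tf.zeros((1, 512)), tf.zeros((1, 512))]     # initial [h, c]
    chars = []
    for _ in range(n_steps):
        out, state = cell(x, state)                      # one recurrent step
        probs = proj(out)                                # (1, 4) softmax
        next_char = int(tf.argmax(probs, axis=-1)[0])    # greedy pick
        chars.append(next_char)
        x = tf.one_hot([next_char], dict_size)           # feed the choice back in
    return chars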