Like many generation tasks in NLP, when using an RNN model at inference time, how can I get the model's output for each word (one timestep at a time), rather than for a whole sentence?
@fchollet
I tried using 'predict' and 'predict_proba', but neither can do it.
did you try return_sequences=True?
cf. http://keras.io/layers/recurrent/
You mean that if I set return_sequences=True, I can get one word of output at each time step?
As far as I understand, return_sequences=True is for stacking multiple RNN layers.
@henry0312
Maybe return_sequences=False is for getting only the last timestep's output of the RNN.
When you use an RNN, the default is return_sequences=False. So when you pass one word (not a sentence) at a time, you get an error like:
TypeError: ('Bad input argument to theano function with name "build/bdist.linux-x86_64/egg/keras/backend/theano_backend.py:380" at index 0 (0-based)', 'Wrong number of dimensions: expected 3, got 1 with shape (128,).')
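For context: Keras recurrent layers expect a 3D input of shape (batch_size, timesteps, features), which is why passing a single 1D word vector raises the "expected 3, got 1" error above. Below is a minimal sketch of wrapping one word as a length-1 sequence; the sizes (nb_feature=128, hidden_size=64) are placeholders, not values from this thread:
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM

nb_feature = 128   # dimensionality of one word vector (assumed)
hidden_size = 64   # assumed

model = Sequential()
# input_shape=(None, nb_feature) accepts sequences of any length
model.add(LSTM(hidden_size, input_shape=(None, nb_feature), return_sequences=True))
model.compile(loss='mse', optimizer='adam')

word = np.random.rand(nb_feature)              # shape (128,) -- this is what triggers the error
single_step = word.reshape(1, 1, nb_feature)   # shape (1, 1, 128): batch of 1, sequence of length 1
out = model.predict(single_step)               # shape (1, 1, hidden_size)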
@nzw0301
Sorry, I made a mistake. (I also think return_sequences=True is the way to do it.)
Could you look at the lstm_text_generation example?
Thanks for your replies. But if I set return_sequences=True, I get a compile error like:
Traceback (most recent call last):
File "/home/towan/PycharmProjects/test_keras/test_ner/ner_rnn_coll2003_classification_cal.py", line 55, in
model.compile(loss='categorical_crossentropy', optimizer='adam')
File "build/bdist.linux-x86_64/egg/keras/models.py", line 467, in compile
File "build/bdist.linux-x86_64/egg/keras/layers/containers.py", line 128, in get_output
File "build/bdist.linux-x86_64/egg/keras/layers/core.py", line 679, in get_output
File "build/bdist.linux-x86_64/egg/keras/layers/core.py", line 175, in get_input
File "build/bdist.linux-x86_64/egg/keras/layers/core.py", line 1086, in get_output
File "build/bdist.linux-x86_64/egg/keras/layers/core.py", line 175, in get_input
File "build/bdist.linux-x86_64/egg/keras/layers/core.py", line 869, in get_output
File "build/bdist.linux-x86_64/egg/keras/backend/theano_backend.py", line 274, in repeat
My code for building the model is:
model = Sequential()
model.add(LSTM(hidden_size, input_shape=(max_len, nb_feature),return_sequences=False))
model.add(RepeatVector(max_len))
model.add(TimeDistributedDense(labels))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
@nzw0301
What are your input data and labels?
My input shape is (20, 15, 300).
There are 8 labels.
@nzw0301
If y (the correct label) is a one-hot representation (an 8-dim vector):
model = Sequential()
model.add(LSTM(hidden_size, input_shape=(max_len, nb_feature),return_sequences=False))
model.add(Dense(labels))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
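A hedged usage sketch for the model above, assuming the shapes mentioned earlier in the thread (20 samples, max_len=15, nb_feature=300, 8 labels); hidden_size=64 is a placeholder:
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation

max_len, nb_feature, labels, hidden_size = 15, 300, 8, 64  # hidden_size is assumed

# same model as suggested above
model = Sequential()
model.add(LSTM(hidden_size, input_shape=(max_len, nb_feature), return_sequences=False))
model.add(Dense(labels))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')

X = np.random.rand(20, max_len, nb_feature)            # 20 sequences of 15 word vectors
y = np.eye(labels)[np.random.randint(0, labels, 20)]   # 20 one-hot label vectors

model.fit(X, y, nb_epoch=1)   # Keras 1 argument name; newer versions use `epochs`
probs = model.predict(X)      # shape (20, 8): one distribution per input sequence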
Thanks.
But your model structure looks like the lstm_text_generation example, which is many-to-one style. My model structure is many-to-many, like seq2seq.
So at inference time, when I input a sequence, I need to get one word of output at the previous timestep to feed into the current timestep.
oh, sorry.
see issue #2403
In my experience, I convert many-to-many into many-to-one for an encoder-decoder model.
I referred to an example of an image captioning model: the "Architecture for learning image captions with a convnet and a Gated Recurrent Unit" section in http://keras.io/getting-started/sequential-model-guide/#examples
The same applies at inference time.
Thank you.
Issue #2403 doesn't help me; it also predicts with the predict function on a whole sequence, while I need to predict word by word.
Converting many-to-many to many-to-one is a good idea, and I will try it. But I guess that even with many-to-one, the input is still a sequence, not a single word?
@nzw0301
Sorry for my lack of understanding.
Its input is a sequence, not a word, and its output is a word.
Sorry for my poor English.
Yes: for many-to-one, the input is a sequence and the prediction is one word.
For many-to-many, the input is a sequence and the prediction is a sequence.
My problem is with many-to-many: I need to input a word and get a word of output in real time, not input a whole sequence and get a word or a sentence as output. This is because sometimes we use the previous output as part of the current input, as in some generation tasks at inference time.
Maybe we could convert the many-to-many style into one-to-many, but I don't know how to do that.
@nzw0301
Sorry, me too...
When predicting with many-to-one, you could repeatedly predict the next word and then join the predicted word onto the input sentence (the input of the first step is a single word).
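A rough sketch of that loop, assuming a many-to-one model that reads the last max_len word vectors (left-padded) and predicts the index of the next word; word_to_vec and index_to_word are hypothetical helpers, not part of Keras:
import numpy as np

def generate(model, seed_words, max_len, nb_feature, n_steps,
             word_to_vec, index_to_word, pad_vec=None):
    # word_to_vec: maps a word to its feature vector (hypothetical helper)
    # index_to_word: maps a predicted class index back to a word (hypothetical helper)
    if pad_vec is None:
        pad_vec = np.zeros(nb_feature)
    words = list(seed_words)
    for _ in range(n_steps):
        window = words[-max_len:]                 # take the last max_len words
        vecs = [word_to_vec(w) for w in window]
        while len(vecs) < max_len:                # left-pad shorter windows
            vecs.insert(0, pad_vec)
        x = np.array(vecs).reshape(1, max_len, nb_feature)
        probs = model.predict(x)[0]               # distribution over the output classes
        next_word = index_to_word(int(np.argmax(probs)))
        words.append(next_word)                   # feed the prediction back as input
    return words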
Like your example: if I set the RNN max_len=3 and my first input word is "I", I can't feed "I" directly into the RNN model; I would have to pad the word so that the sequence length is 3, like "I unk unk".
So I don't understand why predicting the next word requires padding the input. As far as the RNN mechanism is concerned, there should be no need to pad the string to maxlen.
Basically, what you are trying to do is not possible in Keras, except by using a workaround like the one @nzw0301 suggested.
The reason is that to use the output of the current timestep as input for the next one, you would basically need to go "depth-first", i.e., calculate one timestep for ALL layers, then the next timestep, and so on. What Keras does, however, is calculate ALL timesteps of ONE layer, before feeding the output into the next one.
I believe there are three ways to go about this:
- Make your RNN/LSTM layers stateful (stateful=True) and have them read only one timestep at a time. Stateful means that they keep their hidden state between sequences, so it should effectively work like a "normal" RNN/LSTM, except that you only input a sequence of length 1. (Then you can interpret the output and compute the input for the next timestep.) I never tried this way myself so far, though.

Note: I'm no expert with Keras, but I've tried implementing exactly the same thing you describe, so this is how I understand Keras to work so far.
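A minimal sketch of that first (stateful) approach, assuming a one-hot vocabulary input/output; the sizes, the start token, and the greedy argmax sampling are all illustrative assumptions:
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation

vocab_size, hidden_size = 1000, 128   # assumed sizes

model = Sequential()
# batch_input_shape=(1, 1, vocab_size): batch of 1, one timestep per call
model.add(LSTM(hidden_size, batch_input_shape=(1, 1, vocab_size), stateful=True))
model.add(Dense(vocab_size))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')

def step(word_index):
    # feed one word, get the distribution over the next word;
    # the hidden state persists between calls because stateful=True
    x = np.zeros((1, 1, vocab_size))
    x[0, 0, word_index] = 1.0
    return model.predict(x)[0]

model.reset_states()                  # start a new sequence
current = 0                           # index of an assumed start-of-sequence token
for _ in range(20):
    probs = step(current)
    current = int(np.argmax(probs))   # previous output becomes the next input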
Thanks for your suggestions; I did it the first way you described.
Sometimes it works well, but when you use a bi-directional RNN, you can't get the backward-pass output.
@mbollmann @farizrahman4u May I ask whether the seq2seq approach implemented here https://github.com/farizrahman4u/seq2seq matches the picture on the same site:
[image: seq2seq architecture diagram from the linked repository, in which each decoder output is fed back as the next decoder input]
In the picture, the output word of the decoder is used as input for the next step, so at every step (of the decoder) the decoder sees the previously predicted word. But I cannot find this in the seq2seq implementation. So, starting from the decoder vector, it looks to me more like the one-to-many approach depicted here:
[image: one-to-many architecture diagram]
I believe @farizrahman4u 's seq2seq does _not_ do that, for the reasons I outlined in my previous comment. What you want to do there is "depth-first" calculation, i.e. calculate all layers for one timestep, and use the output at the _final_ layer as input for the next timestep.
What I believe seq2seq does is take the output of one LSTM layer for one timestep and use that as the same LSTM's input for the next timestep, which I believe is not what you'd usually want.
I would love to be corrected or proven wrong on this, though.
Does anybody have character-based text classification code in Keras, like the one in https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/skflow/text_classification_character_rnn.py?
Have you found a solution to this problem?
I am also stuck on a case where I want to implement the recurrence formula h(t) = tanh(W·x(t) + U·h(t-1) + V·f(O(t-1)) + b), in which O(t-1) is the classification output (after softmax) of the previous timestep of the RNN. I want to apply a non-linear function f() to it and use it to compute h(t). I tried using recurrentshop, but I could not work out what to do, so any help would be deeply appreciated, either with pure Keras or with recurrentshop.
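For what it's worth, here is a plain NumPy sketch of that recurrence, with the readout assumed to be O(t) = softmax(h(t)·W_o + b_o) and f taken to be tanh purely for illustration; this is not Keras-specific and not the author's actual model:
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def run_rnn(xs, W, U, V, b, W_o, b_o, f=np.tanh):
    # xs: list of input vectors x(t); returns the list of outputs O(t)
    h = np.zeros(b.shape[0])
    o = np.zeros(W_o.shape[1])        # O(0) assumed to start at zero
    outputs = []
    for x in xs:
        # h(t) = tanh(W x(t) + U h(t-1) + V f(O(t-1)) + b)
        h = np.tanh(W @ x + U @ h + V @ f(o) + b)
        o = softmax(h @ W_o + b_o)    # assumed softmax readout O(t)
        outputs.append(o)
    return outputs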
@monaj07 you have to subclass LSTM or GRU and override the step function.
@erickrf Thanks for your response. I ended up using Theano. I could not figure out how to access the softmax output within the step function in Keras, since the softmax operation is defined in another layer. When I tried to customize the RNN class to get all of its outputs, I got confused by the multiple levels of hierarchy between the functions and classes, as I am not very strong in Python.
@mbollmann
Basically, what you are trying to do is not possible in Keras, except by using a workaround like the one @nzw0301 suggested.
Can you elaborate on the three workarounds?
@harikrishnavydana I've already elaborated on that above. What part is unclear to you?
This is now possible with RecurrentShop.
Could you please give an example? I tried to use RecurrentShop, but the docs are too sparse.
Note: use the recurrentshop-1 branch.
rnn = RecurrentSequential(readout=True) # previous output will be added to input
rnn.add(LSTMCell(10, input_dim=10))
That's good. But if I'm right, this merges the output at t-1 and the input at t into the new input at t, using the 'add' method by default. What should I do if I want to use only the output at t-1? Rewrite RecurrentSequential?
No, you can use RecurrentModel to write any arbitrary RNN.
input = Input((10,))
readout_input = Input((10,))
h_tm1 = Input((10,))
c_tm1 = Input((10,))
lstm_input = add([input, readout_input]) # Here we add to input.. you can do whatever you want with a Lambda layer
output, h_t, c_t = LSTMCell(10)([lstm_input, h_tm1, c_tm1])
rnn = RecurrentModel(input=input, initial_states=[h_tm1, c_tm1], output=output, final_states=[h_t, c_t], readout_input=readout_input)
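If I read the RecurrentShop API correctly (an untested assumption on my part, not something confirmed in this thread), the resulting rnn can then be applied to a sequence input like an ordinary recurrent layer:
from keras.layers import Input
from keras.models import Model

# `rnn` is the RecurrentModel built above; (7, 10) = 7 timesteps of 10-dim vectors (assumed)
seq = Input((7, 10))
out = rnn(seq)                    # output after reading the whole sequence
model = Model(seq, out)
model.compile(loss='mse', optimizer='adam')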
Sorry for not responding for nearly a week due to a health problem. Thanks for the solution! It's amazing to be able to customize an LSTM like this!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.
@fchollet is there any functionality at inference time to sample the first word in the sequence (i.e., argmax) and then pass it as the input to the next LSTM step? Or do we still need to create a workaround such as the one @nzw0301 suggested?
The desired functionality would be something similar to feed_previous in TensorFlow.