Like many generation tasks in NLP, when using an RNN model at inference time, how can I get the model's output for each word (one timestep at a time), rather than for a whole sentence?
@fchollet
I tried using 'predict' and 'predict_proba', but neither can do it.
did you try return_sequences=True?
cf. http://keras.io/layers/recurrent/
You mean that if I set return_sequences=True, I can get one word of output at each time step?
As far as I understand, return_sequences=True is for stacking multiple RNN layers.
@henry0312
Maybe return_sequences=False is for getting only the last timestep's output of the RNN.
When you use an RNN, the default is return_sequences=False. So when you pass one word (not a sentence) at a time, you get an error like:
TypeError: ('Bad input argument to theano function with name "build/bdist.linux-x86_64/egg/keras/backend/theano_backend.py:380" at index 0 (0-based)', 'Wrong number of dimensions: expected 3, got 1 with shape (128,).')
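For context: Keras recurrent layers expect a 3D input of shape (batch_size, timesteps, features), which is why passing a single 1D word vector raises the "expected 3, got 1" error above. Below is a minimal sketch of wrapping one word as a length-1 sequence; the sizes (nb_feature=128, hidden_size=64) are placeholders, not values from this thread:
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM

nb_feature = 128   # dimensionality of one word vector (assumed)
hidden_size = 64   # assumed

model = Sequential()
# input_shape=(None, nb_feature) accepts sequences of any length
model.add(LSTM(hidden_size, input_shape=(None, nb_feature), return_sequences=True))
model.compile(loss='mse', optimizer='adam')

word = np.random.rand(nb_feature)              # shape (128,) -- this is what triggers the error
single_step = word.reshape(1, 1, nb_feature)   # shape (1, 1, 128): batch of 1, sequence of length 1
out = model.predict(single_step)               # shape (1, 1, hidden_size)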
@nzw0301
Sorry, I made a mistake. (I also think return_sequences=True is the way to do it.)
Could you look at the lstm_text_generation example?
Thanks for your replies. But if I set return_sequences=True, I get a compile error like:
Traceback (most recent call last):
File "/home/towan/PycharmProjects/test_keras/test_ner/ner_rnn_coll2003_classification_cal.py", line 55, in
model.compile(loss='categorical_crossentropy', optimizer='adam')
File "build/bdist.linux-x86_64/egg/keras/models.py", line 467, in compile
File "build/bdist.linux-x86_64/egg/keras/layers/containers.py", line 128, in get_output
File "build/bdist.linux-x86_64/egg/keras/layers/core.py", line 679, in get_output
File "build/bdist.linux-x86_64/egg/keras/layers/core.py", line 175, in get_input
File "build/bdist.linux-x86_64/egg/keras/layers/core.py", line 1086, in get_output
File "build/bdist.linux-x86_64/egg/keras/layers/core.py", line 175, in get_input
File "build/bdist.linux-x86_64/egg/keras/layers/core.py", line 869, in get_output
File "build/bdist.linux-x86_64/egg/keras/backend/theano_backend.py", line 274, in repeat
My code for building the model is:
model = Sequential()
model.add(LSTM(hidden_size, input_shape=(max_len, nb_feature),return_sequences=False))
model.add(RepeatVector(max_len))
model.add(TimeDistributedDense(labels))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
@nzw0301
What are your input data and labels?
My input shape is (20, 15, 300).
There are 8 labels.
@nzw0301
If y (the correct label) is a one-hot representation (an 8-dim vector):
model = Sequential()
model.add(LSTM(hidden_size, input_shape=(max_len, nb_feature),return_sequences=False))
model.add(Dense(labels))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
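A hedged usage sketch for the model above, assuming the shapes mentioned earlier in the thread (20 samples, max_len=15, nb_feature=300, 8 labels); hidden_size=64 is a placeholder:
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation

max_len, nb_feature, labels, hidden_size = 15, 300, 8, 64  # hidden_size is assumed

# same model as suggested above
model = Sequential()
model.add(LSTM(hidden_size, input_shape=(max_len, nb_feature), return_sequences=False))
model.add(Dense(labels))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')

X = np.random.rand(20, max_len, nb_feature)            # 20 sequences of 15 word vectors
y = np.eye(labels)[np.random.randint(0, labels, 20)]   # 20 one-hot label vectors

model.fit(X, y, nb_epoch=1)   # Keras 1 argument name; newer versions use `epochs`
probs = model.predict(X)      # shape (20, 8): one distribution per input sequence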
Thanks.
But your model structure looks like the lstm_text_generation example, which is many-to-one style. My model structure is many-to-many, like seq2seq.
So at inference time, when I input a sequence, I need to get one word of output at the previous timestep to feed into the current timestep.
oh, sorry.
see issue #2403
In my experience, I convert many-to-many into many-to-one for an encoder-decoder model.
I referred to an example of an image captioning model: the "Architecture for learning image captions with a convnet and a Gated Recurrent Unit" section in http://keras.io/getting-started/sequential-model-guide/#examples
The same applies at inference time.
Thank you.
Issue #2403 doesn't help me; it also predicts with the predict function on a whole sequence, while I need to predict word by word.
Converting many-to-many to many-to-one is a good idea, and I will try it. But I guess that even with many-to-one, the input is still a sequence, not a single word?
@nzw0301
Sorry for my lack of understanding.
Its input is a sequence, not a word, and its output is a word.
Sorry for my poor English.
Yes: for many-to-one, the input is a sequence and the prediction is one word.
For many-to-many, the input is a sequence and the prediction is a sequence.
My problem is with many-to-many: I need to input a word and get a word of output in real time, not input a whole sequence and get a word or a sentence as output. This is because sometimes we use the previous output as part of the current input, as in some generation tasks at inference time.
Maybe we could convert the many-to-many style into one-to-many, but I don't know how to do that.
@nzw0301
Sorry, me too...
When predicting with many-to-one, you could repeatedly predict the next word and then join the predicted word onto the input sentence (the input of the first step is a single word).
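A rough sketch of that loop, assuming a many-to-one model that reads the last max_len word vectors (left-padded) and predicts the index of the next word; word_to_vec and index_to_word are hypothetical helpers, not part of Keras:
import numpy as np

def generate(model, seed_words, max_len, nb_feature, n_steps,
             word_to_vec, index_to_word, pad_vec=None):
    # word_to_vec: maps a word to its feature vector (hypothetical helper)
    # index_to_word: maps a predicted class index back to a word (hypothetical helper)
    if pad_vec is None:
        pad_vec = np.zeros(nb_feature)
    words = list(seed_words)
    for _ in range(n_steps):
        window = words[-max_len:]                 # take the last max_len words
        vecs = [word_to_vec(w) for w in window]
        while len(vecs) < max_len:                # left-pad shorter windows
            vecs.insert(0, pad_vec)
        x = np.array(vecs).reshape(1, max_len, nb_feature)
        probs = model.predict(x)[0]               # distribution over the output classes
        next_word = index_to_word(int(np.argmax(probs)))
        words.append(next_word)                   # feed the prediction back as input
    return words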
Like your example: if I set the RNN max_len=3 and my first input word is "I", I can't feed "I" directly into the RNN model; I would have to pad the word so that the sequence length is 3, like "I unk unk".
So I don't understand why predicting the next word requires padding the input. As far as the RNN mechanism is concerned, there should be no need to pad the string to maxlen.
Basically, what you are trying to do is not possible in Keras, except by using a workaround like the one @nzw0301 suggested.
The reason is that to use the output of the current timestep as input for the next one, you would basically need to go "depth-first", i.e., calculate one timestep for ALL layers, then the next timestep, and so on. What Keras does, however, is calculate ALL timesteps of ONE layer, before feeding the output into the next one.
I believe there are three ways to go about this:
- Make your RNN/LSTM layers stateful (stateful=True) and have them read only one timestep at a time. Stateful means that they keep their hidden state between sequences, so it should effectively work like a "normal" RNN/LSTM, except that you only input a sequence of length 1. (Then you can interpret the output and compute the input for the next timestep.) I never tried this way myself so far, though.

Note: I'm no expert with Keras, but I've tried implementing exactly the same thing you describe, so this is how I understand Keras to work so far.
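A minimal sketch of that first (stateful) approach, assuming a one-hot vocabulary input/output; the sizes, the start token, and the greedy argmax sampling are all illustrative assumptions:
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation

vocab_size, hidden_size = 1000, 128   # assumed sizes

model = Sequential()
# batch_input_shape=(1, 1, vocab_size): batch of 1, one timestep per call
model.add(LSTM(hidden_size, batch_input_shape=(1, 1, vocab_size), stateful=True))
model.add(Dense(vocab_size))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')

def step(word_index):
    # feed one word, get the distribution over the next word;
    # the hidden state persists between calls because stateful=True
    x = np.zeros((1, 1, vocab_size))
    x[0, 0, word_index] = 1.0
    return model.predict(x)[0]

model.reset_states()                  # start a new sequence
current = 0                           # index of an assumed start-of-sequence token
for _ in range(20):
    probs = step(current)
    current = int(np.argmax(probs))   # previous output becomes the next input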
Thanks for your suggestions; I did it the first way you described.
Sometimes it works well, but when you use a bi-directional RNN, you can't get the backward-pass output.
@mbollmann @farizrahman4u May I ask whether the seq2seq approach implemented here https://github.com/farizrahman4u/seq2seq matches the picture on the same site:
[image: seq2seq architecture diagram from the linked repository, in which each decoder output is fed back as the next decoder input]
In the picture, the output word of the decoder is used as input for the next step, so at every step (of the decoder) the decoder sees the previously predicted word. But I cannot find this in the seq2seq implementation. So, starting from the decoder vector, it looks to me more like the one-to-many approach depicted here:
[image: one-to-many architecture diagram]
I believe @farizrahman4u 's seq2seq does _not_ do that, for the reasons I outlined in my previous comment. What you want to do there is "depth-first" calculation, i.e. calculate all layers for one timestep, and use the output at the _final_ layer as input for the next timestep.
What I believe seq2seq does is take the output of one LSTM layer for one timestep and use that as the same LSTM's input for the next timestep, which I believe is not what you'd usually want.
I would love to be corrected or proven wrong on this, though.
Does anybody have character-based text classification code in Keras, like the one in https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/skflow/text_classification_character_rnn.py?
Have you found a solution to this problem?
I am also stuck on a case where I want to implement the recurrence formula h(t) = tanh(W·x(t) + U·h(t-1) + V·f(O(t-1)) + b), in which O(t-1) is the classification output (after softmax) of the previous timestep of the RNN. I want to apply a non-linear function f() to it and use it to compute h(t). I tried using recurrentshop, but I could not work out what to do, so any help would be deeply appreciated, either with pure Keras or with recurrentshop.
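For what it's worth, here is a plain NumPy sketch of that recurrence, with the readout assumed to be O(t) = softmax(h(t)·W_o + b_o) and f taken to be tanh purely for illustration; this is not Keras-specific and not the author's actual model:
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def run_rnn(xs, W, U, V, b, W_o, b_o, f=np.tanh):
    # xs: list of input vectors x(t); returns the list of outputs O(t)
    h = np.zeros(b.shape[0])
    o = np.zeros(W_o.shape[1])        # O(0) assumed to start at zero
    outputs = []
    for x in xs:
        # h(t) = tanh(W x(t) + U h(t-1) + V f(O(t-1)) + b)
        h = np.tanh(W @ x + U @ h + V @ f(o) + b)
        o = softmax(h @ W_o + b_o)    # assumed softmax readout O(t)
        outputs.append(o)
    return outputs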
@monaj07 you have to subclass LSTM or GRU and override the step function.
@erickrf Thanks for your response. I ended up using Theano. I could not figure out how to access the softmax output within the step function in Keras, since the softmax operation is defined in another layer. When I tried to customize the RNN class to get all of its outputs, I got confused by the multiple levels of hierarchy between the functions and classes, as I am not very strong in Python.
@mbollmann
Basically, what you are trying to do is not possible in Keras, except by using a workaround like the one @nzw0301 suggested.
Can you elaborate on the three workarounds?
@harikrishnavydana I've already elaborated on that above. What part is unclear to you?
This is now possible with RecurrentShop.
Could you please give an example? I tried to use RecurrentShop, but the docs are too sparse.
Note: use the recurrentshop-1 branch.
rnn = RecurrentSequential(readout=True) # previous output will be added to input
rnn.add(LSTMCell(10, input_dim=10))
That's good. But if I'm right, this merges the output at t-1 and the input at t into the new input at t, using the 'add' method by default. What should I do if I want to use only the output at t-1? Rewrite RecurrentSequential?
No, you can use RecurrentModel to write any arbitrary RNN.
input = Input((10,))
readout_input = Input((10,))
h_tm1 = Input((10,))
c_tm1 = Input((10,))
lstm_input = add([input, readout_input]) # Here we add to input.. you can do whatever you want with a Lambda layer
output, h_t, c_t = LSTMCell(10)([lstm_input, h_tm1, c_tm1])
rnn = RecurrentModel(input=input, initial_states=[h_tm1, c_tm1], output=output, final_states=[h_t, c_t], readout_input=readout_input)
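If I read the RecurrentShop API correctly (an untested assumption on my part, not something confirmed in this thread), the resulting rnn can then be applied to a sequence input like an ordinary recurrent layer:
from keras.layers import Input
from keras.models import Model

# `rnn` is the RecurrentModel built above; (7, 10) = 7 timesteps of 10-dim vectors (assumed)
seq = Input((7, 10))
out = rnn(seq)                    # output after reading the whole sequence
model = Model(seq, out)
model.compile(loss='mse', optimizer='adam')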
Sorry for not responding for nearly a week due to a health problem. Thanks for the solution! It's amazing to be able to customize an LSTM like this!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.
@fchollet is there any functionality at inference time to sample the first word in the sequence (i.e., argmax) and then pass it as the input to the next LSTM step? Or do we still need to create a workaround such as the one @nzw0301 suggested?
The desired functionality would be something similar to feed_previous in TensorFlow.