Dear all,
I'm interested in the RNN encoder-decoder model (Cho et al., 2014).
Please let me know where I can find it if it has already been implemented in Keras by someone.
If not, I will try to implement it myself. Please give me some tips to start coding.
You can check the details of the model here:
http://arxiv.org/pdf/1406.1078.pdf
"Learning Phrase Representations using RNN Encoder鈥揇ecoder for Statistical Machine Translation"
Something like this worked for me:
from keras.models import Sequential
from keras.layers.core import RepeatVector
from keras.layers.recurrent import GRU

model = Sequential()
model.add(GRU(inp_dim, out_dim, return_sequences=False))  # encoder: keep only the last output
model.add(RepeatVector(sequence_length))  # repeat the last GRU output for each decoder timestep
model.add(GRU(out_dim, inp_dim, return_sequences=True))  # decoder
Eder, what optimizer and loss functions did you use?
I usually work with rmsprop or adam. The loss function will depend on your problem. I did sequence-to-sequence learning to reconstruct binary images using binary_crossentropy. If you want to do phrase to phrase, you will probably have to use categorical_crossentropy to predict each specific word, I guess.
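Concretely, for the phrase-to-phrase case the compile step might look something like this (just a sketch, assuming the model above; the loss choice depends on your problem as noted):

model.compile(optimizer='rmsprop', loss='categorical_crossentropy')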
PS: You may also consider a Dense layer after the first GRU and a time-distributed Dense layer after the second to make your model more powerful. One thing to note, though: people doing sequence generation usually feed their final output back into the recurrent layer. That won't happen if you use a Dense layer after your GRUs, since its output does not go back into the GRU. Obviously, you could write a new class to do that if you need it.
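For illustration, here is a rough sketch of that idea in the same two-argument layer style as above (hidden_dim, inp_dim and sequence_length are placeholders; TimeDistributedDense is the per-timestep Dense layer in that version of Keras):

from keras.models import Sequential
from keras.layers.core import Dense, RepeatVector, TimeDistributedDense
from keras.layers.recurrent import GRU

model = Sequential()
model.add(GRU(inp_dim, hidden_dim, return_sequences=False))  # encoder
model.add(Dense(hidden_dim, hidden_dim, activation='relu'))  # extra Dense after the encoder
model.add(RepeatVector(sequence_length))
model.add(GRU(hidden_dim, hidden_dim, return_sequences=True))  # decoder
model.add(TimeDistributedDense(hidden_dim, inp_dim))  # per-timestep output layer

Note that even here the decoder still only sees the repeated encoder summary, not its own previous outputs.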
@EderSantana
To make sure I understand correctly, I have attached the RNN encoder-decoder figure of Cho et al. (2014) with marks.
Did you mean the Dense layer for (1) in the figure below, and the time-distributed Dense layer for (2)?
Just to mention, the recently added addition_rnn example provides an initial framework for a character based RNN encoder-decoder. Whilst the task there is different (given the input "123+42" produce "165" as output) it can be trivially modified to perform on a word level and may be a good starting point for people. The main missing part is that the decoder in the paper is h_t = f(h_{t-1}, y_{t-1}, c) whilst the decoder in the example above is h_t = f(h_{t-1}, c) (i.e. we don't feed in the last timestep's output as input to the next timestep).
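For reference, a minimal sketch of that addition_rnn-style setup might look roughly like this (written against the newer single-argument Keras layer API; layer sizes, vocabulary size and sequence lengths are placeholders):

from keras.models import Sequential
from keras.layers import LSTM, RepeatVector, TimeDistributed, Dense, Activation

model = Sequential()
model.add(LSTM(128, input_shape=(input_len, vocab_size)))  # encoder: read the input, keep the final state as c
model.add(RepeatVector(output_len))  # repeat c once per output timestep
model.add(LSTM(128, return_sequences=True))  # decoder: h_t = f(h_{t-1}, c), no y_{t-1} feedback
model.add(TimeDistributed(Dense(vocab_size)))  # per-timestep scores over the vocabulary
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')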
Thank you @Smerity and @EderSantana
The addition_rnn example was very helpful.
I think I now understand what is going on in the RNN encoder-decoder.
I will close this issue.
@Smerity Regarding the second part of your comment - do you know of any way to implement the feedback of the last prediction into the next timestep in Keras? It seems like this would make a big difference for sequence-to-sequence learning.
@phdowling @Smerity Did you find a way to feed the last prediction of the LSTM into the next timestep?
I am trying to evaluate how much difference this makes, but have found no way to implement it.
Thanks in advance
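The closest thing I have found is doing the feedback manually at prediction time rather than inside the model: predict one step, append the prediction to the input, and predict again. A rough sketch (the decoder model, start token and vocabulary size are placeholders; I assume the decoder has return_sequences=True and outputs one distribution per timestep):

import numpy as np

def generate(decoder, start_token, vocab_size, max_len):
    seq = [start_token]
    for _ in range(max_len):
        x = np.zeros((1, len(seq), vocab_size))  # one-hot encode the sequence generated so far
        for t, tok in enumerate(seq):
            x[0, t, tok] = 1.0
        probs = decoder.predict(x)[0, -1]  # distribution for the next timestep
        seq.append(int(np.argmax(probs)))  # greedy choice, fed back in on the next iteration
    return seq

That still does not feed y_{t-1} back during training, which is the part I cannot see how to do in Keras.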
I am unable to match dimensions when the input is 3-D text, where each document is represented as a sequence of word embeddings:
Code:
inputs = Input(shape=(MAX_SEQUENCE_LENGTH,), name="input")
embedded_sequences = embedding_layer(inputs)
encoded = Bidirectional(LSTM(128,return_sequences=True), merge_mode="sum",name="encoder_lstm")(embedded_sequences)
decoded = RepeatVector((MAX_SEQUENCE_LENGTH,EMBEDDING_DIM),name="repeater")(encoded)
decoded = Bidirectional(LSTM(128),merge_mode="sum",name="decoder_lstm")(encoded)
autoencoder = Model(inputs, decoded)
autoencoder.compile(optimizer="sgd", loss="mse")
print autoencoder.summary()
@rakshajalan . I'm not entirely sure I understand what you are trying to do, but you might want to look at this section to make sure you are using the embedding layer as you intend: https://keras.io/layers/embeddings/
I hope this helps. Thanks.
I have represented each sentence as a list of word embeddings. I want to regenerate the same sequence from the sentence representation using a decoder, i.e. get output of the form [[w1 embedding], [w2 embedding], ...] for each sentence. But when calling model.fit(), we need to pass "y", which is the original words of the sentence. For x_train, the Embedding layer takes care of mapping word ids to word vectors. Do we need to do the same for y, i.e. pass y through the embedding layer to get the same representation as x_train? I am really confused about how to synchronise all of this. Please give me some sample code where the input is a sequence of word vectors and the model regenerates the input, as in a standard autoencoder.
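In case it helps, here is a minimal sketch of one way to set that up (layer sizes are placeholders, and I am assuming the sentences are embedded outside the model, so the target y is simply the same sequence of embedding vectors as the input):

from keras.models import Model
from keras.layers import Input, LSTM, RepeatVector, TimeDistributed, Dense

inputs = Input(shape=(MAX_SEQUENCE_LENGTH, EMBEDDING_DIM))  # already-embedded sentences
encoded = LSTM(128)(inputs)  # fixed-size sentence representation
decoded = RepeatVector(MAX_SEQUENCE_LENGTH)(encoded)
decoded = LSTM(128, return_sequences=True)(decoded)
decoded = TimeDistributed(Dense(EMBEDDING_DIM))(decoded)  # one embedding-sized vector per timestep
autoencoder = Model(inputs, decoded)
autoencoder.compile(optimizer='rmsprop', loss='mse')
# autoencoder.fit(x_embedded, x_embedded, ...)  # reconstruct the embedding sequence itself

Because the input is already a sequence of word vectors, there is no Embedding layer inside the model and no separate y to construct: x and y are the same embedded array.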