Dear all,
I'm interested in the RNN encoder-decoder model (Cho et al., 2014).
Please let me know where I can find it if it has already been implemented in Keras by someone.
If not, I will try to implement it myself. Please give me some tips to start coding.
You can check the details of the model here:
http://arxiv.org/pdf/1406.1078.pdf
"Learning Phrase Representations using RNN Encoder鈥揇ecoder for Statistical Machine Translation"
Something like this worked for me:
from keras.models import Sequential
from keras.layers.core import RepeatVector
from keras.layers.recurrent import GRU

model = Sequential()
model.add(GRU(inp_dim, out_dim, return_sequences=False))  # encoder: keep only the last output
model.add(RepeatVector(sequence_length))  # repeat the last GRU output for each decoder timestep
model.add(GRU(out_dim, inp_dim, return_sequences=True))  # decoder
Eder, what optimizer and loss functions did you use?
I usually work with rmsprop or adam. The loss function will depend on your problem. I did sequence-to-sequence learning to reconstruct binary images using binary_crossentropy. If you want to do phrase to phrase, you will probably have to use categorical_crossentropy to predict each specific word, I guess.
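Concretely, for the phrase-to-phrase case the compile step might look something like this (just a sketch, assuming the model above; the loss choice depends on your problem as noted):

model.compile(optimizer='rmsprop', loss='categorical_crossentropy')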
PS: You may also consider a Dense layer after the first GRU and a time-distributed Dense layer after the second to make your model more powerful. One thing to note, though: people doing sequence generation usually feed their final output back into the recurrent layer. That won't happen if you use a Dense layer after your GRUs, since its output does not go back into the GRU. Obviously, you could write a new class to do that if you need it.
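For illustration, here is a rough sketch of that idea in the same two-argument layer style as above (hidden_dim, inp_dim and sequence_length are placeholders; TimeDistributedDense is the per-timestep Dense layer in that version of Keras):

from keras.models import Sequential
from keras.layers.core import Dense, RepeatVector, TimeDistributedDense
from keras.layers.recurrent import GRU

model = Sequential()
model.add(GRU(inp_dim, hidden_dim, return_sequences=False))  # encoder
model.add(Dense(hidden_dim, hidden_dim, activation='relu'))  # extra Dense after the encoder
model.add(RepeatVector(sequence_length))
model.add(GRU(hidden_dim, hidden_dim, return_sequences=True))  # decoder
model.add(TimeDistributedDense(hidden_dim, inp_dim))  # per-timestep output layer

Note that even here the decoder still only sees the repeated encoder summary, not its own previous outputs.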
@EderSantana
To make sure I understand correctly, I have attached the RNN encoder-decoder figure of Cho et al. (2014) with marks.
Did you mean the Dense layer for (1) in the figure below, and the time-distributed Dense layer for (2)?
Just to mention, the recently added addition_rnn example provides an initial framework for a character based RNN encoder-decoder. Whilst the task there is different (given the input "123+42" produce "165" as output) it can be trivially modified to perform on a word level and may be a good starting point for people. The main missing part is that the decoder in the paper is h_t = f(h_{t-1}, y_{t-1}, c) whilst the decoder in the example above is h_t = f(h_{t-1}, c) (i.e. we don't feed in the last timestep's output as input to the next timestep).
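For reference, a minimal sketch of that addition_rnn-style setup might look roughly like this (written against the newer single-argument Keras layer API; layer sizes, vocabulary size and sequence lengths are placeholders):

from keras.models import Sequential
from keras.layers import LSTM, RepeatVector, TimeDistributed, Dense, Activation

model = Sequential()
model.add(LSTM(128, input_shape=(input_len, vocab_size)))  # encoder: read the input, keep the final state as c
model.add(RepeatVector(output_len))  # repeat c once per output timestep
model.add(LSTM(128, return_sequences=True))  # decoder: h_t = f(h_{t-1}, c), no y_{t-1} feedback
model.add(TimeDistributed(Dense(vocab_size)))  # per-timestep scores over the vocabulary
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')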
Thank you @Smerity and @EderSantana
The addition_rnn example was very helpful.
I think I now understand what is going on in the RNN encoder-decoder.
I will close this issue.
@Smerity Regarding the second part of your comment - do you know of any way to implement the feedback of the last prediction into the next timestep in Keras? It seems like this would make a big difference for sequence-to-sequence learning.
@phdowling @Smerity Did you find a way to feed the last prediction of the LSTM into the next timestep?
I am trying to evaluate how much difference this makes, but have found no way to implement it.
Thanks in advance
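The closest thing I have found is doing the feedback manually at prediction time rather than inside the model: predict one step, append the prediction to the input, and predict again. A rough sketch (the decoder model, start token and vocabulary size are placeholders; I assume the decoder has return_sequences=True and outputs one distribution per timestep):

import numpy as np

def generate(decoder, start_token, vocab_size, max_len):
    seq = [start_token]
    for _ in range(max_len):
        x = np.zeros((1, len(seq), vocab_size))  # one-hot encode the sequence generated so far
        for t, tok in enumerate(seq):
            x[0, t, tok] = 1.0
        probs = decoder.predict(x)[0, -1]  # distribution for the next timestep
        seq.append(int(np.argmax(probs)))  # greedy choice, fed back in on the next iteration
    return seq

That still does not feed y_{t-1} back during training, which is the part I cannot see how to do in Keras.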
I am unable to match dimensions when the input is 3-D text, where each document is represented as a sequence of word embeddings:
Code:
inputs = Input(shape=(MAX_SEQUENCE_LENGTH,), name="input")
embedded_sequences = embedding_layer(inputs)
encoded = Bidirectional(LSTM(128,return_sequences=True), merge_mode="sum",name="encoder_lstm")(embedded_sequences)
decoded = RepeatVector((MAX_SEQUENCE_LENGTH,EMBEDDING_DIM),name="repeater")(encoded)
decoded = Bidirectional(LSTM(128),merge_mode="sum",name="decoder_lstm")(encoded)
autoencoder = Model(inputs, decoded)
autoencoder.compile(optimizer="sgd", loss="mse")
print autoencoder.summary()
@rakshajalan . I'm not entirely sure I understand what you are trying to do, but you might want to look at this section to make sure you are using the embedding layer as you intend: https://keras.io/layers/embeddings/
I hope this helps. Thanks.
I have represented each sentence as a list of word embeddings. I want to regenerate the same sequence from the sentence representation using a decoder, i.e. get output of the form [[w1 embedding], [w2 embedding], ...] for each sentence. But when calling model.fit(), we need to pass "y", which is the original words of the sentence. For x_train, the Embedding layer takes care of mapping word ids to word vectors. Do we need to do the same for y, i.e. pass y through the embedding layer to get the same representation as x_train? I am really confused about how to synchronise all of this. Please give me some sample code where the input is a sequence of word vectors and the model regenerates the input, as in a standard autoencoder.
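In case it helps, here is a minimal sketch of one way to set that up (layer sizes are placeholders, and I am assuming the sentences are embedded outside the model, so the target y is simply the same sequence of embedding vectors as the input):

from keras.models import Model
from keras.layers import Input, LSTM, RepeatVector, TimeDistributed, Dense

inputs = Input(shape=(MAX_SEQUENCE_LENGTH, EMBEDDING_DIM))  # already-embedded sentences
encoded = LSTM(128)(inputs)  # fixed-size sentence representation
decoded = RepeatVector(MAX_SEQUENCE_LENGTH)(encoded)
decoded = LSTM(128, return_sequences=True)(decoded)
decoded = TimeDistributed(Dense(EMBEDDING_DIM))(decoded)  # one embedding-sized vector per timestep
autoencoder = Model(inputs, decoded)
autoencoder.compile(optimizer='rmsprop', loss='mse')
# autoencoder.fit(x_embedded, x_embedded, ...)  # reconstruct the embedding sequence itself

Because the input is already a sequence of word vectors, there is no Embedding layer inside the model and no separate y to construct: x and y are the same embedded array.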