In the code examples here, in the section titled "Sequence-to-sequence autoencoder," it reads:
[...] first use a LSTM encoder to turn your input sequences into a single vector that contains information about the entire sequence, then repeat this vector n times (where n is the number of timesteps in the output sequence), and run a LSTM decoder to turn this constant sequence into the target sequence.
The code is:
from keras.layers import Input, LSTM, RepeatVector
from keras.models import Model

inputs = Input(shape=(timesteps, input_dim))
# Encode the whole input sequence into a single latent vector
encoded = LSTM(latent_dim)(inputs)
# Feed that same vector to the decoder at every output timestep
decoded = RepeatVector(timesteps)(encoded)
decoded = LSTM(input_dim, return_sequences=True)(decoded)

sequence_autoencoder = Model(inputs, decoded)
encoder = Model(inputs, encoded)
My question is: why are we doing the RepeatVector operation? In the literature on sequence-to-sequence autoencoders (for example, in this often-cited paper by Dai & Le), there's no such repetition. Instead, they have the following diagram:
What am I missing here? What exactly is the input sequence to the Decoder portion of the autoencoder?
Thanks!
Not sure about interpreting the image but the paper says:
A slightly better method is to use a sequence autoencoder, which uses a RNN to read a long input sequence into a single vector. This vector will then be used to reconstruct the original sequence.
So the example reads everything into a single vector, then uses that vector to reconstruct the original sequence. If you want to iteratively generate something but you only have one input, you can repeat the vector. That means each time step will get the same input but a different hidden state.
Cheers
@bstriner Thanks! Do you have a link to some literature where they've used such an architecture? Almost all frequently cited papers that I found use a different architecture.
Similar to the picture in the post, in another popular paper by Srivastava et al. ('Unsupervised Learning of Video Representations using LSTMs'), they have the following diagram:
It seems they're using the reversed input from the encoder as input here. There's a section as follows:
The decoder can be of two kinds – conditional or unconditioned. A conditional decoder receives the last generated output frame as input, i.e., the dotted input in Fig. 2 is present. An unconditioned decoder does not receive that input.
You can build an autoencoder either way in Keras. Theoretically it will train faster with a conditional decoder, but I haven't really compared the two.
@bstriner Thanks again! Can you help me implement the one in the figure above? Specifically, I'm looking for two things:
A way to feed the hidden state at the end of the encoder as the initial state for the decoder. How do I do that?
To use the output from cell (t-1) as input to cell (t) in an LSTM.
Thanks!
Easiest way is probably to start with the example with repeat vector. Instead of the input just being the repeated final encoder state, concatenate it with the reversed sequence shifted once. Then your input to each LSTM decoder cell is the encoder state and the previous character.
During training, the input is the encoder state and the actual previous character, but during testing the input is the encoder state and the predicted previous character. Using the output as input during testing is slightly trickier. To do it all on the GPU you would probably have to build a custom call to K.rnn. You could also just loop on the CPU.
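For concreteness, here is a minimal sketch of that conditional (teacher-forced) decoder, assuming placeholder values for timesteps, input_dim, and latent_dim. This is my own illustration of the approach described above, not the official Keras example; the shifted previous-character sequence is supplied as a second model input.

from keras.layers import Input, LSTM, RepeatVector, Concatenate
from keras.models import Model

inputs = Input(shape=(timesteps, input_dim))    # original sequence
shifted = Input(shape=(timesteps, input_dim))   # (reversed) targets shifted one step

encoded = LSTM(latent_dim)(inputs)              # final encoder summary vector
repeated = RepeatVector(timesteps)(encoded)     # one copy per decoder step

# Each decoder step sees the encoder summary plus the previous character
decoder_inputs = Concatenate(axis=-1)([repeated, shifted])
decoded = LSTM(input_dim, return_sequences=True)(decoder_inputs)

conditional_autoencoder = Model([inputs, shifted], decoded)

At test time you would replace shifted with the model's own previous predictions, which is the trickier GPU/CPU loop mentioned above.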
Alright, thanks!
Hi @dhrushilbadani, I had the same questions as you, and I am also interested in the implementation of the seq2seq autoencoder. I wonder if you've made any progress! Cheers
@rafaelpossas I used the seq2seq library (built on top of Keras + RecurrentShop) as it offers greater flexibility in deciding how the cells in a particular layer interact with each other, thanks to RecurrentShop. Hope this helps!
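If it helps others, basic usage of that library looks roughly like the snippet below. This is from my recollection of the seq2seq project's README, so please double-check the model names and parameters against the project's own documentation.

from seq2seq.models import SimpleSeq2Seq

# Encoder-decoder pair built by the library; output_length plays the
# role of the decoder's number of timesteps
model = SimpleSeq2Seq(input_dim=5, hidden_dim=10, output_length=8, output_dim=8)
model.compile(loss='mse', optimizer='rmsprop')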
Hi @bstriner, I'm a little bit confused about the concatenation of hidden states and the encoder final state.
I want to implement the RNN encoder-decoder described in this paper:
Cho, Kyunghyun, et al. "Learning phrase representations using RNN encoder-decoder for statistical machine translation." arXiv preprint arXiv:1406.1078 (2014).
The input to the decoder is g(h_{t-1}, y_{t-1}, c). I understand that once we add the RepeatVector, it will pass the final state of the encoder (which is c in this case) to the decoder, but how can I combine c and y_{t-1} (which is the previous output) and pass them to the LSTM cell?
My point is: if I use RepeatVector, does the LSTM still pass the output of the current state to the next state, or is the decoder input just the constant encoder final state at every decoding step? If I want to pass both the encoder final state and the decoder output to the decoder, how can I combine or concatenate them? Could you give me an example?
Thanks!
I just read about the TimeDistributed layer. In my previous example, if I want to pass y_{t-1} and c to the decoder LSTM, should I add a TimeDistributed layer after the LSTM layer?
@colpain, I think this might help you...
https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html
As mentioned in this blog, if you want to use the encoded state along with the previous predicted output while inferring, you need to capture the final encoded states and define the decoder inference model as a stand-alone model that takes three inputs: the two final encoder states (hidden and cell) and the previous predicted value. You will not need a RepeatVector or TimeDistributed layer. You can refer to this article that I wrote using the same example as in the Keras example - https://towardsdatascience.com/neural-machine-translation-using-seq2seq-with-keras-c23540453c74
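For reference, here is a rough sketch of that stand-alone inference decoder, following the pattern in the linked blog post. num_tokens and latent_dim are placeholders; this is only an illustration of the idea, not the full example.

from keras.layers import Input, LSTM, Dense
from keras.models import Model

decoder_inputs = Input(shape=(None, num_tokens))   # previous predicted token
state_h_in = Input(shape=(latent_dim,))            # hidden state carried over
state_c_in = Input(shape=(latent_dim,))            # cell state carried over

decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
outputs, state_h, state_c = decoder_lstm(
    decoder_inputs, initial_state=[state_h_in, state_c_in])
outputs = Dense(num_tokens, activation='softmax')(outputs)

# Three inputs: the previous prediction plus the two LSTM states
decoder_model = Model([decoder_inputs, state_h_in, state_c_in],
                      [outputs, state_h, state_c])

At inference time you seed the states with the encoder's final states and loop, feeding each prediction and the returned states back in.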
I am doing the exact same thing and used more or less the same code:
import keras
from keras import backend as K
from keras.layers import Input, Lambda, LSTM, RepeatVector
from keras.models import Model

model_inputs = Input(shape=(timesteps,))
# Add a channel axis so the LSTM sees shape (timesteps, 1)
inputs = Lambda(lambda x: K.expand_dims(x, -1))(model_inputs)
encoded = LSTM(latent_dim, return_sequences=False)(inputs)
decoded = RepeatVector(timesteps)(encoded)
decoded = LSTM(1, return_sequences=True)(decoded)
decoded = Lambda(lambda x: K.squeeze(x, -1))(decoded)
sequence_autoencoder = Model(model_inputs, decoded)
sequence_autoencoder.compile(loss='mse', optimizer='adam')

earlyStopping = keras.callbacks.EarlyStopping(monitor='loss', patience=5, verbose=0, mode='auto')
sequence_autoencoder.fit(sparse_balances[:datapoints], sparse_balances[:datapoints],
                         batch_size=batch_num, epochs=100,
                         callbacks=[earlyStopping, result_plotter])  # result_plotter is a custom callback
The model seems theoretically correct, but the decoder LSTM always gets stuck predicting a single value for the whole time series, no matter how long it trains. I think the example given in the tutorial is either not conceptually correct or missing something.
@HitLuca Encountered exactly the same error here. After the repetition, model.predict() gives exactly the same output for different inputs. I'm guessing they have changed the usage of the encoder and decoder, because in the official example at https://github.com/keras-team/keras/blob/master/examples/lstm_seq2seq.py they didn't use the output vector of the encoder as input but discarded it and used the hidden states as inputs instead.
@JuntingGuo Based on my experiments the model is indeed working as intended; I just used a particularly bad dataset to train it with (spiking, sparse, unidimensional time series). With enough time and training, the decoder LSTM actually learns to decode the latent vector, though with more difficulty, as its initial state is always the same.
If instead the decoder is fed a repeated constant vector and its initial state is set from the encoder, the results are better. This has the downside of not being as quick to implement as before, because the latent dimensionality is no longer given by the LSTM output size, and you need some architectural arrangement to get the same latent dimension.
TL;DR: both implementations work, but passing the hidden LSTM state seems to work better (see the sketch below).
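A minimal sketch of that second variant, assuming placeholder names timesteps, input_dim, and latent_dim. This is my own illustration of the idea, not code from the thread.

from keras.layers import Input, LSTM, RepeatVector, TimeDistributed, Dense
from keras.models import Model

inputs = Input(shape=(timesteps, input_dim))
# return_state=True exposes the final hidden and cell states
encoded, state_h, state_c = LSTM(latent_dim, return_state=True)(inputs)

repeated = RepeatVector(timesteps)(encoded)
# The decoder width must equal latent_dim for initial_state to fit,
# so a Dense layer maps each step back to input_dim
decoded = LSTM(latent_dim, return_sequences=True)(
    repeated, initial_state=[state_h, state_c])
decoded = TimeDistributed(Dense(input_dim))(decoded)

sequence_autoencoder = Model(inputs, decoded)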
@HitLuca Thanks for your reply. I'll look into that!
@HitLuca I have the same issue. I ended up initializing the decoder state with the encoder state, and everything worked very well.
Hi guys, what if you have non-constant sequence length? Had anyone ever faced that situation? I do all the time with audio sequences...
Check this link given by Junting: https://github.com/keras-team/keras/blob/master/examples/lstm_seq2seq.py