I'm trying to build a seq2seq autoencoder with the goal of getting a fixed-size vector from a sequence that represents the sequence as well as possible. This means the output should look exactly like the input. This autoencoder consists of two parts: an encoder and a decoder.
The input looks like this (one-hot encoded, 120 time steps with vector length 115):
array([[[1, 0, 0, ..., 0, 0, 0],
[0, 1, 0, ..., 0, 0, 0],
[0, 0, 1, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]]])
I have 11,000 examples.
This is my current code:
from keras.layers import Input, LSTM, RepeatVector
from keras.models import Model

inp = Input((120, 115))

# Encoder: stacked LSTMs; the last layer returns a single fixed-size vector
out = LSTM(units=200, return_sequences=True, activation='tanh')(inp)
out = LSTM(units=180, return_sequences=True)(out)
out = LSTM(units=140, return_sequences=True, activation='tanh')(out)
out = LSTM(units=120, return_sequences=False, activation='tanh')(out)
encoder = Model(inp, out)

# Decoder: repeat the vector for every time step and decode back to a sequence
out_dec = RepeatVector(120)(out)  # I also tried to use Reshape instead, not really a difference
out1 = LSTM(200, return_sequences=True, activation='tanh')(out_dec)
out1 = LSTM(175, return_sequences=True, activation='tanh')(out1)
out1 = LSTM(150, return_sequences=True, activation='tanh')(out1)
out1 = LSTM(115, return_sequences=True, activation='sigmoid')(out1)  # I also tried softmax instead of sigmoid, not really a difference
decoder = Model(inp, out1)

autoencoder = Model(encoder.inputs, decoder(encoder.inputs))
autoencoder.compile(loss='binary_crossentropy',
                    optimizer='RMSprop',
                    metrics=['accuracy'])

autoencoder.fit(padded_sequences[:9000], padded_sequences[:9000],
                batch_size=150,
                epochs=5,
                validation_data=(padded_sequences[9001:], padded_sequences[9001:]))
But after a few epochs of training there is no improvement anymore.
The output for the example from the beginning looks like this, which is not very close to the input:
array([[[ 0.14739206, 0.49056929, 0.06915747, ..., 0. ,
0. , 0. ],
[ 0.03878205, 0.7227878 , 0.03550367, ..., 0. ,
0. , 0. ],
[ 0.02073009, 0.74334699, 0.03663541, ..., 0. ,
0. , 0. ],
...,
[ 0. , 0.08416401, 0. , ..., 0. ,
0. , 0. ],
[ 0. , 0.08630376, 0. , ..., 0. ,
0. , 0. ],
[ 0. , 0.08602102, 0. , ..., 0. ,
0. , 0. ]]], dtype=float32)
The embedding vector (produced by encoder.predict) looks like this (which seems odd, since nearly all values are -1, 0, or 1):
array([[ -1.00000000e+00, -0.00000000e+00, -1.00000000e+00,
1.00000000e+00, 1.00000000e+00, 9.99999523e-01,
1.00000000e+00, 9.99999881e-01, 1.00000000e+00,
9.99989152e-01, 9.99999821e-01, 9.99998808e-01,
1.00000000e+00, -0.00000000e+00, -4.86032724e-01,
9.99996543e-01, 1.00000000e+00, 0.00000000e+00,
1.00000000e+00, 0.00000000e+00, 0.00000000e+00,
1.00000000e+00, -0.00000000e+00, 0.00000000e+00,
0.00000000e+00, -0.00000000e+00, 9.99999464e-01,
-9.99999881e-01, -0.00000000e+00, 4.75281268e-01,
3.01986277e-01, 6.65608108e-01, -9.99999881e-01,
0.00000000e+00, -0.00000000e+00, -0.00000000e+00,
0.00000000e+00, -0.00000000e+00, -3.65448680e-15,
-9.99888301e-01, -0.00000000e+00, -1.00000000e+00,
-1.00000000e+00, -9.90761220e-01, -9.96851087e-01,
-0.00000000e+00, 0.00000000e+00, -1.47916377e-02,
-9.99999523e-01, -2.90349454e-01, -9.99999702e-01,
-7.63339102e-02, -1.00000000e+00, -4.16638345e-01,
-9.99999940e-01, -1.00000000e+00, -9.99996841e-01,
..............
So I assume that's not the correct way to build a seq2seq autoencoder. What could be better?
Have you taken a look at https://github.com/farizrahman4u/seq2seq?
Thanks @thvasilo, that's very helpful, but I think @ScientiaEtVeritas is looking for an LSTM-based autoencoder (seq2vec), not a seq2seq generator (question answering or machine translation).
There are indeed scenarios where sequence dimensionality reduction is needed, and that's where an autoencoder can play a part.
Hi @ScientiaEtVeritas, I think you were inspired by this tutorial: https://blog.keras.io/building-autoencoders-in-keras.html
I don't know what problem you were trying to solve. If you are using an autoencoder for dimensionality reduction on multivariate sequence data, you should reduce the number of neurons in the latent layer to a value less than your input vector's dimension (115, I think); a sketch of such a bottleneck is shown below. I'm working on multivariate sequence dimensionality reduction now, so we could discuss this problem.
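For illustration, a minimal sketch of what such a bottleneck autoencoder could look like, assuming the 120x115 input from above; the latent size of 64 is a hypothetical choice, not a recommendation from this thread:

from keras.layers import Input, LSTM, RepeatVector
from keras.models import Model

timesteps, n_features = 120, 115
latent_dim = 64  # hypothetical: smaller than the 115-dimensional input vectors

inp = Input((timesteps, n_features))
# Encoder: compress the whole sequence into a single latent_dim vector
encoded = LSTM(latent_dim, return_sequences=False)(inp)
# Decoder: repeat the latent vector and map it back to a sequence
decoded = RepeatVector(timesteps)(encoded)
decoded = LSTM(n_features, return_sequences=True, activation='sigmoid')(decoded)

autoencoder = Model(inp, decoded)
encoder = Model(inp, encoded)
autoencoder.compile(optimizer='rmsprop', loss='binary_crossentropy')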
Or maybe the Keras team could add the correct way to build an LSTM-based seq2vec autoencoder to their tutorials.
@ScientiaEtVeritas By the way, 'acc' may not be the right choice for evaluating an autoencoder; instead, we use the reconstruction error (usually MSE) to measure how well the autoencoder's latent layer can represent the input. For further information, you can refer to this paper: http://proceedings.mlr.press/v28/kamyshanska13.pdf
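As a small sketch of that evaluation (variable names such as autoencoder and x_val are placeholders, not taken from the thread):

import numpy as np

# Assume `autoencoder` is a trained model and `x_val` holds validation sequences
reconstructions = autoencoder.predict(x_val)

# Mean squared reconstruction error per example, then averaged over the set
per_example_mse = np.mean((x_val - reconstructions) ** 2, axis=(1, 2))
print('mean reconstruction MSE:', per_example_mse.mean())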
@ScientiaEtVeritas Is there a chance that your problem is caused by the activation functions after each layer? They may be limiting the range of the available values and causing the output values to be very similar.
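One way to test that guess (purely a sketch, not a confirmed fix) would be to give the bottleneck layer a linear activation, so the latent values are not squashed towards -1 and 1 by tanh:

from keras.layers import Input, LSTM
from keras.models import Model

inp = Input((120, 115))
x = LSTM(200, return_sequences=True)(inp)   # default tanh activation
latent = LSTM(120, return_sequences=False,
              activation='linear')(x)       # linear bottleneck: values not forced into [-1, 1]
encoder = Model(inp, latent)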
@Emilio66 @ScientiaEtVeritas @TrepidEd I'm working on a similar problem, reconstructing a multivariate sequence. Each row in my dataset is a sequence of 10 time steps with 32 dimensions.
Here is my model:
Layer (type)                 Output Shape              Param #
=================================================================
input_11 (InputLayer)        (None, 10, 32)            0
_________________________________________________________________
lstm_17 (LSTM)               (None, 10)                1720
_________________________________________________________________
repeat_vector_7 (RepeatVecto (None, 10, 10)            0
_________________________________________________________________
lstm_18 (LSTM)               (None, 10, 32)            5504
=================================================================
Total params: 7,224
Trainable params: 7,224
Non-trainable params: 0
_________________________________________________________________
I'm not getting the arrays I expect back from prediction. Is my structure correct?
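For reference, a Keras model that would produce a summary like the one above might look roughly like this (only the layer sizes come from the summary; everything else is an assumption):

from keras.layers import Input, LSTM, RepeatVector
from keras.models import Model

inp = Input((10, 32))                                  # 10 time steps, 32 features
encoded = LSTM(10)(inp)                                # lstm_17: (None, 10)
repeated = RepeatVector(10)(encoded)                   # repeat_vector_7: (None, 10, 10)
decoded = LSTM(32, return_sequences=True)(repeated)    # lstm_18: (None, 10, 32)

autoencoder = Model(inp, decoded)
autoencoder.compile(optimizer='rmsprop', loss='mse')
autoencoder.summary()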
I am in the same boat as @ScientiaEtVeritas. I don't see any answers to his original question. Does anyone have one? Surely being able to transform a sequence into a vector must be a common use case.
In my case, I'm interested in doing this because I plan to feed the vector into a clustering algorithm so I can cluster "same" sequences together.
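As a sketch of that clustering step (assuming a trained encoder like the ones above; the number of clusters is an arbitrary placeholder):

from sklearn.cluster import KMeans

# `encoder` maps (n_samples, timesteps, features) -> (n_samples, latent_dim)
latent_vectors = encoder.predict(padded_sequences)

kmeans = KMeans(n_clusters=8, random_state=0)  # 8 clusters is an arbitrary choice
labels = kmeans.fit_predict(latent_vectors)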