I'm trying to build a seq2seq autoencoder with the goal of getting a fixed-size vector from a sequence that represents the sequence as well as possible. This means the output should look exactly like the input. This autoencoder consists of two parts: an encoder and a decoder.
The input looks like this (one-hot encoded, 120 time steps with vector length 115):
array([[[1, 0, 0, ..., 0, 0, 0],
[0, 1, 0, ..., 0, 0, 0],
[0, 0, 1, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]]])
I have 11,000 examples.
This is my current code:
from keras.layers import Input, LSTM, RepeatVector
from keras.models import Model

inp = Input((120, 115))

# Encoder: stacked LSTMs; the last layer returns a single fixed-size vector
out = LSTM(units=200, return_sequences=True, activation='tanh')(inp)
out = LSTM(units=180, return_sequences=True)(out)
out = LSTM(units=140, return_sequences=True, activation='tanh')(out)
out = LSTM(units=120, return_sequences=False, activation='tanh')(out)
encoder = Model(inp, out)

# Decoder: repeat the vector for every time step and decode back to a sequence
out_dec = RepeatVector(120)(out)  # I also tried to use Reshape instead, not really a difference
out1 = LSTM(200, return_sequences=True, activation='tanh')(out_dec)
out1 = LSTM(175, return_sequences=True, activation='tanh')(out1)
out1 = LSTM(150, return_sequences=True, activation='tanh')(out1)
out1 = LSTM(115, return_sequences=True, activation='sigmoid')(out1)  # I also tried softmax instead of sigmoid, not really a difference
decoder = Model(inp, out1)

autoencoder = Model(encoder.inputs, decoder(encoder.inputs))
autoencoder.compile(loss='binary_crossentropy',
                    optimizer='RMSprop',
                    metrics=['accuracy'])

autoencoder.fit(padded_sequences[:9000], padded_sequences[:9000],
                batch_size=150,
                epochs=5,
                validation_data=(padded_sequences[9001:], padded_sequences[9001:]))
But after a few epochs of training there is no improvement anymore.
The output for the example from the beginning looks like this, which is not very close to the input:
array([[[ 0.14739206, 0.49056929, 0.06915747, ..., 0. ,
0. , 0. ],
[ 0.03878205, 0.7227878 , 0.03550367, ..., 0. ,
0. , 0. ],
[ 0.02073009, 0.74334699, 0.03663541, ..., 0. ,
0. , 0. ],
...,
[ 0. , 0.08416401, 0. , ..., 0. ,
0. , 0. ],
[ 0. , 0.08630376, 0. , ..., 0. ,
0. , 0. ],
[ 0. , 0.08602102, 0. , ..., 0. ,
0. , 0. ]]], dtype=float32)
The embedding vector (produced by encoder.predict) looks like this (which seems odd, since nearly all values are -1, 0, or 1):
array([[ -1.00000000e+00, -0.00000000e+00, -1.00000000e+00,
1.00000000e+00, 1.00000000e+00, 9.99999523e-01,
1.00000000e+00, 9.99999881e-01, 1.00000000e+00,
9.99989152e-01, 9.99999821e-01, 9.99998808e-01,
1.00000000e+00, -0.00000000e+00, -4.86032724e-01,
9.99996543e-01, 1.00000000e+00, 0.00000000e+00,
1.00000000e+00, 0.00000000e+00, 0.00000000e+00,
1.00000000e+00, -0.00000000e+00, 0.00000000e+00,
0.00000000e+00, -0.00000000e+00, 9.99999464e-01,
-9.99999881e-01, -0.00000000e+00, 4.75281268e-01,
3.01986277e-01, 6.65608108e-01, -9.99999881e-01,
0.00000000e+00, -0.00000000e+00, -0.00000000e+00,
0.00000000e+00, -0.00000000e+00, -3.65448680e-15,
-9.99888301e-01, -0.00000000e+00, -1.00000000e+00,
-1.00000000e+00, -9.90761220e-01, -9.96851087e-01,
-0.00000000e+00, 0.00000000e+00, -1.47916377e-02,
-9.99999523e-01, -2.90349454e-01, -9.99999702e-01,
-7.63339102e-02, -1.00000000e+00, -4.16638345e-01,
-9.99999940e-01, -1.00000000e+00, -9.99996841e-01,
..............
So I assume that's not the correct way to build a seq2seq autoencoder. What could be better?
Have you taken a look at https://github.com/farizrahman4u/seq2seq?
Thanks @thvasilo, that's very helpful, but I think @ScientiaEtVeritas is looking for an LSTM-based autoencoder (seq2vec), not a seq2seq generator (question answering or machine translation).
There are indeed scenarios where sequence dimensionality reduction is needed, and that's where an autoencoder can play a part.
Hi @ScientiaEtVeritas, I think you were inspired by this tutorial: https://blog.keras.io/building-autoencoders-in-keras.html
I don't know what problem you were trying to solve. If you are using an autoencoder for dimensionality reduction on multivariate sequence data, you should reduce the number of neurons in the latent layer to a value less than your input vector's dimension (115, I think); a sketch of such a bottleneck is shown below. I'm working on multivariate sequence dimensionality reduction now, so we could discuss this problem.
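For illustration, a minimal sketch of what such a bottleneck autoencoder could look like, assuming the 120x115 input from above; the latent size of 64 is a hypothetical choice, not a recommendation from this thread:

from keras.layers import Input, LSTM, RepeatVector
from keras.models import Model

timesteps, n_features = 120, 115
latent_dim = 64  # hypothetical: smaller than the 115-dimensional input vectors

inp = Input((timesteps, n_features))
# Encoder: compress the whole sequence into a single latent_dim vector
encoded = LSTM(latent_dim, return_sequences=False)(inp)
# Decoder: repeat the latent vector and map it back to a sequence
decoded = RepeatVector(timesteps)(encoded)
decoded = LSTM(n_features, return_sequences=True, activation='sigmoid')(decoded)

autoencoder = Model(inp, decoded)
encoder = Model(inp, encoded)
autoencoder.compile(optimizer='rmsprop', loss='binary_crossentropy')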
Or maybe the Keras team could add the correct way to build an LSTM-based seq2vec autoencoder to their tutorials.
@ScientiaEtVeritas By the way, 'acc' may not be the right choice for evaluating an autoencoder; instead, we use the reconstruction error (usually MSE) to measure how well the autoencoder's latent layer can represent the input. For further information, you can refer to this paper: http://proceedings.mlr.press/v28/kamyshanska13.pdf
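As a small sketch of that evaluation (variable names such as autoencoder and x_val are placeholders, not taken from the thread):

import numpy as np

# Assume `autoencoder` is a trained model and `x_val` holds validation sequences
reconstructions = autoencoder.predict(x_val)

# Mean squared reconstruction error per example, then averaged over the set
per_example_mse = np.mean((x_val - reconstructions) ** 2, axis=(1, 2))
print('mean reconstruction MSE:', per_example_mse.mean())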
@ScientiaEtVeritas Is there a chance that your problem is caused by the activation functions after each layer? They may be limiting the range of the available values and causing the output values to be very similar.
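One way to test that guess (purely a sketch, not a confirmed fix) would be to give the bottleneck layer a linear activation, so the latent values are not squashed towards -1 and 1 by tanh:

from keras.layers import Input, LSTM
from keras.models import Model

inp = Input((120, 115))
x = LSTM(200, return_sequences=True)(inp)   # default tanh activation
latent = LSTM(120, return_sequences=False,
              activation='linear')(x)       # linear bottleneck: values not forced into [-1, 1]
encoder = Model(inp, latent)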
@Emilio66 @ScientiaEtVeritas @TrepidEd I'm working on a similar problem, reconstructing a multivariate sequence. Each row in my dataset is a sequence of 10 time steps with 32 dimensions.
Here is my model:
Layer (type)                 Output Shape              Param #
=================================================================
input_11 (InputLayer)        (None, 10, 32)            0
_________________________________________________________________
lstm_17 (LSTM)               (None, 10)                1720
_________________________________________________________________
repeat_vector_7 (RepeatVecto (None, 10, 10)            0
_________________________________________________________________
lstm_18 (LSTM)               (None, 10, 32)            5504
=================================================================
Total params: 7,224
Trainable params: 7,224
Non-trainable params: 0
_________________________________________________________________
I'm not getting the arrays I expect back from prediction. Is my structure correct?
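For reference, a Keras model that would produce a summary like the one above might look roughly like this (only the layer sizes come from the summary; everything else is an assumption):

from keras.layers import Input, LSTM, RepeatVector
from keras.models import Model

inp = Input((10, 32))                                  # 10 time steps, 32 features
encoded = LSTM(10)(inp)                                # lstm_17: (None, 10)
repeated = RepeatVector(10)(encoded)                   # repeat_vector_7: (None, 10, 10)
decoded = LSTM(32, return_sequences=True)(repeated)    # lstm_18: (None, 10, 32)

autoencoder = Model(inp, decoded)
autoencoder.compile(optimizer='rmsprop', loss='mse')
autoencoder.summary()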
I am in the same boat as @ScientiaEtVeritas. I don't see any answers to his original question. Does anyone have one? Surely being able to transform a sequence into a vector must be a common use case.
In my case, I'm interested in doing this because I plan to feed the vector into a clustering algorithm so I can cluster "same" sequences together.
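As a sketch of that clustering step (assuming a trained encoder like the ones above; the number of clusters is an arbitrary placeholder):

from sklearn.cluster import KMeans

# `encoder` maps (n_samples, timesteps, features) -> (n_samples, latent_dim)
latent_vectors = encoder.predict(padded_sequences)

kmeans = KMeans(n_clusters=8, random_state=0)  # 8 clusters is an arbitrary choice
labels = kmeans.fit_predict(latent_vectors)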