First of all, I know that there are already issues open regarding this topic, but their solutions don't solve my problem, and I'll explain why.
The problem is to predict the next n_post steps of a sequence given the previous n_pre steps, with n_post < n_pre. I've built a toy example using a simple sine wave to illustrate it. The many-to-one forecast (n_pre=50, n_post=1) works perfectly:
model = Sequential()
model.add(LSTM(input_dim=1, output_dim=hidden_neurons, return_sequences=False))
model.add(Dense(1))
model.add(Activation('linear'))
model.compile(loss='mean_squared_error', optimizer='rmsprop', metrics=['accuracy'])
Also, the many-to-many forecast with (n_pre=50, n_post=50) gives a near-perfect fit:
model = Sequential()
model.add(LSTM(input_dim=1, output_dim=hidden_neurons, return_sequences=True))
model.add(TimeDistributed(Dense(1)))
model.add(Activation('linear'))
model.compile(loss='mean_squared_error', optimizer='rmsprop', metrics=['accuracy'])
But now assume we have data that looks like this:
dataX or input: (nb_samples, nb_timesteps, nb_features) -> (1000, 50, 1)
dataY or output: (nb_samples, nb_timesteps, nb_features) -> (1000, 10, 1)
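For reference, here is a minimal sketch (the windowing scheme below is my own illustration, not taken from the original post) of how arrays with those shapes could be built from a sine wave:

import numpy as np

n_pre, n_post = 50, 10
t = np.linspace(0, 100, 2000)
signal = np.sin(t)

# Slide a window over the signal: the first n_pre points form the input,
# the following n_post points form the target.
dataX, dataY = [], []
for i in range(len(signal) - n_pre - n_post):
    dataX.append(signal[i:i + n_pre])
    dataY.append(signal[i + n_pre:i + n_pre + n_post])

dataX = np.array(dataX)[..., np.newaxis]  # (nb_samples, 50, 1)
dataY = np.array(dataY)[..., np.newaxis]  # (nb_samples, 10, 1)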
The solution given in #2403 is to build the model like this:
model = Sequential()
model.add(LSTM(input_dim=1, output_dim=hidden_neurons, return_sequences=False))
model.add(RepeatVector(10))
model.add(TimeDistributed(Dense(1)))
model.add(Activation('linear'))
model.compile(loss='mean_squared_error', optimizer='rmsprop', metrics=['accuracy'])
Well, it compiles and trains, but the prediction is really bad:
My explanation for this is: the network has only one piece of information (no return_sequences) at the end of the LSTM layer, repeats it output_dim times, and then tries to fit. The best guess it can give is the average of all the points to predict, since it doesn't know whether the sine wave is currently going up or down; it loses this information with return_sequences=False!
So, my final question is: how can I keep this information and let the LSTM layer return part of its sequence? I don't want to fit it to n_pre=50 time steps but only to 10, because in my problem the points are of course not as nicely correlated as in the sine wave. Currently I just feed in 50 points and then crop the output (after training) to 10, but the model still tries to fit all 50, which distorts the result.
Any help would be greatly appreciated!
I think you need to do something like this:
model = Sequential()
model.add(LSTM(input_dim=1, output_dim=hidden_neurons, return_sequences=False))
model.add(RepeatVector(10))
model.add(LSTM(output_dim=hidden_neurons, return_sequences=True))
model.add(TimeDistributed(Dense(1)))
model.add(Activation('linear'))
model.compile(loss='mean_squared_error', optimizer='rmsprop', metrics=['accuracy'])
otherwise you are just repeating the output of the last Dense layer and getting a constant value.
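For completeness, a sketch of how that model could be trained and used on data with the shapes above (hidden_neurons, batch size, and epoch count are placeholders, not values from this thread; the argument names follow the old Keras 1.x API used here):

from keras.models import Sequential
from keras.layers import LSTM, RepeatVector, TimeDistributed, Dense, Activation

hidden_neurons = 64  # placeholder value

model = Sequential()
model.add(LSTM(input_dim=1, output_dim=hidden_neurons, return_sequences=False))
model.add(RepeatVector(10))
model.add(LSTM(output_dim=hidden_neurons, return_sequences=True))
model.add(TimeDistributed(Dense(1)))
model.add(Activation('linear'))
model.compile(loss='mean_squared_error', optimizer='rmsprop')

# dataX: (nb_samples, 50, 1), dataY: (nb_samples, 10, 1)
model.fit(dataX, dataY, batch_size=32, nb_epoch=100, validation_split=0.1)

# The model emits 10 timesteps directly, so no cropping of the output is needed.
prediction = model.predict(dataX[:1])  # shape (1, 10, 1)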
Thank you very much. I tried your suggestion and the predictions now look like this:
The number of epochs and hidden neurons is the same as in the other test cases, but the prediction for 10 steps is worse than for 50. Is there a (simple) explanation for why it gets worse with more layers? Or does it just need to train longer because it has more parameters to adjust?
I would say that the modeling assumptions of the two approaches are different. In the latter model, it is assumed that the model sees the complete input sequence (the first 50 steps), somehow creates a summary, and uses this summary to generate a new signal (the last 10 steps).
Your initial model, on the other hand, estimated the last 50 steps while reading the input signal; no summarisation of the original signal was used.
That's a perfect and clear answer, thank you very much.
Hi, I have been studying how to use the many-to-many LSTM model to predict time series data, and now I have the same problem that you once had. Could you share your demo .py files for predicting a simple sine wave? I want to learn from your code and replace your data with mine, just to have a try. It would be very nice of you if you could do me this favor! Thanks!
my email : [email protected]
thank you !
Here you go!
test_sine.txt
Hi there!
It seems that in new versions of Keras the input_dim and output_dim arguments have been replaced with the input_shape argument. Could you edit these parts of the code to match the new version:
model.add(LSTM(input_dim=1, output_dim=hidden_neurons, return_sequences=False))
model.add(LSTM(output_dim=hidden_neurons, return_sequences=True))
I also have another question: what is the reason for using model.add(Activation('linear'))?
Thanks in advance!
Hi @bestazad ,
You can obtain the same result using input_dim or input_shape; to my knowledge both of these "alternatives" have been supported for quite some time.
The reason why model.add(Activation('linear')) is used is most likely that this is only a tentative example; other activation functions could probably give similar results here.
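For what it's worth, in Keras 2.x the suggested model might look roughly like this (a sketch only, not tested against any particular release; hidden_neurons is a placeholder and the 50-step input length comes from the earlier example):

from keras.models import Sequential
from keras.layers import LSTM, RepeatVector, TimeDistributed, Dense, Activation

hidden_neurons = 64  # placeholder

model = Sequential()
# The number of units is now the first positional argument, and
# input_shape=(timesteps, features) replaces the old input_dim/output_dim pair.
model.add(LSTM(hidden_neurons, input_shape=(50, 1), return_sequences=False))
model.add(RepeatVector(10))
model.add(LSTM(hidden_neurons, return_sequences=True))
model.add(TimeDistributed(Dense(1)))
model.add(Activation('linear'))
model.compile(loss='mean_squared_error', optimizer='rmsprop')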
How would you train the model on inputs of variable length?
Hi @gustavz
Two options/suggestions:
Padding looks easier but I would guess that this method also decreases the usefulness of the model.
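A rough sketch of the padding option (my own illustration, assuming inputs are padded to a common length and a Masking layer is used so the LSTM ignores the padded steps):

import numpy as np
from keras.models import Sequential
from keras.layers import Masking, LSTM, RepeatVector, TimeDistributed, Dense
from keras.preprocessing.sequence import pad_sequences

max_len = 50         # pad every input sequence to this length
hidden_neurons = 64  # placeholder

# sequences: a list of arrays of shape (length_i, 1) with varying length_i
# padded = pad_sequences(sequences, maxlen=max_len, dtype='float32', padding='pre', value=0.0)

model = Sequential()
# Use a mask value that does not occur in the real data.
model.add(Masking(mask_value=0.0, input_shape=(max_len, 1)))
model.add(LSTM(hidden_neurons, return_sequences=False))
model.add(RepeatVector(10))
model.add(LSTM(hidden_neurons, return_sequences=True))
model.add(TimeDistributed(Dense(1)))
model.compile(loss='mean_squared_error', optimizer='rmsprop')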
If you (or anybody else) could help me with a good explanation of what RepeatVector() does here, I would be happy. The best reference I have found is https://stackoverflow.com/questions/51749404/how-to-connect-lstm-layers-in-keras-repeatvector-or-return-sequence-true , however, that is for an encoder/decoder network and I'm not sure whether the same applies to an LSTM network. E.g., does RepeatVector() repeat the original input (from the very first layer), or does it work on the inputs/outputs between hidden layers?
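Not an authoritative answer, but as far as I understand it, RepeatVector() only sees the 2D output of the layer directly before it (here the summary vector produced by the first LSTM, not the original input) and repeats that vector along a new time axis. A small standalone sketch to illustrate:

import numpy as np
from keras.models import Sequential
from keras.layers import RepeatVector

model = Sequential()
model.add(RepeatVector(3, input_shape=(4,)))  # repeat a length-4 vector 3 times

x = np.array([[1.0, 2.0, 3.0, 4.0]])  # shape (1, 4)
print(model.predict(x))
# [[[1. 2. 3. 4.]
#   [1. 2. 3. 4.]
#   [1. 2. 3. 4.]]]  -> shape (1, 3, 4): the same vector at every timestep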
What is the difference between:
model = Sequential()
model.add(LSTM(input_dim=1, output_dim=hidden_neurons, return_sequences=False))
model.add(RepeatVector(10))
model.add(LSTM(output_dim=hidden_neurons, return_sequences=True))
model.add(TimeDistributed(Dense(1)))
model.add(Activation('linear'))
and
model = Sequential()
model.add(LSTM(input_dim=1, output_dim=hidden_neurons, return_sequences=True))
model.add(LSTM(output_dim=hidden_neurons, return_sequences=False))
model.add(Dense(10))
Maybe best explained with this image:
Thanks for this; which is which? I've added some numbers to your image to better reference the variants. I assume that the code that contains RepeatVector() is represented by variant 4 and that the code that does not contain RepeatVector() is represented by variant 5. Is this correct?
Thanks! :-)
Option 1 is an Encoder-Decoder, Option 2 is a Vanilla LSTM
Option 1 is part 4 of the image?
The code that does not contain RepeatVector() is a many-to-one architecture (variant 3). To get a many-to-many architecture you have to modify the code that does not contain RepeatVector() so that return_sequences=True is set in both LSTM layers, not only the first one.
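If I read that correctly, the no-RepeatVector variant rewritten as many-to-many would look roughly like this (a sketch using the old argument names from this thread; hidden_neurons is a placeholder):

from keras.models import Sequential
from keras.layers import LSTM, TimeDistributed, Dense, Activation

hidden_neurons = 64  # placeholder

model = Sequential()
model.add(LSTM(input_dim=1, output_dim=hidden_neurons, return_sequences=True))
model.add(LSTM(output_dim=hidden_neurons, return_sequences=True))  # both layers return sequences
model.add(TimeDistributed(Dense(1)))
model.add(Activation('linear'))
# Note: the output now has as many timesteps as the input (50 here), which is
# why the RepeatVector variant is used when n_post differs from n_pre.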