Stateful LSTMs seem to be confusing everybody. I don't recommend stateful unless you know what it is and have a good reason to use it.
Imagine you have a 2-layer neural network and you only train the last layer. It might learn something, it might not. That is basically what stateful is doing between batches. Cell t+1 will do its best to do something with state t, but state t will be random and untrained. It might learn something, it might not.
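For concreteness, a minimal sketch (tf.keras, with made-up layer sizes and random data) of what that carried state means: the final state left behind by one train_on_batch call becomes the initial state of the next, but no gradient ever flows back across that boundary, and reset_states() is the only thing that clears it.

import numpy as np
import tensorflow as tf

batch_size, timesteps, features = 4, 10, 3
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(8, stateful=True,
                         batch_input_shape=(batch_size, timesteps, features)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

x1 = np.random.rand(batch_size, timesteps, features).astype("float32")
x2 = np.random.rand(batch_size, timesteps, features).astype("float32")
y = np.random.rand(batch_size, 1).astype("float32")

model.train_on_batch(x1, y)  # leaves the LSTM holding "state t"
model.train_on_batch(x2, y)  # starts from "state t"; no gradient crosses the batch boundary
model.reset_states()         # without this, state keeps carrying over indefinitely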
@fchollet I'm thinking something like a disclaimer on the stateful LSTM example. It is a proof-of-concept showing that you can have sequence lengths of 1, but if you can pass actual sequences in a batch you will have a better model. I'm seeing a lot of people try to build models with sequence lengths of 1, which is simply a bad idea. I like having the example but we need to be clear that it is not the preferred way to do things.
Also, the stateful example is kinda odd. It has a batch size of 25 and feeds all of the examples in order. That means the hidden state at 20 is used to make the prediction at step 45. The hidden states are randomly initialized and untrained. I don't think most people understand that part and end up with some weird models.
I've mostly just been recommending that people don't use stateful and pass actual sequences in each batch.
Cheers
You are correct that sequences of size 1 are a bad idea since they imply a total absence of backprop through time, which obviously will lead to a bad model. Please send a PR to modify the example to show best practices instead (stateful + sequences of reasonable size to allow for truncated backprop through time).
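A hedged sketch of that recommendation (the window length, layer size and toy sine data below are assumptions, not the repository example): split the long series into consecutive windows of reasonable length, keep them in time order, and reset the state between epochs so truncated BPTT happens within each window.

import numpy as np
import tensorflow as tf

window, batch_size, features = 50, 1, 1                        # assumed sub-sequence length
series = np.sin(np.linspace(0, 100, 5000)).astype("float32")   # toy long series

# consecutive, non-overlapping windows in time order; each target is the next value
starts = range(0, len(series) - window - 1, window)
xs = np.stack([series[i:i + window] for i in starts])[..., None]   # (samples, window, 1)
ys = np.array([series[i + window] for i in starts])[:, None]       # (samples, 1)

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, stateful=True,
                         batch_input_shape=(batch_size, window, features)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

for epoch in range(5):
    model.fit(xs, ys, batch_size=batch_size, epochs=1, shuffle=False)  # order matters
    model.reset_states()  # do not let state leak across epoch boundaries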
I tried my example with sequence length 1 and the outcome was horrible. You need to use the original sequence length, rather than shortening it or setting it to 1.
Why anybody would want to have a stateful model during training is beyond me, so with this part I can agree. But during testing, when you want to let the model predict some output on some data, stateful makes a lot more sense. For example, the model might be part of a larger system that works on video frames. It might be required to perform some action instantly after each frame, instead of waiting for a sufficiently long sequence of video frames before feeding them to the network. It would be really nice if you could train the network stateless with a time-depth of X (say 16), and then use those weights on a stateful network with a time-depth of 1 during prediction. In my experience, however, this does not work in Keras.
IMO it would be extremely useful for BPTT to work with stateful models (unfolded and back-propagated). For models like sequence-to-sequence models it is a lot more natural to take into account (and train with) longer context than just the current sequence when predicting the target sequence. This mechanism would let models use global (or longer) contexts rather than just local ones. I am not sure how it can be achieved yet and will dig into the code for more detail.
ahrnbom, Using model.get_weights() and model.set_weights(), the weights of a normal LSTM can be transferred to a stateful LSTM, assuming the architectures are otherwise identical.
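A hedged sketch of that transfer (the layer size and 6-feature input below are assumptions): build the same architecture twice, once stateless for training and once stateful with batch size 1 for streaming prediction, then copy the weights across.

import tensorflow as tf

def build(stateful):
    if stateful:
        inp = tf.keras.layers.Input(batch_shape=(1, None, 6))  # fixed batch of 1 for streaming
    else:
        inp = tf.keras.layers.Input(shape=(None, 6))
    x = tf.keras.layers.LSTM(32, stateful=stateful)(inp)
    out = tf.keras.layers.Dense(1)(x)
    return tf.keras.Model(inp, out)

trained = build(stateful=False)               # assume this one was trained statelessly
streaming = build(stateful=True)              # same architecture, stateful, batch size 1
streaming.set_weights(trained.get_weights())  # weight shapes are identical, so this just works

# frame-by-frame prediction; the LSTM state carries over between calls
# frame = ...                                 # shape (1, 1, 6)
# prediction = streaming.predict(frame)
# streaming.reset_states()                    # call when a new, unrelated sequence starts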
Maybe I'm just not seeing the obvious, but could someone explain to me why it is not a good idea to use sequences of size=1 together with a stateful LSTM and why they imply an absence of BPTT?
@bstriner Can a stateful LSTM be trained using fit_generator?
We know that in a stateful LSTM the state passes between batches, so training on each batch depends on the preceding batch. Considering the importance of ordering, can we use fit_generator with use_multiprocessing=True? Is there anything I should take into account when writing my own batch generator so that the batch order is preserved? I should mention that I have a time-series regression type problem.
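For reference, a hedged sketch of such an order-preserving generator (reusing the same toy sine series and assumed window length as the earlier sketch): because batch k+1 continues from batch k's state, the generator yields windows strictly in time order, and workers/use_multiprocessing are kept off so the queue cannot reorder them.

import numpy as np
import tensorflow as tf

window = 50                                                    # assumed sub-sequence length
series = np.sin(np.linspace(0, 100, 5000)).astype("float32")   # toy regression series

def consecutive_batches(series, window):
    # yield (x, y) windows of one long series forever, strictly in time order
    n_batches = (len(series) - window - 1) // window
    while True:
        for k in range(n_batches):
            i = k * window
            x = series[i:i + window].reshape(1, window, 1)     # batch size 1
            y = series[i + window].reshape(1, 1)
            yield x, y

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, stateful=True, batch_input_shape=(1, window, 1)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

steps = (len(series) - window - 1) // window
model.fit_generator(consecutive_batches(series, window),
                    steps_per_epoch=steps, epochs=2,
                    shuffle=False, workers=1, use_multiprocessing=False)
model.reset_states()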
@Elch123 Have you tested that? It apparently doesn't work on my end.
Basically I have a Sequential model trained stateless, and the first layer of the model is a stacked LSTM. I load the trained model, edit the config with 'stateful'=True, set batch_input_shape, and build a new model with the same weights and the modified config, but the testing result doesn't change.
import keras
import tensorflow as tf

# load the stateless model that was trained earlier
old_network = keras.models.load_model(args.model_path, custom_objects=None, compile=False)
config = old_network.get_config()
# flip the first (LSTM) layer to stateful and fix the batch size to 1
config['layers'][0]['config']['stateful'] = True
config['layers'][0]['config']['batch_input_shape'] = (1, None, 6)
weights = old_network.get_weights()
# rebuild the model from the edited config and copy the trained weights over
network = tf.keras.models.Sequential.from_config(config)
network.set_weights(weights)
I trained my model on data sequences with the same number of timesteps, which are independent of each other. Now I want to do real-time prediction, where the input has only one timestep, and I want each prediction to be based on the previous state.
Does anyone have an idea how to do that?
Update:
It seems that both stateful and stateless models update the state in the same way during model.predict(). Is that expected behaviour?
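One hedged way to check that (untrained toy models, assumed 6-feature single-frame input): a non-stateful model starts every sample from a zero state, so it returns the same output for the same input on every predict() call, while a stateful one generally does not until reset_states() is called.

import numpy as np
import tensorflow as tf

frame = np.random.rand(1, 1, 6).astype("float32")

stateless = tf.keras.Sequential([
    tf.keras.layers.LSTM(8, input_shape=(None, 6)),
    tf.keras.layers.Dense(1),
])
stateful = tf.keras.Sequential([
    tf.keras.layers.LSTM(8, stateful=True, batch_input_shape=(1, None, 6)),
    tf.keras.layers.Dense(1),
])
stateful.set_weights(stateless.get_weights())              # identical weight shapes

print(stateless.predict(frame), stateless.predict(frame))  # same output twice: state reset per call
print(stateful.predict(frame), stateful.predict(frame))    # outputs differ: state carried over
stateful.reset_states()                                    # back to the zero state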
@bstriner: regarding your comment on the questionable suitability of stateful LSTMs, suppose I have a long time series with some yearly and monthly seasonality patterns. Given the long dependencies between batches, doesn't a stateful LSTM make more sense in this case? Could you please explain further your statement: "state t will be random and untrained. It might learn something it might not."
How could state t be random?
Thank you for your help.
I think he means the gradient cannot really backpropagate between batches. The stateful setting enables us to initialize the hidden states of the next batch with the hidden states of the last batch, but this is still somewhat "random" because you have no reason to believe that the hidden states of the last batch have been well trained. Moreover, you cannot change the initial hidden states when training on the next batch. I think that's why @bstriner said it is random and untrained.
> The stateful setting enables us to initialize the hidden states of the next batch with the hidden states of the last batch, but this is still somewhat "random" because you have no reason to believe that the hidden states of the last batch have been well trained.

No longer true after a few epochs.
Yes, that's why I said "somewhat"; it was just my guess at the intention of @bstriner's original comment.
@VertexC I am trying to do something similar and I experienced the same behaviour in keras. Did you find a proper way to do it?