Keras: Convert LSTM model from stateless to stateful

Created on 11 Mar 2017 · 12 comments · Source: keras-team/keras

Hi,

I'm training an LSTM to tackle a many-to-many problem (one output for every input) using Keras with the TensorFlow backend.

My model is something like this:
    model = Sequential()
    model.add(LSTM(seq_length, input_shape=(seq_length, feat_dim)))
    model.compile(optimizer='rmsprop', loss='mse')

(Note: `input_shape` already implies the feature dimension, so passing `input_dim` as well is redundant and Keras will reject the combination.)

The length of my input sequences is variable, with quite high variance (from very short sequences to nearly 1000 steps). At training time I can just split the sequences into fixed-length batches, but at test time it would be useful to feed in the whole sequence and get a prediction at every time step.

Is it possible to train the model in stateless mode feeding fixed length sequences and then make predictions in a stateful fashion? Does this even make sense or am I missing something?

Thanks

stale

All 12 comments

It makes sense to me - I am thinking about trying the same thing this afternoon and will let you know how it goes. I think all it should entail is training as stateless, then creating a new stateful model and passing the weights manually:

for nb, layer in enumerate(model.layers):
    model2.layers[nb].set_weights(layer.get_weights())
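For concreteness, here is a minimal sketch of that approach with `tf.keras` (the layer sizes, `units`, and variable names below are hypothetical, not from the thread): build a stateful twin of the trained stateless model and copy the weights layer by layer.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

seq_length, feat_dim, units = 8, 4, 16  # hypothetical sizes

# Stateless model, trained on fixed-length windows.
model = keras.Sequential([
    keras.Input(shape=(seq_length, feat_dim)),
    layers.LSTM(units, return_sequences=True),
])
model.compile(optimizer='rmsprop', loss='mse')

# Stateful twin: same topology, fixed batch size of 1, any number of steps.
model2 = keras.Sequential([
    keras.Input(batch_shape=(1, None, feat_dim)),
    layers.LSTM(units, return_sequences=True, stateful=True),
])

# Pass the weights manually, layer by layer.
for nb, layer in enumerate(model.layers):
    model2.layers[nb].set_weights(layer.get_weights())

# At prediction time, call model2.reset_states() between independent sequences.
```

The weights of an LSTM layer depend only on `units` and the feature dimension, not on the sequence length, batch size, or `stateful`, which is why the transfer works.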

Hi, I want to follow up on this topic.

I want to train an LSTM model in the stateless setting, for the same reason as @fedebecat.
During training, I want to backpropagate through several steps to get good weights, so I want the timestep dimension of batch_input_shape to be ~100. However, during prediction I want to predict at every time step with stateful=True.

Would it be possible to train stateless with

    batch_input_shape=[batch_size, timestep, n_feature], stateful=False

but then predict with the following?

    batch_input_shape=[1, 1, n_feature], stateful=True
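In principle yes: the learned weights do not depend on `stateful` or on `batch_input_shape`, so the model can be rebuilt in the prediction configuration and the weights copied across. A sketch (all sizes here are made up) that also checks that the stateful model, fed one step at a time, reproduces the stateless model's per-timestep outputs:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

timestep, n_feature, units = 5, 3, 8  # hypothetical sizes

# Training configuration: stateless, input (batch_size, timestep, n_feature).
train_model = keras.Sequential([
    keras.Input(shape=(timestep, n_feature)),
    layers.LSTM(units, return_sequences=True),
])

# Prediction configuration: stateful, input (1, 1, n_feature).
pred_model = keras.Sequential([
    keras.Input(batch_shape=(1, 1, n_feature)),
    layers.LSTM(units, return_sequences=True, stateful=True),
])
pred_model.set_weights(train_model.get_weights())

x = np.random.rand(1, timestep, n_feature).astype('float32')

# Whole sequence at once through the stateless model...
full = train_model.predict(x, verbose=0)

# ...versus one step at a time; state carries over between predict() calls.
steps = [pred_model.predict(x[:, t:t + 1, :], verbose=0) for t in range(timestep)]
stepwise = np.concatenate(steps, axis=1)
# full and stepwise should agree; call reset_states() before a new sequence.
```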

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

FYI:
I summarized my experiments on stateless training and stateful prediction. Conclusion: the prediction does not always work well:
https://fairyonice.github.io/Understand-Keras's-RNN-behind-the-scenes-with-a-sin-wave-example.html

Hi, I made a function that converts a model used during training into a model meant for inference, which receives a single sample with any number of time steps (so a single-time-step sample can be used for real-time prediction). In case it helps:
Gist with the function

It basically copies the model, changes 'stateful' to True and the input shape to (1, None, original_value), and transfers the weights from the trained model.

PS: I didn't find a better solution to this problem, which I thought would be a very common one. Please let me know of any comments or solutions that I missed.

I have not tested this yet, but I think a possible workaround is to just set 'stateful' to True and call reset_states() after each batch during training.
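That workaround can be packaged as a small Keras callback; `ResetStatesCallback` is a made-up name, and note that a stateful model still needs a fixed batch size, a sample count divisible by it, and `shuffle=False`:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

class ResetStatesCallback(keras.callbacks.Callback):
    """Clear recurrent state after every batch, so stateful training
    behaves like stateless training between batches."""
    def on_train_batch_end(self, batch, logs=None):
        for layer in self.model.layers:
            if getattr(layer, 'stateful', False):
                layer.reset_states()

# Tiny stateful model: batch size is fixed and must divide the sample count.
model = keras.Sequential([
    keras.Input(batch_shape=(2, 3, 1)),
    layers.LSTM(4, return_sequences=True, stateful=True),
    layers.Dense(1),
])
model.compile(optimizer='rmsprop', loss='mse')

x = np.random.rand(4, 3, 1).astype('float32')
y = np.random.rand(4, 3, 1).astype('float32')
history = model.fit(x, y, batch_size=2, epochs=1, shuffle=False,
                    callbacks=[ResetStatesCallback()], verbose=0)
```

Whether this is worth it over plain stateless training depends on the later observation in this thread that stateful training can be unstable.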

@FairyOnIce thank you for sharing your insights on stateful/stateless training and prediction. Just to clarify, summarizing from your blog posts:

| Mode | Status |
|-------|--------|
| Training stateless / Prediction stateless | OK |
| Training stateless / Prediction stateful | OK |
| Training stateful / Prediction stateless | Unstable |
| Training stateful / Prediction stateful | Unstable |

Is this assessment correct?

I'm interested in training stateless / predicting stateful, so if I understood correctly, this would be fine, right?

@victorhcm How did you manage to do training stateless / prediction stateful? I simply loaded the stateless model and set the stateful parameter of the RNN layer to True, but apparently it didn't do anything.

@VertexC as far as I understand this you can use a stateless model for training (training_model) and a separate model for prediction (prediction_model). After training is done, just copy over the weights: prediction_model.set_weights(training_model.get_weights()). According to @victorhcm's summary this should be ok.

I observed that a stateful training LSTM model could learn a sequence (natural-language text encoded with word2vec), and an almost identical stateful prediction LSTM model could reproduce the sequence after copying the weights (in fact not a very useful model, because the text was just memorised). This was with batch_size = 1 during both training and prediction.

For better computational efficiency, I would like to train stateless with batch_size > 1, and still predict on a stateful model with batch_size = 1 (i.e. one sample at a time). This works on very short sequences, but doesn't scale to long sequences. Any ideas on this?

@VertexC Hi, I had similar problem as you did. Have you found any solution to this? Thanks.

Hi, I'd appreciate if anyone was able to achieve a usable solution. @rpicatoste , I have tried to use your function but unfortunately it does not work as expected :/

Hello, I wrote the function a long time ago, well before TensorFlow 2, and it's likely that things have changed. It worked back in the day, but I haven't used Keras for almost 2 years.

