Please make sure that the boxes below are checked before you submit your issue. If your issue is an implementation question, please ask your question on StackOverflow or join the Keras Slack channel and ask there instead of filing a GitHub issue.
Thank you!
[x] Check that you are up-to-date with the master branch of Keras. You can update with:
pip install git+git://github.com/keras-team/keras.git --upgrade --no-deps
[x] If running on TensorFlow, check that you are up-to-date with the latest version. The installation instructions can be found here.
[ ] If running on Theano, check that you are up-to-date with the master branch of Theano. You can update with:
pip install git+git://github.com/Theano/Theano.git --upgrade --no-deps
[x] Provide a link to a GitHub Gist of a Python script that can reproduce your issue (or just copy the script here if it is short).
https://github.com/keras-team/keras/blob/master/examples/lstm_seq2seq.py
Basic Issue:
I'm running on TensorFlow.
In the Keras example linked above, the file lstm_seq2seq.py generates an error.
line 153: model.save('s2s.h5') returns
2379: UserWarning: Layer lstm_2 was passed non-serializable keyword arguments: {'initial_state': [<tf.Tensor 'lstm_1/while/Exit_2:0' shape=(?, 256) dtype=float32>, <tf.Tensor 'lstm_1/while/Exit_3:0' shape=(?, 256) dtype=float32>]}. They will not be included in the serialized model (and thus will be missing at deserialization time).
str(node.arguments) + '. They will not be included '
Although this is phrased as a warning rather than an error, the result seems to be that the saved model is missing required information.
I've successfully saved other models in the past, so it's something specific to this model.
Other information
I've tried breaking up the model and saving the weights and config separately (see below), but model.get_weights() returns the same error.
# alternative method to save model by breaking it up into weights and config
import os
import pickle

def save_model(model, MODEL_DIR):
    if not os.path.isdir(MODEL_DIR):
        os.makedirs(MODEL_DIR)
    weights = model.get_weights()
    with open(os.path.join(MODEL_DIR, 'model'), 'wb') as file_:
        pickle.dump(weights[1:], file_)
    with open(os.path.join(MODEL_DIR, 'config.json'), 'w') as file_:
        file_.write(model.to_json())

save_model(model, 'model_dir')
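For completeness, here is a rough sketch of a matching load step (the load_model_from_dir helper below is hypothetical, not from the original script; it rebuilds the architecture from config.json and restores the pickled weight list):

import os
import pickle
from keras.models import model_from_json

def load_model_from_dir(MODEL_DIR):
    # Rebuild the architecture from the saved JSON config.
    with open(os.path.join(MODEL_DIR, 'config.json'), 'r') as file_:
        model = model_from_json(file_.read())
    # Restore the pickled weight list.
    with open(os.path.join(MODEL_DIR, 'model'), 'rb') as file_:
        weights = pickle.load(file_)
    # Note: save_model above pickles weights[1:], so the first weight array is
    # missing; set_weights() needs the full list, so that entry would have to be
    # restored separately (or the save routine changed to dump all the weights).
    model.set_weights(weights)
    return model

model = load_model_from_dir('model_dir')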
I tried to look into how model.get_weights() is implemented. It's just a loop that calls layer.get_weights() for each layer of the model.
I am also getting the same "warning". Any solution to this problem yet?
Also having this issue. Calling it a warning is definitely an understatement: after saving and loading my model, the performance is way off. I believe this is related (if not identical) to issue #8428. It is also referenced in several Google Group discussions, e.g. here. @fchollet mentions there that the model should be saved properly, but I'm quite sure it's not.
I'm also having the same issue. BTW, how would I save the encoder states? Any ideas, folks?
I am also getting the same "warning". Any solution to this problem yet?
@martvl When you load the saved model, are you also redefining encoder_model and decoder_model? If, when you say the performance is way off, you mean that the decoded sentences look completely different to how they did when you first trained this model, then I had the same concern, but found that redefining encoder_model and decoder_model according to the git code fixed the issue and recovered the behaviour of the original model. I hope that helps!
As for the warnings, I was under the impression they were referring to the fact that the outputs of the encoder are being discarded during training. If that's the case, then it should be nothing to worry about (although I would like to know how to stop the warning from showing). I'm not completely certain so someone please feel free to confirm/correct.
@chrispyT I tried reloading the model using keras.models.load_model() from the .HDF5 file that produced the "warning", that indeed resulted in completely different sentences than the original model was producing. It might be that redefining the model with the correct layers and then loading the weights fixes the issue. However, I'm currently not using this model anymore for the project I was working on, so I won't be trying this solution anytime soon. Maybe someone else can check if that fixes the issue?
@chrispyT I have tried your approach and unfortunately it does not fix the issue. Here was my process:
@microdave There are two versions of the encoder/decoder constructor; the one at
https://github.com/keras-team/keras/blob/master/examples/lstm_seq2seq.py
(as linked to by OP) only works if you have just trained the model, because it relies on already having encoder_inputs and encoder_states defined when it assigns:
encoder_model = Model(encoder_inputs, encoder_states)
It has these because encoder_inputs and encoder_states are defined during model setup. The other version at
https://github.com/keras-team/keras/blob/master/examples/lstm_seq2seq_restore.py
is needed if you are reloading the model: it dissects the layers of the loaded model and picks out the bits it needs to reconstruct everything. E.g. it precedes the above line with
encoder_inputs = model.input[0] # input_1
encoder_outputs, state_h_enc, state_c_enc = model.layers[2].output # lstm_1
encoder_states = [state_h_enc, state_c_enc]
I had the same experience as you until I realised this second version was needed, so hopefully this will fix your issue.
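For reference, here is a rough sketch of the rest of that reconstruction, along the lines of lstm_seq2seq_restore.py (the layer indices and latent_dim follow the example script and would need adjusting for other models):

from keras.models import Model, load_model
from keras.layers import Input

latent_dim = 256  # as in the example script
model = load_model('s2s.h5')

# Rebuild the encoder from the loaded layers (the same lines as above).
encoder_inputs = model.input[0]  # input_1
encoder_outputs, state_h_enc, state_c_enc = model.layers[2].output  # lstm_1
encoder_states = [state_h_enc, state_c_enc]
encoder_model = Model(encoder_inputs, encoder_states)

# Rebuild the decoder, feeding its initial states in as new inputs.
decoder_inputs = model.input[1]  # input_2
decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_lstm = model.layers[3]
decoder_outputs, state_h_dec, state_c_dec = decoder_lstm(
    decoder_inputs, initial_state=decoder_states_inputs)
decoder_states = [state_h_dec, state_c_dec]
decoder_dense = model.layers[4]
decoder_outputs = decoder_dense(decoder_outputs)
decoder_model = Model(
    [decoder_inputs] + decoder_states_inputs,
    [decoder_outputs] + decoder_states)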
Has this been solved in some way? @chrispyT's suggestion does not fix the problem, since the internal states are not saved and restored in the model. Saving/loading changes the predictions (only if you restart the python kernel/notebook; it works fine as long as the original model is still in memory, for some reason), and the loss is as bad as it is initially (in my case). This means that it is currently not possible in Keras to do seq2seq with storing/loading the model in between training runs, or am I missing something?
@da-bu Have you tried the solution(?) from my most recent comment on Aug 8? I believe it addresses the saving and reloading of the internal state weights. It is (or at least was) definitely possible to save and reload a seq2seq model in Keras (edit: I realise now this refers to reloading for prediction only).
@chrispyT Wow thank you for your super-fast reply! :) Yes I've tried the code you've linked to. This works fine for reconstructing the model as long as saving and loading happens in the same python session. However, if I save the model, restart my python notebook, and then load the model again it 1) yields different predictions, and 2) seems to be "untrained" again as it has a high loss.
So training, saving, loading, training some more is not possible at the moment. I think the issue is that it only saves the weights but not the internal states if I understand this correctly. I also get the user warning that the original poster had (with the latest versions of keras and tensorflow).
Edit: The lstm_seq2seq_restore code (https://github.com/keras-team/keras/blob/master/examples/lstm_seq2seq_restore.py) only addresses restoring the models for prediction (and that doesn't work across python sessions, it seems), but I am also interested in continuing the training of the model.
Daniel, I was unable to fix this issue as well. As you indicated, it saves the weights but not the internal states.
@KingDavidOfTepper: Thank you for the reply! Have you found a workaround for your case?
Also, should we open a new issue for this? Or what would be a way to bring attention to it? Given that it's an issue with a mainstream model architecture, which is even part of the official Keras examples, I think this deserves a fix or more "official" workaround suggestions.
@da-bu Sorry, I misunderstood, so I retract my statement about it being 'definitely' possible! I'm afraid I didn't encounter this problem myself as I wasn't retraining my models.
So, is there any solution to this problem yet? Anyone? Any idea @fchollet ?
For info: this is on Keras v2.2.4, installed using pip inside an Anaconda virtual env.
Thank you all for the input and confirmations. There is definitely something that is not saved and restored as expected, but I wonder if my understanding is correct. Shouldn't the values of the internal states be a result of the LSTM processing the input sequence, i.e. not something that needs to be stored after all? (See the stateful attribute; I use stateful=False, e.g. see http://philipperemy.github.io/keras-stateful-lstm/.) But if so, what is it that isn't properly stored and retrieved? A weight matrix?
I have the same problem. Has anyone found a solution?
The model is saved with this warning, and when I load the model the predictions are not the same.
I haven't found a solution, unfortunately. Well, in a way I have: I switched to using PyTorch instead...
I look forward to this problem being resolved soon.
Will this be easier to track/fix with the changes to SavedModel in TensorFlow 2.0?
So, is there any solution to this problem yet?
Got the same problem. Any solution??
Having the same problem. There was a suggestion elsewhere that using a checkpoint and reloading from the checkpoint would solve the problem, but it doesn't :(
Can anyone advise on how I can install this francoisdelarbre fix into my Colab notebook?
Could we just save the states after training with open('seq2seq.states', 'wb').write(states) ???
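Note that encoder_states as defined during training is a list of symbolic tensors, so it can't be written to a file directly like that. What can be saved are the concrete state arrays the encoder produces for a given input; a minimal sketch, assuming encoder_model and input_seq from the example script:

import pickle

# states_value is a list of NumPy arrays [h, c] for this particular input sequence.
states_value = encoder_model.predict(input_seq)
with open('seq2seq.states', 'wb') as f:
    pickle.dump(states_value, f)

# Later, e.g. in another session:
with open('seq2seq.states', 'rb') as f:
    states_value = pickle.load(f)

These states are a function of the input rather than trainable parameters, so saving them does not by itself solve the retrain-after-reload problem discussed above.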
I tried to reach out to @fchollet on twitter with this issue. Maybe he will respond: https://twitter.com/em_aryan/status/1136256827996327937
@nixon-nyx Did you try to save and reload the states? Did it work, and if it did, can you please share the code?
Yes. The code is simply based on the Keras seq2seq demo.
It even works fine for a bidirectional seq2seq model.
@nixon-nyx I got the training and sampling code to work and saved the model, but how to save and reload the states is where I am stuck, because without the encoder states the saved model is no good. Can you please help me there?
@aryancodify Sure.
from __future__ import print_function
from sklearn.model_selection import train_test_split
from keras.models import Model, load_model
from keras.layers import Dense, Input, Embedding, Bidirectional, Concatenate, LSTM, CuDNNLSTM
from keras.preprocessing.sequence import pad_sequences
from keras.callbacks import TensorBoard, Callback#, ModelCheckpoint
from keras.optimizers import RMSprop
import keras.backend as K
# Seq2Seq model
latent_dim = 256
enc_input = Input(shape=(None, ), name='enc_input')
enc_emb = Embedding(input_dim=self.p_vocab_size, output_dim=128, input_length=self.maxlen, name='enc_emb')
enc_lstm = Bidirectional(CuDNNLSTM(units=latent_dim, return_state=True, name='enc_lstm'))
enc_out, forward_h, forward_c, back_h, back_c = enc_lstm(enc_emb(enc_input))
enc_states = [Concatenate()([forward_h, back_h]), Concatenate()([forward_c, back_c])]
dec_input = Input(shape=(None, self.c_vocab_size), name='dec_input')
dec_lstm = CuDNNLSTM(units=latent_dim*2, return_sequences=True, return_state=True, name='dec_lstm')
dec_out, _, _ = dec_lstm(dec_input, initial_state=enc_states)
dec_dense = Dense(units=self.c_vocab_size, activation='softmax', name='dec_dense')
dec_target = dec_dense(dec_out)
model = Model([enc_input, dec_input], dec_target)
model.compile(loss='categorical_crossentropy', optimizer=self.optimizer)
model.summary()
'''
use this only if you want to train on a big dataset, as a generator
enables you to go through your dataset in small batches
'''
model.fit_generator(generator=train_gen,
                    steps_per_epoch=train_batches,
                    epochs=self.epochs,
                    verbose=1,
                    validation_data=test_gen,
                    validation_steps=test_batches,
                    callbacks=[checkpoint, tensorboard])
model.save_weights(self.weight_filepath)
You will need to change the vocab size and the max sentence length to match your own data.
For inference, you simply need to build the model once again and load the weights using model.load_weights(weight_filepath).
# Seq2Seq model
latent_dim = 256
enc_input = Input(shape=(None, ), name='enc_input')
enc_emb = Embedding(input_dim=self.p_vocab_size, output_dim=128, input_length=self.maxlen, name='enc_emb')
enc_lstm = Bidirectional(CuDNNLSTM(units=latent_dim, return_state=True, name='enc_lstm'))
enc_out, forward_h, forward_c, back_h, back_c = enc_lstm(enc_emb(enc_input))
enc_states = [Concatenate()([forward_h, back_h]), Concatenate()([forward_c, back_c])]
dec_input = Input(shape=(None, self.c_vocab_size), name='dec_input')
dec_lstm = CuDNNLSTM(units=latent_dim*2, return_sequences=True, return_state=True, name='dec_lstm')
dec_out, _, _ = dec_lstm(dec_input, initial_state=enc_states)
dec_dense = Dense(units=self.c_vocab_size, activation='softmax', name='dec_dense')
dec_target = dec_dense(dec_out)
model = Model([enc_input, dec_input], dec_target)
model.compile(loss='categorical_crossentropy', optimizer=self.optimizer)
model.load_weights(inf_weight_filepath)
# inference
enc_model = Model(enc_input, enc_states)
dec_state_in = [Input(shape=(latent_dim*2,)), Input(shape=(latent_dim*2,))]
inf_dec_out, h, c = dec_lstm(dec_input, initial_state=dec_state_in)
dec_states = [h, c]
dec_out = dec_dense(inf_dec_out)
dec_model = Model([dec_input] + dec_state_in, [dec_out] + dec_states)
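For anyone following along, here is a rough sketch of how those two inference models might be used for greedy decoding (c_vocab_size, start_token_index, stop_token_index and max_decoder_seq_length are placeholders to adapt to your own data):

import numpy as np

def decode_sequence(input_seq):
    # Encode the input sequence to get the initial decoder states.
    states_value = enc_model.predict(input_seq)

    # Start with a one-hot vector for the start-of-sequence token.
    target_seq = np.zeros((1, 1, c_vocab_size))
    target_seq[0, 0, start_token_index] = 1.0

    decoded_tokens = []
    for _ in range(max_decoder_seq_length):
        output_tokens, h, c = dec_model.predict([target_seq] + states_value)

        # Greedily pick the most likely next token and stop on the end token.
        sampled_index = int(np.argmax(output_tokens[0, -1, :]))
        if sampled_index == stop_token_index:
            break
        decoded_tokens.append(sampled_index)

        # Feed the sampled token and the updated states back in.
        target_seq = np.zeros((1, 1, c_vocab_size))
        target_seq[0, 0, sampled_index] = 1.0
        states_value = [h, c]

    return decoded_tokens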
Still no solution for this problem?