I tried to save and restore the state of my training process using model.save() and keras.models.load_model(). Although the weights, optimizer state and learning rate are restored correctly, the state of callbacks like keras.callbacks.ReduceLROnPlateau() is not restored. Neither is the epoch number.
So I am essentially looking for a checkpoint mechanism that restores the run state exactly.
This is necessary because many deep learning models train for a very long time, and for various reasons one has to be able to stop and resume training often, sometimes every few epochs. In that case restoring the LR scheduler state is necessary; otherwise the correct stepwise reduction is not possible.
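To make the problem concrete, here is a minimal pure-Python sketch (no Keras needed) of why the scheduler state matters. `PlateauScheduler` is a hypothetical, simplified stand-in loosely modeled on ReduceLROnPlateau (no cooldown, min_delta or min_lr), and the losses are made up. It compares an uninterrupted run against one that is stopped after 4 epochs and resumed either with the saved epoch number and scheduler state, or with a fresh scheduler.

```python
import json


class PlateauScheduler:
    """Hypothetical, simplified plateau LR reducer loosely modeled on
    keras.callbacks.ReduceLROnPlateau; not the Keras implementation."""

    def __init__(self, lr=0.1, factor=0.5, patience=2):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("inf")  # best monitored loss seen so far
        self.wait = 0  # epochs since the last improvement

    def on_epoch_end(self, val_loss):
        if val_loss < self.best:
            self.best, self.wait = val_loss, 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.lr *= self.factor  # reduce LR after a plateau
                self.wait = 0

    def state(self):
        return {"lr": self.lr, "best": self.best, "wait": self.wait}

    def load_state(self, state):
        self.lr, self.best, self.wait = state["lr"], state["best"], state["wait"]


def run(losses, scheduler, start_epoch=0):
    """Feed per-epoch losses to the scheduler; return the LR after each epoch."""
    lrs = []
    for epoch in range(start_epoch, len(losses)):
        scheduler.on_epoch_end(losses[epoch])
        lrs.append(scheduler.lr)
    return lrs


# Made-up validation losses, for illustration only.
losses = [1.0, 0.9, 0.95, 0.97, 0.96, 0.99, 0.98, 0.97]
full = run(losses, PlateauScheduler())  # uninterrupted run

# Stop after 4 epochs and checkpoint the epoch number plus scheduler state.
sched = PlateauScheduler()
first = run(losses[:4], sched)
checkpoint = json.dumps({"epoch": 4, "scheduler": sched.state()})

# Resume later from the checkpoint.
ckpt = json.loads(checkpoint)
resumed = PlateauScheduler()
resumed.load_state(ckpt["scheduler"])
second = run(losses, resumed, start_epoch=ckpt["epoch"])

# Resuming with a fresh scheduler loses the plateau bookkeeping.
naive = run(losses, PlateauScheduler(), start_epoch=ckpt["epoch"])

print(first + second == full)  # True
print(first + naive == full)   # False
```

In Keras, the restored epoch number would typically be passed to model.fit as `initial_epoch`, so that epoch counting (and any epoch-based schedule) continues from where it stopped.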
You can pickle your callback and reload it afterwards.
I am getting an error like: pickle.PicklingError: Can't pickle <function <lambda> at 0x7f90e88fae60>: it's not found as keras.callbacks.<lambda>
Even with import dill, this happens: RuntimeError: maximum recursion depth exceeded
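That's because you provide the schedule as a lambda (as stated in your error message). pickle stores functions by reference to their module and qualified name, and a lambda's name is just `<lambda>`, so it cannot be looked up again. A named function works: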
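import pickle

from keras import callbacks


# A named, module-level function (not a lambda) can be pickled by reference.
def schedule(epoch):
    return epoch


lr = callbacks.LearningRateScheduler(schedule)
data = pickle.dumps(lr)        # serialize the callback
restored = pickle.loads(data)  # and load it back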
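The lambda-versus-named-function difference can be reproduced without Keras. This sketch uses a hypothetical `LRScheduleHolder` class standing in for a callback that stores a schedule function; `step_schedule` and `can_pickle` are also illustrative names, not part of any library. It assumes the code is run as a script, so the module-level names are importable when unpickling.

```python
import pickle


class LRScheduleHolder:
    """Hypothetical stand-in for a callback that keeps a schedule function."""

    def __init__(self, schedule):
        self.schedule = schedule


def step_schedule(epoch):
    # Named, module-level function: pickle stores it by qualified name.
    return 0.1 if epoch < 10 else 0.01


def can_pickle(obj):
    """Return True if obj survives a pickle round trip."""
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except (pickle.PicklingError, AttributeError):
        return False


named = LRScheduleHolder(step_schedule)
anonymous = LRScheduleHolder(lambda epoch: 0.1)  # qualified name is "<lambda>"

print(can_pickle(named))      # True
print(can_pickle(anonymous))  # False
```

Because functions are pickled by reference, the unpickled holder points at the very same `step_schedule` object; the function's code itself is never serialized.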
@Dref360 Any update here? It would be much easier if there were an option to save and load callbacks, together with their state, via model.save and load_model respectively. However, we should also handle the scenario where someone wishes to add new callbacks or remove existing ones.
Even support limited to a fixed set of callbacks, with no way to add or remove them, would be a nice feature.
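One way the add/remove case could be handled is to key each callback's saved state by a stable name and, on restore, only touch callbacks present in both runs. The sketch below is a hypothetical illustration in plain Python: `StatefulCallback`, `get_state`, `set_state`, `save_callback_states` and `restore_callback_states` are made-up names, not Keras APIs.

```python
import json


class StatefulCallback:
    """Hypothetical stateful callback used only for illustration."""

    def __init__(self, name):
        self.name = name  # stable, user-chosen key for checkpointing
        self.count = 0    # example of runtime state

    def get_state(self):
        return {"count": self.count}

    def set_state(self, state):
        self.count = state["count"]


def save_callback_states(callbacks):
    # Key each callback's state by its name so the set can change later.
    return json.dumps({cb.name: cb.get_state() for cb in callbacks})


def restore_callback_states(callbacks, saved):
    """Restore state for callbacks present in both runs.

    Callbacks removed since the checkpoint are ignored; newly added
    callbacks keep their fresh default state.
    """
    states = json.loads(saved)
    restored = []
    for cb in callbacks:
        if cb.name in states:
            cb.set_state(states[cb.name])
            restored.append(cb.name)
    return restored


old = [StatefulCallback("plateau"), StatefulCallback("early_stop")]
old[0].count, old[1].count = 3, 1
blob = save_callback_states(old)

# Next run: "early_stop" was removed and "logger" was added.
new = [StatefulCallback("plateau"), StatefulCallback("logger")]
names = restore_callback_states(new, blob)
print(names)                       # ['plateau']
print(new[0].count, new[1].count)  # 3 0
```

Keying by an explicit name rather than by list position is the design choice that makes additions and removals safe; with a fixed set of callbacks, positional order would also work.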
Callbacks are not serializable in TensorFlow, but it would definitely be helpful if they were.
All that is needed is to implement get_config and from_config methods for the callbacks. There's an open PR that aims to solve this problem in a more generic manner: https://github.com/tensorflow/tensorflow/pull/36635
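As a rough illustration of the get_config/from_config idea, here is a hypothetical callback-like class (plain Python, not a Keras class and not what the linked PR implements). Keras-style get_config usually returns constructor arguments only; this sketch assumes the runtime counters (`best`, `wait`) are also accepted as constructor arguments, so from_config can rebuild an instance mid-run.

```python
import json


class ReduceOnPlateauSketch:
    """Hypothetical callback following the get_config/from_config
    convention; not a real Keras class."""

    def __init__(self, factor=0.5, patience=2, best=float("inf"), wait=0):
        self.factor = factor
        self.patience = patience
        # Assumption: runtime state is exposed as constructor arguments
        # so that from_config can restore a mid-run instance.
        self.best = best
        self.wait = wait

    def get_config(self):
        return {"factor": self.factor, "patience": self.patience,
                "best": self.best, "wait": self.wait}

    @classmethod
    def from_config(cls, config):
        return cls(**config)


cb = ReduceOnPlateauSketch(patience=3)
cb.best, cb.wait = 0.42, 2  # state reached after some epochs

saved = json.dumps(cb.get_config())  # e.g. stored next to the model file
restored = ReduceOnPlateauSketch.from_config(json.loads(saved))
print(restored.get_config() == cb.get_config())  # True
```

Unlike pickling, a JSON config is readable and does not depend on the callback's functions or the attached model being importable or serializable.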