keras.utils.multi_gpu_model(model, 5) does not work well with the ModelCheckpoint callback. It throws a "cannot serialize IO object" error. I think I understand why this happens, since multiple copies of the same model are spread across my GPUs, but I am not sure how to fix it.
Any workarounds? It works awesome otherwise.
EDIT: Closing this issue. Saving weights works just fine.
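For reference, a minimal sketch of the weight-saving workaround (the model architecture, data, paths, and GPU count below are only placeholders):

from keras.layers import Input, Dense
from keras.models import Model
from keras.utils import multi_gpu_model

inputs = Input(shape=(100,))
outputs = Dense(10, activation='softmax')(inputs)
base_model = Model(inputs, outputs)   # single-GPU template model

parallel_model = multi_gpu_model(base_model, gpus=5)
parallel_model.compile(optimizer='adam', loss='categorical_crossentropy')
parallel_model.fit(x_train, y_train, epochs=10)   # x_train/y_train: your data

# The template shares its weights with the GPU replicas, so saving it
# (rather than calling save() on the wrapper) sidesteps the
# "cannot serialize" error.
base_model.save_weights('weights.h5')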
+1
@pGit1 Hi, can the ModelCheckpoint callback now be used to save checkpoints of a multi-GPU model correctly?
Yes it does. Refer to this: https://github.com/keras-team/keras/issues/2436#issuecomment-354882296
This should help you.
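A rough sketch of that kind of workaround, i.e. checkpointing the single-GPU template model instead of the multi-GPU wrapper (class and variable names here are only illustrative, not the exact code from the linked comment):

from keras.callbacks import Callback

class TemplateModelCheckpoint(Callback):
    """Save the single-GPU template model instead of the multi-GPU wrapper."""

    def __init__(self, template_model, filepath):
        super(TemplateModelCheckpoint, self).__init__()
        self.template_model = template_model
        self.filepath = filepath

    def on_epoch_end(self, epoch, logs=None):
        # The template shares weights with the GPU replicas, so saving it
        # captures the trained parameters without the device-pinned objects
        # that break serialization of the wrapper.
        self.template_model.save(self.filepath.format(epoch=epoch + 1))

# usage (base_model is what you passed to multi_gpu_model):
# gpu_model.fit(..., callbacks=[TemplateModelCheckpoint(base_model, 'ckpt.{epoch:02d}.h5')])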
I solved the problem in the following way. I changed a few lines in the Keras source (specifically in topology.py/network.py and callbacks.py). The modified code is below.
Reminder: if you use an older version of Keras, replace 'saving.save_weights_to_hdf5_group(f, layers)' with 'save_weights_to_hdf5_group(f, layers)'.
network.py:
def save_weights(self, filepath, overwrite=True, multiple_gpu=False, name_of_model=None):
    """Dumps all layer weights to a HDF5 file.

    The weight file has:
        - `layer_names` (attribute), a list of strings
            (ordered names of model layers).
        - For every layer, a `group` named `layer.name`
            - For every such layer group, a group attribute `weight_names`,
                a list of strings
                (ordered names of weights tensor of the layer).
            - For every weight in the layer, a dataset
                storing the weight value, named after the weight tensor.

    # Arguments
        filepath: String, path to the file to save the weights to.
        overwrite: Whether to silently overwrite any existing file at the
            target location, or provide the user with a manual prompt.
        multiple_gpu: Whether this model is a multi_gpu_model wrapper.
        name_of_model: Name of the wrapped template model (usually 'model_1';
            check the name by calling summary() after multi_gpu_model).

    # Raises
        ImportError: If h5py is not available.
    """
    if h5py is None:
        raise ImportError('`save_weights` requires h5py.')
    # If file exists and should not be overwritten:
    if not overwrite and os.path.isfile(filepath):
        proceed = ask_to_proceed_with_overwrite(filepath)
        if not proceed:
            return
    with h5py.File(filepath, 'w') as f:
        if multiple_gpu and name_of_model is not None:
            # Save the weights of the wrapped template model,
            # not of the multi-GPU wrapper itself.
            layers = self.get_layer(name_of_model)
            layers = layers.layers
            saving.save_weights_to_hdf5_group(f, layers)
        else:
            saving.save_weights_to_hdf5_group(f, self.layers)
        f.flush()
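With that patch, saving weights from a multi-GPU model could then look roughly like this ('model_1' is a placeholder; use whatever name summary() shows for the wrapped template model):

parallel_model = multi_gpu_model(base_model, gpus=5)
parallel_model.summary()   # find the name of the inner template model, e.g. 'model_1'
parallel_model.save_weights('weights.h5',
                            overwrite=True,
                            multiple_gpu=True,
                            name_of_model='model_1')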
callbacks.py:
class ModelCheckpoint(Callback):
    """Save the model after every epoch.

    `filepath` can contain named formatting options,
    which will be filled with the values of `epoch` and
    keys in `logs` (passed in `on_epoch_end`).

    For example: if `filepath` is `weights.{epoch:02d}-{val_loss:.2f}.hdf5`,
    then the model checkpoints will be saved with the epoch number and
    the validation loss in the filename.

    # Arguments
        filepath: string, path to save the model file.
        monitor: quantity to monitor.
        verbose: verbosity mode, 0 or 1.
        save_best_only: if `save_best_only=True`,
            the latest best model according to
            the quantity monitored will not be overwritten.
        mode: one of {auto, min, max}.
            If `save_best_only=True`, the decision
            to overwrite the current save file is made
            based on either the maximization or the
            minimization of the monitored quantity. For `val_acc`,
            this should be `max`, for `val_loss` this should
            be `min`, etc. In `auto` mode, the direction is
            automatically inferred from the name of the monitored quantity.
        save_weights_only: if True, then only the model's weights will be
            saved (`model.save_weights(filepath)`), else the full model
            is saved (`model.save(filepath)`).
        period: Interval (number of epochs) between checkpoints.
        multiple_gpu: if True, save the weights of the wrapped template
            model instead of the multi-GPU wrapper.
        name_of_model: name of the wrapped template model (usually 'model_1').
    """

    def __init__(self, filepath, monitor='val_loss', verbose=0,
                 save_best_only=False, save_weights_only=False,
                 mode='auto', period=1,
                 multiple_gpu=False, name_of_model=None):
        super(ModelCheckpoint, self).__init__()
        self.monitor = monitor
        self.verbose = verbose
        self.filepath = filepath
        self.save_best_only = save_best_only
        self.save_weights_only = save_weights_only
        self.period = period
        self.epochs_since_last_save = 0
        self.multi_gpu_mode = multiple_gpu
        self.name_of_model = name_of_model

        if mode not in ['auto', 'min', 'max']:
            warnings.warn('ModelCheckpoint mode %s is unknown, '
                          'fallback to auto mode.' % (mode),
                          RuntimeWarning)
            mode = 'auto'

        if mode == 'min':
            self.monitor_op = np.less
            self.best = np.Inf
        elif mode == 'max':
            self.monitor_op = np.greater
            self.best = -np.Inf
        else:
            if 'acc' in self.monitor or self.monitor.startswith('fmeasure'):
                self.monitor_op = np.greater
                self.best = -np.Inf
            else:
                self.monitor_op = np.less
                self.best = np.Inf

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        self.epochs_since_last_save += 1
        if self.epochs_since_last_save >= self.period:
            self.epochs_since_last_save = 0
            filepath = self.filepath.format(epoch=epoch + 1, **logs)
            if self.save_best_only:
                current = logs.get(self.monitor)
                if current is None:
                    warnings.warn('Can save best model only with %s available, '
                                  'skipping.' % (self.monitor), RuntimeWarning)
                else:
                    if self.monitor_op(current, self.best):
                        if self.verbose > 0:
                            print('\nEpoch %05d: %s improved from %0.5f to %0.5f,'
                                  ' saving model to %s'
                                  % (epoch + 1, self.monitor, self.best,
                                     current, filepath))
                        self.best = current
                        if self.save_weights_only:
                            # Pass the multi-GPU flags through to the patched save_weights.
                            self.model.save_weights(filepath, overwrite=True,
                                                    multiple_gpu=self.multi_gpu_mode,
                                                    name_of_model=self.name_of_model)
                        else:
                            self.model.save(filepath, overwrite=True)
                    else:
                        if self.verbose > 0:
                            print('\nEpoch %05d: %s did not improve from %0.5f' %
                                  (epoch + 1, self.monitor, self.best))
            else:
                if self.verbose > 0:
                    print('\nEpoch %05d: saving model to %s' % (epoch + 1, filepath))
                if self.save_weights_only:
                    # Keep the multi-GPU handling consistent in this branch too.
                    self.model.save_weights(filepath, overwrite=True,
                                            multiple_gpu=self.multi_gpu_mode,
                                            name_of_model=self.name_of_model)
                else:
                    self.model.save(filepath, overwrite=True)
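Putting the two patches together, a training setup could look roughly like this (the filepath, data, and 'model_1' name are placeholders; check summary() for the actual inner model name):

checkpoint = ModelCheckpoint('weights.{epoch:02d}-{val_loss:.2f}.hdf5',
                             monitor='val_loss',
                             save_best_only=True,
                             save_weights_only=True,
                             verbose=1,
                             multiple_gpu=True,
                             name_of_model='model_1')
parallel_model.fit(x_train, y_train,
                   validation_data=(x_val, y_val),
                   epochs=50,
                   callbacks=[checkpoint])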
I solved this problem with the following approach.
We need to use the multi-GPU model for our other callbacks for performance reasons, but we also need the template model for ModelCheckpoint and some other callbacks. For that reason, we made a tiny adapter called AltModelCheckpoint that wraps ModelCheckpoint with the checkpointed model explicitly specified.
Installation is easy:
pip install alt-model-checkpoint
from alt_model_checkpoint import AltModelCheckpoint
from keras.models import Model
from keras.utils import multi_gpu_model
base_model = Model(...)
gpu_model = multi_gpu_model(base_model)
gpu_model.compile(...)
gpu_model.fit(..., callbacks=[
    AltModelCheckpoint('save/path/for/model.hdf5', base_model)
])
Enjoy.....! :)