Keras: call_back_error when using multi_gpu_model

Created on 17 May 2018 · 3 Comments · Source: keras-team/keras

When I use multi_gpu_model, I can train through the first epoch,
but when it moves on to the 2nd epoch, the callbacks raise errors about save_model and deepcopy.
Could anyone help me?
FYI: when I don't use multi_gpu_model, the error does not occur.

Keras: 2.1.0
TensorFlow: 1.4.0
Ubuntu: 16.04

My callback setup is:

checkpoint = ModelCheckpoint(weight_path,
                             monitor='val_acc',
                             verbose=1,
                             save_best_only=True, mode='max')
callbacks_list = [checkpoint]

parallel_model.fit_generator(nturgbd_train_datagen(augment),
                             steps_per_epoch=num_training_samples/batch_size+1,
                             epochs=epochs,
                             verbose=1,
                             callbacks=callbacks_list,
                             validation_data=nturgbd_test_datagen(),
                             validation_steps=samples_per_validation/batch_size+1,
                             )
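
A side note on the step counts in the call above, unrelated to the crash itself: `samples/batch_size+1` relies on Python 2 integer division, and it yields one extra step whenever the sample count is an exact multiple of the batch size. A minimal sketch of a ceiling-based count follows; `steps_for` is a hypothetical helper name, not something from the original script, and whether the extra step matters depends on how the generators (not shown here) behave.

```python
import math


def steps_for(num_samples, batch_size):
    """Number of batches needed to cover num_samples once (ceiling division)."""
    return int(math.ceil(num_samples / float(batch_size)))


# Compare with the "integer division plus one" form used in the post.
exact_multiple_original = 100 // 10 + 1   # one step more than needed
exact_multiple_ceiling = steps_for(100, 10)
with_remainder_ceiling = steps_for(101, 10)
```

With this form the result is an integer on both Python 2 and Python 3, so it could be passed as `steps_per_epoch` or `validation_steps` without relying on the behavior of `/`.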

Here is the error:

Epoch 00001: val_acc improved from -inf to 0.16544, saving model to weights/rot_lstm/cs/rot100/001_0.165.hdf5
Traceback (most recent call last):
  File "VA_train.py", line 460, in <module>
    train()
  File "VA_train.py", line 406, in train
    validation_steps=samples_per_validation/batch_size+1,
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/site-packages/keras/legacy/interfaces.py", line 87, in wrapper
    return func(*args, **kwargs)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/site-packages/keras/engine/training.py", line 2136, in fit_generator
    callbacks.on_epoch_end(epoch, epoch_logs)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/site-packages/keras/callbacks.py", line 73, in on_epoch_end
    callback.on_epoch_end(epoch, logs)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/site-packages/keras/callbacks.py", line 414, in on_epoch_end
    self.model.save(filepath, overwrite=True)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/site-packages/keras/engine/topology.py", line 2556, in save
    save_model(self, filepath, overwrite, include_optimizer)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/site-packages/keras/models.py", line 108, in save_model
    'config': model.get_config()
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/site-packages/keras/engine/topology.py", line 2397, in get_config
    return copy.deepcopy(config)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 257, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 230, in _deepcopy_list
    y.append(deepcopy(a, memo))
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 257, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 257, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 237, in _deepcopy_tuple
    y.append(deepcopy(a, memo))
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 237, in _deepcopy_tuple
    y.append(deepcopy(a, memo))
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 190, in deepcopy
    y = _reconstruct(x, rv, 1, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 334, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 257, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 190, in deepcopy
    y = _reconstruct(x, rv, 1, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 334, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 257, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 257, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/lgy/anaconda2/envs/pycharm_tf/lib/python2.7/copy.py", line 182, in deepcopy
    rv = reductor(2)
TypeError: can't pickle NotImplementedType objects

All 3 comments

https://keras.io/utils/#multi_gpu_model

The documentation says: "Save model via the template model (which shares the same weights)".

Try this:

from keras.callbacks import ModelCheckpoint

class ParallelModelCheckpoint(ModelCheckpoint):
    def __init__(self, model, filepath, monitor='val_loss', verbose=0,
                 save_best_only=False, save_weights_only=False,
                 mode='auto', period=1):
        # Keep a reference to the template model, not the multi-GPU wrapper.
        self.single_model = model
        super(ParallelModelCheckpoint, self).__init__(
            filepath, monitor, verbose, save_best_only, save_weights_only, mode, period)

    def set_model(self, model):
        # Ignore the model passed in by training; always checkpoint the template.
        super(ParallelModelCheckpoint, self).set_model(self.single_model)

Use it like this:
check_point = ParallelModelCheckpoint(single_model, 'best.hd5')
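
To make the mechanism easier to see, here is a small sketch of the same pattern without Keras. Every class in it (`TemplateModel`, `ParallelWrapper`, `Checkpoint`, `ParallelCheckpoint`) is a hypothetical stand-in written for illustration, not a Keras API; the wrapper's `save` simply raises the error from the traceback. It only shows why overriding `set_model` redirects the save to the template model.

```python
class TemplateModel(object):
    """Stand-in for the single-device template model."""
    def __init__(self):
        self.saved_to = []

    def save(self, filepath, overwrite=True):
        self.saved_to.append(filepath)


class ParallelWrapper(object):
    """Stand-in for the multi-GPU wrapper; saving it fails as in the traceback."""
    def __init__(self, template):
        self.template = template

    def save(self, filepath, overwrite=True):
        raise TypeError("can't pickle NotImplementedType objects")


class Checkpoint(object):
    """Stand-in checkpoint callback: saves whichever model it was handed."""
    def __init__(self, filepath):
        self.filepath = filepath
        self.model = None

    def set_model(self, model):
        self.model = model

    def on_epoch_end(self, epoch):
        self.model.save(self.filepath.format(epoch=epoch), overwrite=True)


class ParallelCheckpoint(Checkpoint):
    """Always checkpoints the template, ignoring the model passed by training."""
    def __init__(self, template, filepath):
        self.single_model = template
        super(ParallelCheckpoint, self).__init__(filepath)

    def set_model(self, model):
        super(ParallelCheckpoint, self).set_model(self.single_model)


template = TemplateModel()
parallel = ParallelWrapper(template)

# The training loop hands the callback the model being trained (the wrapper).
plain = Checkpoint("plain_{epoch:03d}.hdf5")
plain.set_model(parallel)
try:
    plain.on_epoch_end(1)
    plain_failed = False
except TypeError:
    plain_failed = True

# The subclass swaps in the template, so the save goes through.
fixed = ParallelCheckpoint(template, "fixed_{epoch:03d}.hdf5")
fixed.set_model(parallel)
fixed.on_epoch_end(1)
```

In this sketch the plain callback fails because it tries to save the wrapper, while the subclass saves the template, which is the object the documentation quoted above says to save.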

@birolkuyumcu Thank you! Actually, the problem was solved when I set "save_weights_only=True".

Did you try loading the saved weights?
I don't think it works.
