Mask_rcnn: Can't save model after training

Created on 10 Sep 2018 · 14 Comments · Source: matterport/Mask_RCNN

Hi, I have encountered a problem. I tried to train the network from scratch on 2 GPUs. I'm using the COCO 2017 dataset, which has 117266 training images and 4952 validation images. After completing one training epoch, it couldn't save the model and printed:

100/100 [==============================] - 61s 606ms/step - loss: 3.8918 - rpn_class_loss: 0.0513 - rpn_bbox_loss: 0.6449 - mrcnn_class_loss: 1.3554 - mrcnn_bbox_loss: 0.9415 - mrcnn_mask_loss: 0.8987 - val_loss: 3.2013 - val_rpn_class_loss: 0.0345 - val_rpn_bbox_loss: 0.5427 - val_mrcnn_class_loss: 0.9298 - val_mrcnn_bbox_loss: 0.9336 - val_mrcnn_mask_loss: 0.7607

Epoch 00001: saving model to /mnt/3d48e0b9-40a9-42ab-95a8-44a0a6d88180/home/terry/Mask-RCNN/Mask_RCNN/logs/coco20180910T1646/mask_rcnn_coco_0001.h5
Traceback (most recent call last):
File "shibaba.py", line 70, in <module>
layers="all")
File "/mnt/3d48e0b9-40a9-42ab-95a8-44a0a6d88180/home/terry/Mask-RCNN/Mask_RCNN/mrcnn/model.py", line 2387, in train
use_multiprocessing=False,
File "/home/software/anaconda3/envs/carnd-term1/lib/python3.5/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "/home/software/anaconda3/envs/carnd-term1/lib/python3.5/site-packages/keras/engine/training.py", line 1415, in fit_generator
initial_epoch=initial_epoch)
File "/home/software/anaconda3/envs/carnd-term1/lib/python3.5/site-packages/keras/engine/training_generator.py", line 247, in fit_generator
callbacks.on_epoch_end(epoch, epoch_logs)
File "/home/software/anaconda3/envs/carnd-term1/lib/python3.5/site-packages/keras/callbacks.py", line 77, in on_epoch_end
callback.on_epoch_end(epoch, logs)
File "/home/software/anaconda3/envs/carnd-term1/lib/python3.5/site-packages/keras/callbacks.py", line 455, in on_epoch_end
self.model.save(filepath, overwrite=True)
File "/home/software/anaconda3/envs/carnd-term1/lib/python3.5/site-packages/keras/engine/network.py", line 1085, in save
save_model(self, filepath, overwrite, include_optimizer)
File "/home/software/anaconda3/envs/carnd-term1/lib/python3.5/site-packages/keras/engine/saving.py", line 116, in save_model
'config': model.get_config()
File "/home/software/anaconda3/envs/carnd-term1/lib/python3.5/site-packages/keras/engine/network.py", line 926, in get_config
return copy.deepcopy(config)
File "/home/software/anaconda3/envs/carnd-term1/lib/python3.5/copy.py", line 155, in deepcopy
y = copier(x, memo)
File "/home/software/anaconda3/envs/carnd-term1/lib/python3.5/copy.py", line 243, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/home/software/anaconda3/envs/carnd-term1/lib/python3.5/copy.py", line 155, in deepcopy
y = copier(x, memo)
File "/home/software/anaconda3/envs/carnd-term1/lib/python3.5/copy.py", line 218, in _deepcopy_list
y.append(deepcopy(a, memo))
File "/home/software/anaconda3/envs/carnd-term1/lib/python3.5/copy.py", line 155, in deepcopy
y = copier(x, memo)
File "/home/software/anaconda3/envs/carnd-term1/lib/python3.5/copy.py", line 243, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/home/software/anaconda3/envs/carnd-term1/lib/python3.5/copy.py", line 155, in deepcopy
y = copier(x, memo)
File "/home/software/anaconda3/envs/carnd-term1/lib/python3.5/copy.py", line 243, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/home/software/anaconda3/envs/carnd-term1/lib/python3.5/copy.py", line 155, in deepcopy
y = copier(x, memo)
File "/home/software/anaconda3/envs/carnd-term1/lib/python3.5/copy.py", line 223, in _deepcopy_tuple
y = [deepcopy(a, memo) for a in x]
File "/home/software/anaconda3/envs/carnd-term1/lib/python3.5/copy.py", line 223, in <listcomp>
y = [deepcopy(a, memo) for a in x]
File "/home/software/anaconda3/envs/carnd-term1/lib/python3.5/copy.py", line 155, in deepcopy
y = copier(x, memo)
File "/home/software/anaconda3/envs/carnd-term1/lib/python3.5/copy.py", line 223, in _deepcopy_tuple
y = [deepcopy(a, memo) for a in x]
File "/home/software/anaconda3/envs/carnd-term1/lib/python3.5/copy.py", line 223, in <listcomp>
y = [deepcopy(a, memo) for a in x]
File "/home/software/anaconda3/envs/carnd-term1/lib/python3.5/copy.py", line 182, in deepcopy
y = _reconstruct(x, rv, 1, memo)
File "/home/software/anaconda3/envs/carnd-term1/lib/python3.5/copy.py", line 297, in _reconstruct
state = deepcopy(state, memo)
File "/home/software/anaconda3/envs/carnd-term1/lib/python3.5/copy.py", line 155, in deepcopy
y = copier(x, memo)
File "/home/software/anaconda3/envs/carnd-term1/lib/python3.5/copy.py", line 243, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/home/software/anaconda3/envs/carnd-term1/lib/python3.5/copy.py", line 182, in deepcopy
y = _reconstruct(x, rv, 1, memo)
File "/home/software/anaconda3/envs/carnd-term1/lib/python3.5/copy.py", line 297, in _reconstruct
state = deepcopy(state, memo)
File "/home/software/anaconda3/envs/carnd-term1/lib/python3.5/copy.py", line 155, in deepcopy
y = copier(x, memo)
File "/home/software/anaconda3/envs/carnd-term1/lib/python3.5/copy.py", line 243, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/home/software/anaconda3/envs/carnd-term1/lib/python3.5/copy.py", line 155, in deepcopy
y = copier(x, memo)
File "/home/software/anaconda3/envs/carnd-term1/lib/python3.5/copy.py", line 218, in _deepcopy_list
y.append(deepcopy(a, memo))
File "/home/software/anaconda3/envs/carnd-term1/lib/python3.5/copy.py", line 155, in deepcopy
y = copier(x, memo)
File "/home/software/anaconda3/envs/carnd-term1/lib/python3.5/copy.py", line 223, in _deepcopy_tuple
y = [deepcopy(a, memo) for a in x]
File "/home/software/anaconda3/envs/carnd-term1/lib/python3.5/copy.py", line 223, in <listcomp>
y = [deepcopy(a, memo) for a in x]
File "/home/software/anaconda3/envs/carnd-term1/lib/python3.5/copy.py", line 155, in deepcopy
y = copier(x, memo)
File "/home/software/anaconda3/envs/carnd-term1/lib/python3.5/copy.py", line 243, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/home/software/anaconda3/envs/carnd-term1/lib/python3.5/copy.py", line 182, in deepcopy
y = _reconstruct(x, rv, 1, memo)
File "/home/software/anaconda3/envs/carnd-term1/lib/python3.5/copy.py", line 297, in _reconstruct
state = deepcopy(state, memo)
File "/home/software/anaconda3/envs/carnd-term1/lib/python3.5/copy.py", line 155, in deepcopy
y = copier(x, memo)
File "/home/software/anaconda3/envs/carnd-term1/lib/python3.5/copy.py", line 243, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/home/software/anaconda3/envs/carnd-term1/lib/python3.5/copy.py", line 182, in deepcopy
y = _reconstruct(x, rv, 1, memo)
File "/home/software/anaconda3/envs/carnd-term1/lib/python3.5/copy.py", line 306, in _reconstruct
y.__dict__.update(state)
AttributeError: 'NoneType' object has no attribute 'update'
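
The traceback ends inside copy.deepcopy(config), called from get_config, so the save seems to fail on some value nested in the model config rather than on the weights. As a debugging aid, here is a minimal sketch in plain Python that walks a nested dict/list/tuple and reports which entries cannot be deep-copied. The helper name find_uncopyable and the sample config are hypothetical and only illustrate the idea; they are not part of Keras or Mask_RCNN.

```python
import copy
import threading


def find_uncopyable(obj, path="config"):
    """Return the paths of nested values that copy.deepcopy cannot handle."""
    if isinstance(obj, dict):
        items = [("{}[{!r}]".format(path, k), v) for k, v in obj.items()]
    elif isinstance(obj, (list, tuple)):
        items = [("{}[{}]".format(path, i), v) for i, v in enumerate(obj)]
    else:
        # Leaf value: try to copy it on its own and record the path on failure.
        try:
            copy.deepcopy(obj)
            return []
        except Exception:
            return [path]
    bad = []
    for child_path, value in items:
        bad.extend(find_uncopyable(value, child_path))
    return bad


# Hypothetical config: the second layer carries an object that cannot be copied.
sample = {
    "layers": [
        {"name": "a", "config": {"x": 1}},
        {"name": "b", "config": {"arguments": (threading.Lock(),)}},
    ]
}
print(find_uncopyable(sample))
```

On a real model, one could try passing each layer's own get_config() result to this helper, assuming the per-layer call succeeds where the whole-model call does not.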

Since I modified the network by adding a pooling layer to the ROIAlign layer, I can't use transfer learning. The modifications are:

The pool size is doubled in both fpn_classifier_graph and build_fpn_mask_graph:
x = PyramidROIAlign([2 * pool_size, 2 * pool_size], name="roi_align_classifier")([rois, image_meta] + feature_maps)

One max-pooling layer is added after the bilinear interpolation in ROIAlign:
for i in range(4):
    pooled[i] = tf.nn.max_pool(pooled[i], ksize=(1, 2, 2, 1), strides=(1, 2, 2, 1), padding="VALID", name="ROIPool")
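
The shape arithmetic behind these two changes can be checked directly: doubling the ROIAlign crop size and then applying a 2x2 max pool with stride 2 and VALID padding brings each spatial dimension back to the original pool size, so the layers after ROIAlign see the same shape as before. A minimal sketch in plain Python; the helper name valid_pool_out is hypothetical, and the sizes 7 and 14 are assumed to be the repository defaults for POOL_SIZE and MASK_POOL_SIZE.

```python
def valid_pool_out(size, ksize=2, stride=2):
    """Output length along one spatial axis for VALID-padded pooling."""
    return (size - ksize) // stride + 1


# Assumed defaults: 7 for the classifier head, 14 for the mask head.
for pool_size in (7, 14):
    doubled = 2 * pool_size           # size produced by the enlarged ROIAlign crop
    pooled = valid_pool_out(doubled)  # size after the added 2x2 max pool
    print(pool_size, "->", doubled, "->", pooled)
```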

My question is: what should I do to save my model and its weights?

versions:
keras 2.2.0
keras-base 2.2.0
python 3.5.2
tensorflow-gpu 1.9.0
CUDA 9.0
cuDNN 7.1.04

zieitokk

Most helpful comment

@xDzai94 Yes. If you train the network from scratch, you will be able to save the model.

Is there any way to save the complete model (not only the weights) without training it from scratch?

Hi, you may try this https://github.com/matterport/Mask_RCNN/issues/1299#issuecomment-492968111.

Thank you for replying, but the problem still exists. I have also tried saving the model using:
KM.save_model(self.keras_model, filepath, overwrite=True, include_optimizer=True)

and it gives the same error, since:
keras.callbacks.ModelCheckpoint(self.checkpoint_path, monitor='val_loss', verbose=1, save_weights_only=False)
also saves through the same model.save method.

I have also tried saving weights and architecture separately using:

# Save JSON config to disk
json_config = self.keras_model.to_json()
with open('model_config.json', 'w') as json_file:
    json_file.write(json_config)

# Save weights to disk
print('saving final weights file once again')
self.keras_model.save_weights('new_final_weights.h5')

and the error still remains:
TypeError: can't pickle _thread.RLock objects
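
For context, this TypeError can be reproduced without Keras at all. The first traceback in this thread shows get_config finishing with copy.deepcopy(config), and deepcopy fails on any value that holds a thread lock. One plausible reading, not confirmed anywhere in this thread, is that some layer config in the model references such an object. The sketch below only demonstrates the Python behaviour; both config dicts and the helper can_deepcopy are hypothetical.

```python
import copy
import threading

# Hypothetical configs: one with plain values, one holding a thread lock.
plain_config = {"name": "roi_align_classifier", "pool_shape": [14, 14]}
locked_config = {"name": "lambda_1", "arguments": {"lock": threading.RLock()}}


def can_deepcopy(obj):
    """Return True if copy.deepcopy succeeds on obj, False on TypeError."""
    try:
        copy.deepcopy(obj)
        return True
    except TypeError as exc:
        print("deepcopy failed:", exc)
        return False


print(can_deepcopy(plain_config))   # True
print(can_deepcopy(locked_config))  # False
```

If that reading applies here, saving only the weights sidesteps the problem because the weights file does not need the copied config.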

zakiahmedkgp on 12 Jun 2019

All 14 comments

Have you solved it?

angelbaowei on 19 Oct 2018

Have you solved it?

I have one workaround for this issue. Since I modified the model, I couldn't use transfer learning from the pre-trained model, so I trained all layers of my model.

model.train(dataset_train, dataset_val, learning_rate=config.LEARNING_RATE / 10, epochs=2, layers="all")

I also changed the save_weights_only flag to True in model.py.

keras.callbacks.ModelCheckpoint(self.checkpoint_path, monitor='val_loss', verbose=1, save_weights_only=True),

This allows the model to save its weights. But as far as I can tell, if I want to save the full model, I have to train it from scratch instead of using load_weights with the COCO or ImageNet weights.
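
With weights-only checkpoints, reusing a trained model means rebuilding the architecture in code and then loading the newest weights file. The sketch below picks the highest-epoch checkpoint from a log directory, assuming file names follow the mask_rcnn_coco_0001.h5 pattern seen in the log above. The helper latest_checkpoint is hypothetical and uses only the standard library; the repository's own find_last() method covers similar ground.

```python
import os
import re
import tempfile

# Matches names such as mask_rcnn_coco_0001.h5 (pattern taken from the log above).
CHECKPOINT_RE = re.compile(r"^mask_rcnn_(?P<name>\w+)_(?P<epoch>\d{4})\.h5$")


def latest_checkpoint(log_dir):
    """Return (epoch, path) of the highest-numbered checkpoint, or None."""
    found = []
    for filename in os.listdir(log_dir):
        match = CHECKPOINT_RE.match(filename)
        if match:
            found.append((int(match.group("epoch")), os.path.join(log_dir, filename)))
    return max(found) if found else None


# Demonstration on empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as log_dir:
    for name in ("mask_rcnn_coco_0001.h5", "mask_rcnn_coco_0002.h5", "events.out"):
        open(os.path.join(log_dir, name), "w").close()
    epoch, path = latest_checkpoint(log_dir)
    print(epoch, os.path.basename(path))
```

The returned path could then be passed to the model's load_weights method after the modified network has been constructed in code.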

zieitokk on 22 Oct 2018

Has anyone resolved this issue?
I am trying to save the model with its weights to deploy it to Google Cloud Platform.
Any help would be appreciated. Thanks.

yamuna83 on 2 Dec 2018

@zieitokk: did you resolve this problem? Thanks.

I stored only the weights instead of storing the model and weights together. See my reply above for what I changed in the network.

One thing that comes to mind is that the code already includes a keras.callbacks.TensorBoard callback with its write_graph parameter set to True, so you can inspect the graph in TensorBoard.

There is also another approach you could try: build a list and store the structure in it.
For example:

sources = list()
loc = list()
conf = list()

# apply vgg up to conv4_3 relu
for k in range(23):
    x = self.base[k](x)

s = self.Norm(x)
sources.append(s)

# apply vgg up to fc7
for k in range(23, len(self.base)):
    x = self.base[k](x)

# apply extra layers and cache source layer outputs
for k, v in enumerate(self.extras):
    x = v(x)
    if k < self.indicator or k % 2 == 0:
        sources.append(x)

for (x, l, c) in zip(sources, self.loc, self.conf):
    loc.append(l(x).permute(0, 2, 3, 1).contiguous())
    conf.append(c(x).permute(0, 2, 3, 1).contiguous())

Here self.base[i] is a PyTorch layer. You can append the sequential layers to the list and print it after you compile your model.

Hope this will help you to solve the problem. Thank you.

zieitokk on 4 Dec 2018

Has anyone resolved this issue?
I am trying to save the model with its weights to deploy it to Google Cloud Platform.
Any help would be appreciated. Thanks.

Please see the reply above. Thank you.

zieitokk on 4 Dec 2018

Have you solved it?

I have one workaround for this issue. Since I modified the model, I couldn't use transfer learning from the pre-trained model, so I trained all layers of my model.

model.train(dataset_train, dataset_val, learning_rate=config.LEARNING_RATE / 10, epochs=2, layers="all")

I also changed the save_weights_only flag to True in model.py.

keras.callbacks.ModelCheckpoint(self.checkpoint_path, monitor='val_loss', verbose=1, save_weights_only=True),

This allows the model to save its weights. But as far as I can tell, if I want to save the full model, I have to train it from scratch instead of using load_weights with the COCO or ImageNet weights.

If I understand you correctly: if I train the network without using the COCO or ImageNet weights, it will be able to save the model through the Keras callback?

xDzai94 on 4 May 2019

@xDzai94 Yes. If you train the network from scratch, you will be able to save the model.

zieitokk on 5 May 2019

Same here. I am trying to save the complete model by changing the callback to:
keras.callbacks.ModelCheckpoint(self.checkpoint_path, monitor='val_loss', verbose=1, save_weights_only=False),
but I keep getting the error:
TypeError: can't pickle _thread.RLock objects
I have searched for this error but have been unable to solve the problem.
I want to save not only the weights but also the complete architecture, since I want to convert my model to TFLite.
Thanks!

zakiahmedkgp on 11 Jun 2019

@xDzai94 Yes. If you train the network from scratch, you will be able to save the model.

Is there any way to save the complete model (not only the weights) without training it from scratch?

zakiahmedkgp on 11 Jun 2019

@xDzai94 Yes. If you train the network from scratch, you will be able to save the model.

Is there any way to save the complete model (not only the weights) without training it from scratch?

Hi, you may try this https://github.com/matterport/Mask_RCNN/issues/1299#issuecomment-492968111.

zieitokk on 12 Jun 2019