Mask_rcnn: Training error - Failed to run optimizer, stage RemoveStackStridedSliceSameAxis

Created on 31 Jan 2019 · 13 Comments · Source: matterport/Mask_RCNN

Hi, I'm trying to train the model on the COCO 2017 dataset, but it reports the following errors. Does anyone have the same problem? How can I fix it? Thanks!
I'm using Ubuntu 16.04 64-bit, Python 3.6.7, pip 18.1, tensorflow_gpu 1.13.0-rc0, keras 2.2.4, cuda 10.0.130, libcudnn7-dev_7.4.2.24-1, libcudnn7_7.4.2.24-1

2019-01-31 10:11:33.060384: W ./tensorflow/core/grappler/optimizers/graph_optimizer_stage.h:241] Failed to run optimizer ArithmeticOptimizer, stage RemoveStackStridedSliceSameAxis node proposal_targets/strided_slice. Error: ValidateStridedSliceOp returned partial shapes [1,?,?] and [?,?]
2019-01-31 10:11:33.060528: W ./tensorflow/core/grappler/optimizers/graph_optimizer_stage.h:241] Failed to run optimizer ArithmeticOptimizer, stage RemoveStackStridedSliceSameAxis node proposal_targets/strided_slice_37. Error: ValidateStridedSliceOp returned partial shapes [1,?,?] and [?,?]
2019-01-31 10:11:43.572615: W ./tensorflow/core/grappler/optimizers/graph_optimizer_stage.h:241] Failed to run optimizer ArithmeticOptimizer, stage RemoveStackStridedSliceSameAxis node proposal_targets/strided_slice. Error: ValidateStridedSliceOp returned partial shapes [1,?,?] and [?,?]
2019-01-31 10:11:43.572742: W ./tensorflow/core/grappler/optimizers/graph_optimizer_stage.h:241] Failed to run optimizer ArithmeticOptimizer, stage RemoveStackStridedSliceSameAxis node proposal_targets/strided_slice_37. Error: ValidateStridedSliceOp returned partial shapes [1,?,?] and [?,?]
999/1000 [============================>.] - ETA: 0s - loss: 0.3451 - rpn_class_loss: 0.0034 - rpn_bbox_loss: 0.0729 - mrcnn_class_loss: 0.0632 - mrcnn_bbox_loss: 0.0454 - mrcnn_mask_loss: 0.1601
2019-01-31 10:24:40.241827: W ./tensorflow/core/grappler/optimizers/graph_optimizer_stage.h:241] Failed to run optimizer ArithmeticOptimizer, stage RemoveStackStridedSliceSameAxis node proposal_targets/strided_slice. Error: ValidateStridedSliceOp returned partial shapes [1,?,?] and [?,?]
2019-01-31 10:24:40.241952: W ./tensorflow/core/grappler/optimizers/graph_optimizer_stage.h:241] Failed to run optimizer ArithmeticOptimizer, stage RemoveStackStridedSliceSameAxis node proposal_targets/strided_slice_37. Error: ValidateStridedSliceOp returned partial shapes [1,?,?] and [?,?]
2019-01-31 10:24:41.861193: W ./tensorflow/core/grappler/optimizers/graph_optimizer_stage.h:241] Failed to run optimizer ArithmeticOptimizer, stage RemoveStackStridedSliceSameAxis node proposal_targets/strided_slice. Error: ValidateStridedSliceOp returned partial shapes [1,?,?] and [?,?]
1000/1000 [==============================] - 880s 880ms/step - loss: 0.3449 - rpn_class_loss: 0.0034 - rpn_bbox_loss: 0.0729 - mrcnn_class_loss: 0.0631 - mrcnn_bbox_loss: 0.0454 - mrcnn_mask_loss: 0.1600 - val_loss: 2.0419 - val_rpn_class_loss: 0.1070 - val_rpn_bbox_loss: 0.9081 - val_mrcnn_class_loss: 0.4638 - val_mrcnn_bbox_loss: 0.2041 - val_mrcnn_mask_loss: 0.3590
Epoch 6/12
1000/1000 [==============================] - 799s 799ms/step - loss: 0.3522 - rpn_class_loss: 0.0037 - rpn_bbox_loss: 0.0779 - mrcnn_class_loss: 0.0551 - mrcnn_bbox_loss: 0.0508 - mrcnn_mask_loss: 0.1647 - val_loss: 1.5071 - val_rpn_class_loss: 0.0384 - val_rpn_bbox_loss: 0.8576 - val_mrcnn_class_loss: 0.1833 - val_mrcnn_bbox_loss: 0.1751 - val_mrcnn_mask_loss: 0.2529

All 13 comments

Hello, I have the same problem. Could you tell me the answer?

Hi, actually I haven't found a solution, but the model can still be trained even though these errors occur. Hoping for other answers.

Thanks for your answer! I am in the same situation now: the model can be trained even though these errors occur. I hope we can solve the problem together!

Hello, I'm having the same issue, and at the end it doesn't create the mask_rcnn_bottle_{epoch:04d}.h5 file. After that nothing works well. Can anybody help?
Thanks in advance!

I encountered the same behaviour (this warning appeared, but training still worked). In my case, the problem was a version mismatch between tensorflow and tensorflow-gpu. Both were originally on version 1.12, but installing tensorboard through pip pulled in version 1.13 of tensorboard and tensorflow, leaving tensorflow-gpu at 1.12.

Removing all versions and then installing 1.12.0 of all packages solved it for me.
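A quick way to check for the kind of mismatch described above is to list the installed versions of the related packages and compare their minor releases. This is only a sketch: the helper names are hypothetical, and it assumes Python 3.8+ for `importlib.metadata` (the original poster was on 3.6, where `pkg_resources` would be the nearest equivalent).

```python
from importlib import metadata  # assumption: Python 3.8 or newer

# Packages that the comment above reports drifting apart.
PACKAGES = ("tensorflow", "tensorflow-gpu", "tensorboard")


def installed_versions(packages=PACKAGES):
    """Return {package: version} for the packages that are installed."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            continue  # not installed, nothing to compare
    return found


def minor_release(version):
    """Reduce a version string such as '1.13.0rc0' to '1.13'."""
    return ".".join(version.split(".")[:2])


def has_mismatch(versions):
    """True if the given packages span more than one minor release."""
    return len({minor_release(v) for v in versions.values()}) > 1


if __name__ == "__main__":
    versions = installed_versions()
    for name, version in sorted(versions.items()):
        print(name, version)
    if has_mismatch(versions):
        print("Minor versions differ; consider reinstalling one matching release.")
```

If the versions differ, the fix reported in this thread was to uninstall all of them and reinstall a single matching release.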

I solved my issue, and it now generates .h5 files. It was a problem with GPU capacity (I think). Currently, I'm using the Google Colab GPU runtime.

@mihiri91 I wonder how you solved your issue. What do you mean by "a problem with GPU capacity"? Not enough memory? I am using Nvidia Docker and getting the same issue.

I'm having the same problem. I'm using tensorflow==1.13 and cuda==10.
I didn't have this issue when I was using CUDA 9.0. Is it because of that?

Hi all.
I get the same issue if I make the training batch too big, so that the GPU can't fit it in memory. When I reduce the batch size, the error goes away!
Right now I'm training the network to see whether it creates .h5 files.

For me, reducing the batch size worked too: reducing IMAGES_PER_GPU from 5 to 1 while keeping the number of GPUs at 8 was sufficient. Even with 2 images per GPU I still got the error, although the GPUs have 32 GB of VRAM each.
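For reference, the batch-size change described above would typically be made in a `Config` subclass. The fragment below is a sketch and needs the matterport `mrcnn` package installed. The class name, `NAME`, and `NUM_CLASSES` values are hypothetical placeholders; only `GPU_COUNT = 8` and `IMAGES_PER_GPU = 1` come from the comment.

```python
from mrcnn.config import Config


class ReducedBatchConfig(Config):
    """Hypothetical training config with one image per GPU."""

    NAME = "reduced_batch"   # placeholder name, not from the thread
    NUM_CLASSES = 1 + 1      # placeholder: background + one class

    GPU_COUNT = 8            # kept at 8, as in the comment above
    IMAGES_PER_GPU = 1       # reduced from 5


config = ReducedBatchConfig()
# Config derives the effective batch size as GPU_COUNT * IMAGES_PER_GPU.
print(config.BATCH_SIZE)
```

With these values the effective batch drops from 40 images (8 x 5) to 8 images (8 x 1) per step.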

I changed my tensorflow-gpu version from 1.14.0 to 1.13.1 and this issue appeared, so I reinstalled it with pip install tensorflow-gpu==1.14.0, and the issue disappeared.

This solved the issue.

I changed
tf.stack([...], axis=0)
to
tf.concat([ a1[tf.newaxis, ...], a2[tf.newaxis, ...], ...], axis=0),
then it worked.
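To make the substitution above concrete, here is a small sketch showing that the two forms build the same tensor. The tensors `a1` and `a2` are made-up examples, not the ones in the Mask R-CNN code. Presumably the rewrite works because the graph no longer contains the stack-then-slice pattern that the `RemoveStackStridedSliceSameAxis` stage named in the warning tries to rewrite, but that is an inference from the stage name, not something the thread confirms.

```python
import tensorflow as tf

# Two example tensors of the same shape (hypothetical stand-ins).
a1 = tf.constant([[1.0, 2.0], [3.0, 4.0]])
a2 = tf.constant([[5.0, 6.0], [7.0, 8.0]])

# Original form: stack along a new leading axis.
stacked = tf.stack([a1, a2], axis=0)

# Replacement form: add the leading axis to each tensor explicitly,
# then concatenate along it.
concatenated = tf.concat([a1[tf.newaxis, ...], a2[tf.newaxis, ...]], axis=0)

# Static shapes are available in both graph mode (TF 1.x) and eager mode.
print(stacked.shape.as_list())
print(concatenated.shape.as_list())
```

Both results have a new leading dimension whose size equals the number of input tensors, so downstream code should not need to change.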
