Models: train.py error: InvalidArgumentError (see above for traceback): assertion failed: [Groundtruth boxes and labels have incompatible shapes!] [Condition x == y did not hold element-wise:] [x (Loss/BoxClassifierLoss/strided_slice_1:0) = ]

Created on 8 Nov 2017 · 19 Comments · Source: tensorflow/models

Hello, I am trying to perform a training job with the help of the model (specified below) on my own dataset which contains only 2 classes.

System Info and things completed so far:

Ubuntu 16.04 (using python2.7)
models repo fetched from:- https://github.com/tensorflow/models/tree/0375c800c767db2ef070cee1529d8a50f42d1042
tf-nightly:- tf_nightly-1.5.0.dev20171107-cp27-cp27mu-manylinux1_x86_64.whl (md5)
I generated tfrecords using the instructions provided in this repo:- https://github.com/balancap/SSD-Tensorflow/tree/master/ since create_pet_tf_record.py throws attribute errors.
followed "running locally" instructions and arranged label and tfrecord files accordingly.
Model used:- faster_rcnn_resnet101_coco
All my images are high resolution, i.e. 1080x1920. I created the bounding boxes and classes with the labelImg tool available here:- https://github.com/tzutalin/labelImg, and each image has more than 2 instances of the classes. For example, my XMLs have two or more <object> entries.
I am currently just attempting to do the training job on cpu since I do not have a gpu. Will switch to cloud machine if using cpu is the concern.
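For anyone checking their own conversion step: labelImg writes Pascal VOC-style XML, so each <object> should end up as one box and one label in the record. Below is a minimal sketch of that parsing using only the standard library. The function name read_voc_objects, the sample annotation, and the label map are hypothetical and not taken from the poster's code; the boxes are normalized to [0, 1], which is the convention the Object Detection API's TFRecord examples use.

```python
import xml.etree.ElementTree as ET


def read_voc_objects(xml_text, label_map):
    """Parse one Pascal VOC-style annotation into parallel per-object lists.

    label_map is a hypothetical dict from class name to integer id (ids start at 1).
    """
    root = ET.fromstring(xml_text)
    size = root.find("size")
    width = float(size.find("width").text)
    height = float(size.find("height").text)

    parsed = {"xmins": [], "xmaxs": [], "ymins": [], "ymaxs": [],
              "classes_text": [], "classes": []}
    # One iteration per <object>, so every list grows by exactly one entry.
    for obj in root.findall("object"):
        name = obj.find("name").text
        box = obj.find("bndbox")
        parsed["xmins"].append(float(box.find("xmin").text) / width)
        parsed["xmaxs"].append(float(box.find("xmax").text) / width)
        parsed["ymins"].append(float(box.find("ymin").text) / height)
        parsed["ymaxs"].append(float(box.find("ymax").text) / height)
        parsed["classes_text"].append(name.encode("utf8"))
        parsed["classes"].append(label_map[name])
    return parsed


# Hypothetical annotation with two objects on a 1920x1080 image.
SAMPLE_XML = """
<annotation>
  <size><width>1920</width><height>1080</height><depth>3</depth></size>
  <object>
    <name>car</name>
    <bndbox><xmin>192</xmin><ymin>108</ymin><xmax>960</xmax><ymax>540</ymax></bndbox>
  </object>
  <object>
    <name>person</name>
    <bndbox><xmin>1000</xmin><ymin>200</ymin><xmax>1200</xmax><ymax>900</ymax></bndbox>
  </object>
</annotation>
"""

parsed = read_voc_objects(SAMPLE_XML, {"car": 1, "person": 2})
print(len(parsed["xmins"]), len(parsed["classes"]))
```

Because the box coordinates and the labels are appended in the same loop, the lists cannot drift apart in length, which is the property the training assertion appears to check.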

Error Description: (sorry, it's a long post)

When I run "python train.py --logtostderr --train_dir=$TRAIN_DIR --pipeline_config_path=$PATH_TO_CONFIG", it gives the following errors:

INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
WARNING:tensorflow:From build/bdist.linux-x86_64/egg/object_detection/trainer.py:210: create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.create_global_step
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:depth of additional conv before box predictor: 0
INFO:tensorflow:Scale of 0 disables regularizer.
WARNING:tensorflow:From build/bdist.linux-x86_64/egg/object_detection/meta_architectures/faster_rcnn_meta_arch.py:1670: get_or_create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.get_or_create_global_step
INFO:tensorflow:Summary name /clone_loss is illegal; using clone_loss instead.
/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py:96: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
2017-11-08 13:31:27.026402: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
INFO:tensorflow:Restoring parameters from /home/srinath/models/research/object_detection/models/faster_rcnn_resnet101_coco_11_06_2017/model.ckpt
INFO:tensorflow:Starting Session.
INFO:tensorflow:Saving checkpoint to path object_detection/train/model.ckpt
INFO:tensorflow:Starting Queues.
INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:Error reported to Coordinator: assertion failed: [Groundtruth boxes and labels have incompatible shapes!] [Condition x == y did not hold element-wise:] [x (Loss/BoxClassifierLoss/strided_slice_1:0) = ] [0] [y (Loss/BoxClassifierLoss/strided_slice_2:0) = ] [2]
[[Node: Loss/BoxClassifierLoss/assert_equal_1/Assert/Assert = Assert[T=[DT_STRING, DT_STRING, DT_STRING, DT_INT32, DT_STRING, DT_INT32], summarize=3, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Loss/BoxClassifierLoss/assert_equal_1/All, Loss/RPNLoss/assert_equal_1/Assert/Assert/data_0, Loss/RPNLoss/assert_equal/Assert/Assert/data_1, Loss/BoxClassifierLoss/assert_equal_1/Assert/Assert/data_2, Loss/BoxClassifierLoss/strided_slice_1, Loss/BoxClassifierLoss/assert_equal_1/Assert/Assert/data_4, Loss/RPNLoss/strided_slice)]]

Caused by op u'Loss/BoxClassifierLoss/assert_equal_1/Assert/Assert', defined at:
File "object_detection/train.py", line 165, in <module>
tf.app.run()
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "object_detection/train.py", line 161, in main
worker_job_name, is_chief, FLAGS.train_dir)
File "build/bdist.linux-x86_64/egg/object_detection/trainer.py", line 228, in train
clones = model_deploy.create_clones(deploy_config, model_fn, [input_queue])
File "build/bdist.linux-x86_64/egg/deployment/model_deploy.py", line 193, in create_clones
outputs = model_fn(*args, **kwargs)
File "build/bdist.linux-x86_64/egg/object_detection/trainer.py", line 167, in _create_losses
losses_dict = detection_model.loss(prediction_dict)
File "build/bdist.linux-x86_64/egg/object_detection/meta_architectures/faster_rcnn_meta_arch.py", line 1305, in loss
groundtruth_masks_list,
File "build/bdist.linux-x86_64/egg/object_detection/meta_architectures/faster_rcnn_meta_arch.py", line 1463, in _loss_box_classifier
groundtruth_boxlists, groundtruth_classes_with_background_list)
File "build/bdist.linux-x86_64/egg/object_detection/core/target_assigner.py", line 444, in batch_assign_targets
anchors, gt_boxes, gt_class_targets)
File "build/bdist.linux-x86_64/egg/object_detection/core/target_assigner.py", line 149, in assign
message='Groundtruth boxes and labels have incompatible shapes!')
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/ops/check_ops.py", line 324, in assert_equal
return control_flow_ops.Assert(condition, data, summarize=summarize)
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/util/tf_should_use.py", line 112, in wrapped
return _add_should_use_warning(fn(*args, **kwargs))
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 128, in Assert
condition, data, summarize, name="Assert")
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/ops/gen_logging_ops.py", line 47, in _assert
name=name)
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3073, in create_op
op_def=op_def)
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1524, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): assertion failed: [Groundtruth boxes and labels have incompatible shapes!] [Condition x == y did not hold element-wise:] [x (Loss/BoxClassifierLoss/strided_slice_1:0) = ] [0] [y (Loss/BoxClassifierLoss/strided_slice_2:0) = ] [2]
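[[Node: Loss/BoxClassifierLoss/assert_equal_1/Assert/Assert = Assert[T=[DT_STRING, DT_STRING, DT_STRING, DT_INT32, DT_STRING, DT_INT32], summarize=3, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Loss/BoxClassifierLoss/assert_equal_1/All, Loss/RPNLoss/assert_equal_1/Assert/Assert/data_0, Loss/RPNLoss/assert_equal/Assert/Assert/data_1, Loss/BoxClassifierLoss/assert_equal_1/Assert/Assert/data_2, Loss/BoxClassifierLoss/strided_slice_1, Loss/BoxClassifierLoss/assert_equal_1/Assert/Assert/data_4, Loss/RPNLoss/strided_slice)]]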
Traceback (most recent call last):
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/training/coordinator.py", line 295, in stop_on_exception
yield
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/training/coordinator.py", line 492, in run
self.run_loop()
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 1022, in run_loop
self._sv.global_step])
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 889, in run
run_metadata_ptr)
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1120, in _run
feed_dict_tensor, options, run_metadata)
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1317, in _do_run
options, run_metadata)
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1336, in _do_call
raise type(e)(node_def, op, message)
InvalidArgumentError: assertion failed: [Groundtruth boxes and labels have incompatible shapes!] [Condition x == y did not hold element-wise:] [x (Loss/BoxClassifierLoss/strided_slice_1:0) = ] [0] [y (Loss/BoxClassifierLoss/strided_slice_2:0) = ] [2]
[[Node: Loss/BoxClassifierLoss/assert_equal_1/Assert/Assert = Assert[T=[DT_STRING, DT_STRING, DT_STRING, DT_INT32, DT_STRING, DT_INT32], summarize=3, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Loss/BoxClassifierLoss/assert_equal_1/All, Loss/RPNLoss/assert_equal_1/Assert/Assert/data_0, Loss/RPNLoss/assert_equal/Assert/Assert/data_1, Loss/BoxClassifierLoss/assert_equal_1/Assert/Assert/data_2, Loss/BoxClassifierLoss/strided_slice_1, Loss/BoxClassifierLoss/assert_equal_1/Assert/Assert/data_4, Loss/RPNLoss/strided_slice)]]

Traceback (most recent call last):
File "object_detection/train.py", line 165, in <module>
tf.app.run()
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "object_detection/train.py", line 161, in main
worker_job_name, is_chief, FLAGS.train_dir)
File "build/bdist.linux-x86_64/egg/object_detection/trainer.py", line 332, in train
saver=saver)
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/contrib/slim/python/slim/learning.py", line 782, in train
ignore_live_threads=ignore_live_threads)
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/contextlib.py", line 35, in __exit__
self.gen.throw(type, value, traceback)
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 992, in managed_session
self.stop(close_summary_writer=close_summary_writer)
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 820, in stop
ignore_live_threads=ignore_live_threads)
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/training/coordinator.py", line 387, in join
six.reraise(*self._exc_info_to_raise)
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/training/coordinator.py", line 295, in stop_on_exception
yield
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/training/coordinator.py", line 492, in run
self.run_loop()
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 1022, in run_loop
self._sv.global_step])
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 889, in run
run_metadata_ptr)
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1120, in _run
feed_dict_tensor, options, run_metadata)
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1317, in _do_run
options, run_metadata)
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1336, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [Groundtruth boxes and labels have incompatible shapes!] [Condition x == y did not hold element-wise:] [x (Loss/BoxClassifierLoss/strided_slice_1:0) = ] [0] [y (Loss/BoxClassifierLoss/strided_slice_2:0) = ] [2]
[[Node: Loss/BoxClassifierLoss/assert_equal_1/Assert/Assert = Assert[T=[DT_STRING, DT_STRING, DT_STRING, DT_INT32, DT_STRING, DT_INT32], summarize=3, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Loss/BoxClassifierLoss/assert_equal_1/All, Loss/RPNLoss/assert_equal_1/Assert/Assert/data_0, Loss/RPNLoss/assert_equal/Assert/Assert/data_1, Loss/BoxClassifierLoss/assert_equal_1/Assert/Assert/data_2, Loss/BoxClassifierLoss/strided_slice_1, Loss/BoxClassifierLoss/assert_equal_1/Assert/Assert/data_4, Loss/RPNLoss/strided_slice)]]

I would be glad if someone provides a workaround for the above issue. Ready to provide further details about the errors above. Thanks in advance.
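A note on reading the assertion: the two bracketed values at the end of the message are the operands of the failed x == y comparison, here 0 and 2. Judging by the op names (strided_slice) and the "incompatible shapes" text, they look like the leading dimensions of the groundtruth boxes and labels, though the log alone does not say which is which. A small sketch for pulling the two values out of a log line follows; the regex and the function name compared_values are illustrative, not part of TensorFlow.

```python
import re

# Matches the "[x (name) = ] [value] [y (name) = ] [value]" tail of the message.
ASSERT_RE = re.compile(
    r"\[x \((?P<x_name>[^)]*)\) = \] \[(?P<x>[^\]]*)\] "
    r"\[y \((?P<y_name>[^)]*)\) = \] \[(?P<y>[^\]]*)\]"
)


def compared_values(message):
    """Return the (x, y) operand strings from an assert_equal failure, or None."""
    match = ASSERT_RE.search(message)
    if match is None:
        return None
    return match.group("x"), match.group("y")


LOG_LINE = (
    "assertion failed: [Groundtruth boxes and labels have incompatible shapes!] "
    "[Condition x == y did not hold element-wise:] "
    "[x (Loss/BoxClassifierLoss/strided_slice_1:0) = ] [0] "
    "[y (Loss/BoxClassifierLoss/strided_slice_2:0) = ] [2]"
)

values = compared_values(LOG_LINE)
print(values)
```

If the two numbers differ like this, the per-example box count and label count in the tfrecord are the first thing to compare.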

sxr3455

All 19 comments

This question is better asked on StackOverflow since it is not a bug or feature request. There is also a larger community that reads questions there. Thanks!

angerson on 8 Nov 2017

I am having the same problem and couldn't find the question on Stack Overflow. Did you find a solution for it?

akhiljain100 on 1 Dec 2017

👍1

Yes. It worked when I tried with a Python 3 installation of TensorFlow 1.2.

sxr3455 on 1 Dec 2017

No, this doesn't fix my problem. I think it's related to the shapes of the bounding boxes and the bounding box labels. Did you change your training tfrecord format?

akhiljain100 on 1 Dec 2017

No, worked with the same old tf record files.

sxr3455 on 1 Dec 2017

I am having the same problem. I don't think it is a problem with the TensorFlow version.

soumenms2015 on 2 Dec 2017

EDIT:

in my case, this was causing the problem:

I had multiple objects/bounding boxes in the same example, but only one entry in the classes lists (when it should be one per box).

'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
'image/object/class/label': dataset_util.int64_list_feature(classes),
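A minimal sketch of a guard for this, assuming the same list names as in the two lines above (xmins, xmaxs, ymins, ymaxs, classes_text, classes). The helper check_per_box_lengths is hypothetical and not part of dataset_util; the idea is to call it right before building the tf.train.Example so a mismatch fails at record-creation time instead of during training.

```python
def check_per_box_lengths(xmins, xmaxs, ymins, ymaxs, classes_text, classes):
    """Raise ValueError unless every per-box list has the same length.

    Returns the number of boxes when the lists agree.
    """
    lengths = {
        "xmins": len(xmins),
        "xmaxs": len(xmaxs),
        "ymins": len(ymins),
        "ymaxs": len(ymaxs),
        "classes_text": len(classes_text),
        "classes": len(classes),
    }
    if len(set(lengths.values())) != 1:
        raise ValueError("Per-box lists differ in length: %r" % (lengths,))
    return lengths["xmins"]


# Two boxes but only one class entry: the mistake described above.
try:
    check_per_box_lengths([0.1, 0.5], [0.4, 0.9], [0.1, 0.2], [0.5, 0.8],
                          [b"car"], [1])
    mismatch_detected = False
except ValueError:
    mismatch_detected = True

# One class entry per box: passes and reports the box count.
num_boxes = check_per_box_lengths([0.1, 0.5], [0.4, 0.9], [0.1, 0.2], [0.5, 0.8],
                                  [b"car", b"car"], [1, 1])
print(mismatch_detected, num_boxes)
```

Note that the check only looks at counts, so repeated class ids such as [1, 1] for a single-class dataset are fine.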

nhorro on 28 Dec 2017

👍2

Check that the counts of classes and labels are equal in your generated TFRecord file.

abhiishekpal on 1 Jan 2018

Editing the structure of tf.train.Example in the code that creates the TFRecords worked for me.

artapova on 15 Jan 2018

@verabeldev: What modification did you make? Could you please elaborate in detail?

soumenms2015 on 18 Jan 2018

I hit this error too. Any ideas on how to solve it?

offbye on 8 Mar 2018

I had the same error as outlined above. My customization for a personal dataset was writing the tfrecord file incorrectly.

It turns out there are multiple versions of dataset_util.py files in the tensorflow model repository.

I followed the explanation here for the object detection task and the major changes are as below.

import tensorflow as tf
from object_detection.utils import dataset_util

# Excerpt from the function that builds one example; the variables
# (height, width, xmins, classes, ...) are set earlier in that function.
# The xmins/xmaxs/ymins/ymaxs/classes_text/classes lists hold one entry per box.
tf_example = tf.train.Example(features=tf.train.Features(feature={
    'image/height': dataset_util.int64_feature(height),
    'image/width': dataset_util.int64_feature(width),
    'image/filename': dataset_util.bytes_feature(filename),
    'image/source_id': dataset_util.bytes_feature(filename),
    'image/encoded': dataset_util.bytes_feature(encoded_image_data),
    'image/format': dataset_util.bytes_feature(image_format),
    'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
    'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
    'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
    'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
    'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
    'image/object/class/label': dataset_util.int64_list_feature(classes),
}))
return tf_example

With the change above, I recreated the tfrecord files and was able to train the model successfully.

I hope this helps

Best
Aman

amanmeetgarg on 20 Mar 2018

👍1

@nhorro thanks for the info. I get the exact same error. In my case each image has multiple bounding boxes, but they all belong to the same class since I have only one class in my dataset. When I check the classes list during the creation of the tfrecord files, I have the following structure:

image1: 10 masks, i.e. 10 bounding boxes, classes: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

My classes list contains an entry for each bounding box and I still face the same problem. Should the entries have different values? That is, should we assign a different class to each bounding box even though our dataset has only one true class? The documentation here regarding google/tensorflow/models leaves a lot to be desired.

kirk86 on 27 Mar 2018

@nhorro
I tried commenting out the following two lines in create_pet_tf_record.py but I still have the same issue. Would you please clarify your solution?

'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
'image/object/class/label': dataset_util.int64_list_feature(classes),

@amanmeetgarg
I tried modifying create_pet_tf_record.py and then regenerating the TFRecord files, but it failed; I still get the same issue during training.
Could you please tell me exactly how to modify it?

Abduoit on 28 May 2018

I had this problem and solved it as follows:

The TFRecord files should be named pet_train/val.record. I got that by changing faces_only from True to False.

check the line here
https://github.com/tensorflow/models/blob/master/research/object_detection/dataset_tools/create_pet_tf_record.py#L49

Then, I regenerated TFRecord files by this

python object_detection/dataset_tools/create_pet_tf_record.py \
    --label_map_path=object_detection/data/two_label_map.pbtxt \
    --data_dir=`pwd` --output_dir=`pwd` --include_masks=True

That gave me two TFRecord files named pet_train/val.record, which I then used for training with mask_rcnn_inception_v2_coco.

Hope this helps

Abduoit on 29 May 2018

Just check your data!
I trained Mask RCNN on my own dataset.
I could train Faster RCNN, but Mask RCNN failed on the same dataset.
I fixed the bug once I found that the mask ground truth did not match one image.
I dropped that image and everything is OK now.

guanghuixu on 10 Jul 2018

Hi @guanghuixu, could you please describe your actions in detail?

kulsemig on 3 Sep 2018

InvalidArgumentError (see above for traceback): assertion failed: [predictions must be in [0, 1]] [Condition x <= y did not hold element-wise:x. I have no idea how to fix this.

datianshi21 on 21 Nov 2018

@nhorro thanks for the info. I get the exact same error. In my case each image has multiple bounding boxes, but they all belong to the same class since I have only one class in my dataset. When I check the classes list during the creation of the tfrecord files, I have the following structure:
image1: 10 masks, i.e. 10 bounding boxes, classes: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
My classes list contains an entry for each bounding box and I still face the same problem. Should the entries have different values? That is, should we assign a different class to each bounding box even though our dataset has only one true class? The documentation here regarding google/tensorflow/models leaves a lot to be desired.

@kirk86 my dataset is the same as yours and I am getting this error. Did you solve it? If so, how?