Models: InvalidArgumentError (see above for traceback): Incompatible shapes: [2,1917] vs. [3,1]

Created on 2 Oct 2018 · 34 comments · Source: tensorflow/models

I'm trying to develop a custom object detection model by following online tutorials, and when I run the train.py script I end up with the error _InvalidArgumentError (see above for traceback): Incompatible shapes: [2,1917] vs. [3,1]_.

Please help me understand the error and the steps to fix it.

Most helpful comment

I think this is an error associated with ssd_random_crop.
In my case, after removing the ssd_random_crop option from pipeline.config, the Object Detection API runs without error.
ssd_random_crop is a very important feature for me, so I'm using the previous version (June 25, 2018).

All 34 comments

Thank you for your post. We noticed you have not filled out the following fields in the issue template. Could you update them if they are relevant in your case, or leave them as N/A? Thanks.
What is the top-level directory of the model you are using
Have I written custom code
OS Platform and Distribution
TensorFlow installed from
TensorFlow version
Bazel version
CUDA/cuDNN version
GPU model and memory
Exact command to reproduce

Sorry for that, here are the answers to the above questions:

  1. I am running the script from the models/research/object_detection directory, and I run the file train.py. Actually, train.py was in the _legacy_ directory, so I copied it into models/research/object_detection.

  2. No custom code; I have just followed a tutorial and tried to replicate it. The tutorial I follow is: https://www.youtube.com/watch?v=kq2Gjv_pPe8

  3. I run on macOS with Python 3.6 (Anaconda).

  4. Installed TensorFlow using the pip command.

  5. TensorFlow version: 1.10.1

  6. Bazel version: Not sure about this as the tutorial didn't say anything about this.

  7. CUDA/cuDNN: Same as above.

  8. GPU model and memory: I think I'm just using CPU as I'm training on only 20 images.

  9. Exact command used: python train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/ssd_mobilenet_v1_pets.config

Overview of the procedure followed:

  1. Annotate the images, which generates XML files.

  2. Convert the XML files to a CSV file for both train and test data (a sketch of this step follows the list).

  3. Using the CSV file, generate TF records.

  4. Edit the config file and generate a .pbtxt label map; once I have all these, I execute the command mentioned above.
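
A minimal sketch of the XML-to-CSV step referenced above, assuming Pascal VOC-style annotations as produced by labelImg; the paths, column order, and output filename are illustrative, and the tutorial's own xml_to_csv script may differ:

import csv
import glob
import xml.etree.ElementTree as ET

# Collect one CSV row per annotated box (header row first).
rows = [['filename', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax']]
for xml_file in glob.glob('images/train/*.xml'):  # hypothetical annotation directory
    root = ET.parse(xml_file).getroot()
    size = root.find('size')
    for obj in root.findall('object'):
        box = obj.find('bndbox')
        rows.append([root.find('filename').text,
                     int(size.find('width').text),
                     int(size.find('height').text),
                     obj.find('name').text,
                     int(box.find('xmin').text),
                     int(box.find('ymin').text),
                     int(box.find('xmax').text),
                     int(box.find('ymax').text)])

with open('train_labels.csv', 'w', newline='') as f:
    csv.writer(f).writerows(rows)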

Currently running into a similar issue (following the same tutorial) using the legacy train.py. I have looked into a few things and haven't had much luck (reducing the batch size, trying a couple of different weights). I saw something about changing the bounding box coords from floats to ints, but I'm not sure that makes any sense given that the box coords are normalised for the tfrecords.
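
For reference on the floats-vs-ints point: the TF Object Detection API's TFRecord format does store box corners as floats normalised to [0, 1], so casting them to ints would not help. A minimal sketch of how those bbox features are typically written (made-up pixel coordinates, standard feature keys):

import tensorflow as tf

width, height = 800, 600                      # image size in pixels
xmins, xmaxs = [100, 400], [250, 550]         # pixel-space corners of two boxes
ymins, ymaxs = [80, 300], [200, 450]

def _floats(values):
    return tf.train.Feature(float_list=tf.train.FloatList(value=values))

# Corners are divided by the image dimensions, i.e. kept as floats in [0, 1].
example = tf.train.Example(features=tf.train.Features(feature={
    'image/object/bbox/xmin': _floats([x / width for x in xmins]),
    'image/object/bbox/xmax': _floats([x / width for x in xmaxs]),
    'image/object/bbox/ymin': _floats([y / height for y in ymins]),
    'image/object/bbox/ymax': _floats([y / height for y in ymaxs]),
}))
print(example.features.feature['image/object/bbox/xmin'].float_list.value)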

Caused by op 'Loss/Match_10/cond/mul_2', defined at:
  File "train.py", line 184, in <module>
    tf.app.run()
  File "/home/*****/.local/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "/home/*****/.local/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 250, in new_func
    return func(*args, **kwargs)
  File "train.py", line 180, in main
    graph_hook_fn=graph_rewriter_fn)
  File "/mnt/sdb/Tensorflow_OD_API/models/research/object_detection/legacy/trainer.py", line 290, in train
    clones = model_deploy.create_clones(deploy_config, model_fn, [input_queue])
  File "/mnt/sdb/Tensorflow_OD_API/models/research/deployment/model_deploy.py", line 193, in create_clones
    outputs = model_fn(*args, **kwargs)
  File "/mnt/sdb/Tensorflow_OD_API/models/research/object_detection/legacy/trainer.py", line 205, in _create_losses
    losses_dict = detection_model.loss(prediction_dict, true_image_shapes)
  File "/mnt/sdb/Tensorflow_OD_API/models/research/object_detection/meta_architectures/ssd_meta_arch.py", line 680, in loss
    keypoints, weights)
  File "/mnt/sdb/Tensorflow_OD_API/models/research/object_detection/meta_architectures/ssd_meta_arch.py", line 853, in _assign_targets
    groundtruth_weights_list)
  File "/mnt/sdb/Tensorflow_OD_API/models/research/object_detection/core/target_assigner.py", line 483, in batch_assign_targets
    anchors, gt_boxes, gt_class_targets, unmatched_class_label, gt_weights)
  File "/mnt/sdb/Tensorflow_OD_API/models/research/object_detection/core/target_assigner.py", line 182, in assign
    valid_rows=tf.greater(groundtruth_weights, 0))
  File "/mnt/sdb/Tensorflow_OD_API/models/research/object_detection/core/matcher.py", line 241, in match
    return Match(self._match(similarity_matrix, valid_rows),
  File "/mnt/sdb/Tensorflow_OD_API/models/research/object_detection/matchers/argmax_matcher.py", line 194, in _match
    _match_when_rows_are_non_empty, _match_when_rows_are_empty)
  File "/home/*****/.local/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 432, in new_func
    return func(*args, **kwargs)
  File "/home/*****/.local/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2063, in cond
    orig_res_t, res_t = context_t.BuildCondBranch(true_fn)
  File "/home/*****/.local/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1913, in BuildCondBranch
    original_result = fn()
  File "/mnt/sdb/Tensorflow_OD_API/models/research/object_detection/matchers/argmax_matcher.py", line 175, in _match_when_rows_are_non_empty
    tf.cast(tf.expand_dims(valid_rows, axis=-1), dtype=tf.float32))
  File "/home/****/.local/lib/python3.5/site-packages/tensorflow/python/ops/math_ops.py", line 979, in binary_op_wrapper
    return func(x, y, name=name)
  File "/home/*****/.local/lib/python3.5/site-packages/tensorflow/python/ops/math_ops.py", line 1211, in _mul_dispatch
    return gen_math_ops.mul(x, y, name=name)
  File "/home/*****/.local/lib/python3.5/site-packages/tensorflow/python/ops/gen_math_ops.py", line 4759, in mul
    "Mul", x=x, y=y, name=name)
  File "/home/*****/.local/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/*****/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3392, in create_op
    op_def=op_def)
  File "/home/*****/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1718, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Incompatible shapes: [2,1917] vs. [6,1]
     [[Node: Loss/Match_10/cond/mul_2 = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Loss/Match_10/cond/one_hot, Loss/Match_10/cond/Cast_2)]]
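
Reading the failing op: in argmax_matcher.py the similarity matrix (one row per groundtruth box, one column per anchor; 1917 anchors for a 300x300 SSD MobileNet) is multiplied elementwise by the expanded valid_rows built from the groundtruth weights. The error appears to mean the two sides disagree about how many groundtruth boxes there are (2 vs 6 here), which broadcasting cannot reconcile. A tiny illustration of the same failure outside the graph, with shapes taken from the error message:

import numpy as np

similarity = np.zeros((2, 1917), dtype=np.float32)  # [groundtruth rows, anchors]
valid_rows = np.ones((6, 1), dtype=np.float32)      # expanded groundtruth weights

# Trailing dims broadcast (1917 vs 1), but 2 vs 6 cannot,
# so this raises the same kind of incompatible-shapes error as the Mul op above.
similarity * valid_rows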

@burrbank does your model have 6 classification labels?

@anirudhkm I have 11 labels in my model actually. I was able to get a successful training session when I used faster_rcnn_inception_v2_coco but was still getting this error from the ssd models.

I solved it this way.

I still hit this issue when using model_main.py.

Can't solve it.

I am facing the same issue as well.

If you need to solve it urgently, you can downgrade to an earlier version of the TensorFlow object detection code.
I am training with a previous version of the TensorFlow object_detection code from another GitHub repository.

I am getting the error even after using model_main.py.

It runs for some steps, then gives me this error:

[[{{node cond_3/Detections_Left_Groundtruth_Right/3/_1699}} = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_220_cond_3/Detections_Left_Groundtruth_Right/3", tensor_type=DT_STRING, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

I think this is an error associated with ssd_random_crop.
In my case, after removing the ssd_random_crop option from pipeline.config, the Object Detection API runs without error.
ssd_random_crop is a very important feature for me, so I'm using the previous version (June 25, 2018).
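
For anyone unsure what to delete: in a typical SSD pipeline config the augmentation block sits inside train_config, and removing (or commenting out) just that block is what worked here. This is only a sketch; the exact contents of your config will differ:

train_config: {
  batch_size: 24
  ...
  # Removing this block avoids the incompatible-shapes error discussed in this thread.
  data_augmentation_options {
    ssd_random_crop {
    }
  }
}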

Perfect, @rky0930! That solved the problem. How did you find it?

Also, are you running the training on GPU? If so, how long is each step taking? I am using a Tesla P100 and each step is taking 2.2 seconds.

Hi @engineer1982,
Fortunately, I happened to remove the data_augmentation_options during copy and paste,
and found out it ran without error.

GPUs: GTX 1080 (8 GB) x 2
model: ssd_mobilenet_v2_coco
batch_size: 64
number of clones: 2
1 step takes about 0.9~1 sec

# tail -f train.out
INFO:tensorflow:global step 25163: loss = 4.3094 (1.006 sec/step)
INFO:tensorflow:global step 25163: loss = 4.3094 (1.006 sec/step)
INFO:tensorflow:global step 25164: loss = 4.4951 (0.985 sec/step)
INFO:tensorflow:global step 25164: loss = 4.4951 (0.985 sec/step)
INFO:tensorflow:global step 25165: loss = 4.6927 (0.893 sec/step)
INFO:tensorflow:global step 25165: loss = 4.6927 (0.893 sec/step)
INFO:tensorflow:global step 25166: loss = 4.1873 (0.924 sec/step)
INFO:tensorflow:global step 25166: loss = 4.1873 (0.924 sec/step)
INFO:tensorflow:global step 25167: loss = 4.3566 (0.879 sec/step)
INFO:tensorflow:global step 25167: loss = 4.3566 (0.879 sec/step)
INFO:tensorflow:global step 25168: loss = 4.1848 (0.894 sec/step)

@rky0930: yeah, it worked for me as well after disabling the data_augmentation_options for ssd_random_crop.
Thanks!!!

@adnanobot how can I remove the data augmentation option? Can I disable it by removing that block from the config?

I just commented out the block like this (in the .config file; in my case it was ssd_mobilenet_v1_pets.config):

# data_augmentation_options {
#   ssd_random_crop {
#   }
# }

Thank you. Finally, the error seems to have disappeared.

Did you manage to export the model? I'm running export_inference_graph.py but it shows "Incomplete shape".

Yes, I did not face any problems afterwards.

@adnanobot did you encounter this error while running export_inference_graph.py?

126 ops no flops stats due to incomplete shapes.
Parsing Inputs...
Incomplete shape.

In my case, I get those warnings but the script still exports the model, and I am able to use it later.

Hello there! I spent a few days making ssd_mobilenet_v1_coco work and finally I could! It works!!!

@engineer1982 did you encounter accuracy issues? My model didn't seem to work properly. It mistakenly detects every object it sees.

@GeneratorEX, no, the accuracy was fine.

@engineer1982 My current training checkpoint is at 40k steps and the loss is not decreasing. The loss is stuck at 2 and sometimes decreases to 1. How can I fix this?

@adnanobot Thank you for your advice to solve my problem.

@engineer1982 My current training checkpoint is at 40k steps and the loss is not decreasing. The loss is stuck at 2 and sometimes decreases to 1. How can I fix this?

It's very hard to tell how you can fix it. It could be an error in generating the XML files, or the model may not be suitable for the task, or something else :(

If you are using MobileNet, try switching to Faster R-CNN Inception ResNet v2. It is very accurate, but a little slower to train and predict.

Hi @rky0930, it works. Thanks very much!!!

Hi,
In https://github.com/tensorflow/models/releases I didn't find a 25th of June version; I only found June 1. Is that the right version you pointed to?

Remove the data augmentation lines from the config file and it will run without error:

data_augmentation_options {
  random_horizontal_flip {
  }
}
data_augmentation_options {
  ssd_random_crop {
  }
}

Hey,

Exact command to reproduce: python train.py --train_data_pattern='audioset_v1_embeddings/bal_train/*.tfrecord' --model=LogisticModel --frame_features=True --train_dir=tmp/yt8m_model/ --feature_names="audio_embedding" --feature_sizes="128" --batch_size=128

Source code / logs

Below is the output from the command:

InvalidArgumentError: Incompatible shapes: [128,527] vs. [128,300,527]
[[node tower/loss_xent/mul_1 (defined at /Users/ayoubazzouzi/Downloads/fox-audio-master/youtube8m/losses.py:50) = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](tower/loss_xent/sub, tower/loss_xent/Log_1)]]

Any help, please?
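
A note on this last report: it is a different shape mismatch from the ssd_random_crop one above. [128, 527] looks like video-level labels (batch x classes) while [128, 300, 527] looks like per-frame predictions (batch x frames x classes), which suggests frame-level input being fed to a video-level model; that reading is a guess, but the broadcast failure itself is easy to reproduce with the shapes taken from the error message:

import numpy as np

labels = np.zeros((128, 527), dtype=np.float32)             # [batch, classes]
predictions = np.zeros((128, 300, 527), dtype=np.float32)   # [batch, frames, classes]

# Aligning shapes from the right: 527 vs 527 is fine, but 128 vs 300 is not,
# so the elementwise multiply inside the cross-entropy loss fails as reported.
labels * predictions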

I deleted these lines in ssdlite_mobilenet_v2_coco.config and pipeline.config:

data_augmentation_options {
  random_horizontal_flip {
  }
}
data_augmentation_options {
  ssd_random_crop {
  }
}

Hi there,
We are checking to see if you still need help on this, as this seems to be a considerably old issue. Please update this issue with the latest information, a code snippet to reproduce your issue, and the error you are seeing.
If we don't hear from you in the next 7 days, this issue will be closed automatically. If you don't need help on this issue any more, please consider closing it.

Automatically closing due to lack of recent activity. Please update the issue when new information becomes available, and we will reopen the issue. Thanks!
