Models: WeightedIOULocalizationLoss: incompatible shapes

Created on 22 Apr 2018 · 9 Comments · Source: tensorflow/models

System information

What is the top-level directory of the model you are using: object_detection
Have I written custom code (as opposed to using a stock example script provided in TensorFlow): yes
TensorFlow installed from (source or binary): binary
TensorFlow version (use command below): 1.5

When using weighted_iou as localization loss:

localization_loss {
  weighted_iou {
  }
}

I get the following error.

File "/research/object_detection/meta_architectures/ssd_meta_arch.py", line 513, in loss
location_losses, cls_losses, prediction_dict, match_list)
File "/research/object_detection/meta_architectures/ssd_meta_arch.py", line 683, in _apply_hard_mining
match_list=match_list)
File "/research/object_detection/core/losses.py", line 487, in __call__
'do not have compatible shapes.', len(location_losses), len(decoded_boxlist_list), len(cls_losses))
ValueError: ('location_losses, cls_losses and decoded_boxlist_list do not have compatible shapes.', 46008, 24, 24)

I had adapted an SSD_mobilenet config.

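In core/losses.py, the loss is returned as:
return tf.reshape(weights, [-1]) * per_anchor_iou_loss
This seems to cause the issue because it does not take the batch size into account: the result is flattened to 46008 entries (24 * 1917), whereas the hard example miner compares it against lists with one entry per image in the batch (24).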
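The numbers in the error message are consistent with this reading: 46008 = 24 * 1917, which would be the batch size times a per-image anchor count of 1917 (the anchor count is inferred from the arithmetic, not stated in the report). A minimal sketch, using NumPy arrays as stand-ins for the TensorFlow tensors and arbitrary values, shows how the flattened product loses the batch dimension:

```python
import numpy as np

# Assumed sizes: 24 * 1917 == 46008, matching the lengths in the error.
batch_size, num_anchors = 24, 1917

# Stand-ins for the tensors inside _compute_loss (values are arbitrary).
weights = np.ones((batch_size, num_anchors), dtype=np.float32)
per_anchor_iou_loss = np.full(batch_size * num_anchors, 0.5, dtype=np.float32)

# Equivalent of the quoted line: both operands are flat, so the result is flat.
flat_loss = weights.reshape(-1) * per_anchor_iou_loss

# Restoring the batch dimension gives one row of anchor losses per image.
batched_loss = flat_loss.reshape(batch_size, -1)

print(len(flat_loss))      # 46008
print(len(batched_loss))   # 24
```

The first length matches `len(location_losses)` in the error and the second matches the other two lists.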

kilsenp

Most helpful comment

To overcome this problem, I changed the code inside object_detection/core/losses.py under the IOU location loss class, to something like this:

  def _compute_loss(self, prediction_tensor, target_tensor, weights):
    """Compute loss function.

    Args:
      prediction_tensor: A float tensor of shape [batch_size, num_anchors, 4]
        representing the decoded predicted boxes
      target_tensor: A float tensor of shape [batch_size, num_anchors, 4]
        representing the decoded target boxes
      weights: a float tensor of shape [batch_size, num_anchors]

    Returns:
      loss: a float tensor of shape [batch_size, num_anchors]
        representing the value of the loss function.
    """
    batch_size = prediction_tensor.get_shape().as_list()[0]
    predicted_boxes = box_list.BoxList(tf.reshape(prediction_tensor, [-1, 4]))
    target_boxes = box_list.BoxList(tf.reshape(target_tensor, [-1, 4]))
    per_anchor_iou_loss = 1.0 - box_list_ops.matched_iou(predicted_boxes,
                                                         target_boxes)

    return (tf.reshape(weights, [batch_size, num_anchors, -1]) *
            tf.reshape(per_anchor_iou_loss, [batch_size, num_anchors, -1]))

But after doing that, you need to change the matched_iou function under object_detection/core/box_list_ops.py, to something like this:

def matched_iou(boxlist1, boxlist2, scope=None):
  """Compute intersection-over-union between corresponding boxes in boxlists.

  Args:
    boxlist1: BoxList holding N boxes
    boxlist2: BoxList holding N boxes
    scope: name scope.

  Returns:
    a tensor with shape [N] representing pairwise iou scores.
  """
  with tf.name_scope(scope, 'MatchedIOU'):
    intersections = matched_intersection(boxlist1, boxlist2)
    areas1 = area(boxlist1)
    areas2 = area(boxlist2)
    unions = areas1 + areas2 - intersections
    epsilon = 1e-10
    return tf.where(
        tf.equal(intersections, 0.0),
        tf.zeros_like(intersections),
        tf.truediv(intersections, unions + epsilon))

Otherwise you would get NaN.

lernerbruno on 2 Sep 2019

❤1 👍1
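To illustrate why the epsilon in the comment above can matter, here is a minimal NumPy sketch of a matched IoU (not the library's implementation). It assumes boxes in [ymin, xmin, ymax, xmax] order and reuses the 1e-10 value from the comment. The division is computed for every element before the select is applied, so a degenerate pair with zero union yields 0/0 in the unselected branch. In TensorFlow, a NaN in an unselected branch of tf.where can still reach the gradients, which is one plausible reason the forward-pass guard alone is not enough.

```python
import numpy as np

def matched_iou(boxes1, boxes2, epsilon=1e-10):
    """IoU between corresponding rows of two [N, 4] arrays.

    Boxes are assumed to be [ymin, xmin, ymax, xmax].
    Returns (unguarded, guarded) so the two variants can be compared.
    """
    ymin = np.maximum(boxes1[:, 0], boxes2[:, 0])
    xmin = np.maximum(boxes1[:, 1], boxes2[:, 1])
    ymax = np.minimum(boxes1[:, 2], boxes2[:, 2])
    xmax = np.minimum(boxes1[:, 3], boxes2[:, 3])
    intersections = np.maximum(ymax - ymin, 0.0) * np.maximum(xmax - xmin, 0.0)
    areas1 = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1])
    areas2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1])
    unions = areas1 + areas2 - intersections

    # Plain division: 0/0 for degenerate pairs (warnings silenced for the demo).
    with np.errstate(invalid="ignore", divide="ignore"):
        unguarded = intersections / unions

    # Epsilon keeps the division finite everywhere; the select returns 0 for
    # pairs that do not intersect.
    guarded = np.where(intersections == 0.0, 0.0,
                       intersections / (unions + epsilon))
    return unguarded, guarded

# Row 0 overlaps by half; row 1 is a pair of zero-area boxes.
boxes_a = np.array([[0.0, 0.0, 1.0, 1.0], [0.5, 0.5, 0.5, 0.5]])
boxes_b = np.array([[0.0, 0.0, 1.0, 0.5], [0.5, 0.5, 0.5, 0.5]])
unguarded, guarded = matched_iou(boxes_a, boxes_b)
print(np.isnan(unguarded[1]))       # True
print(np.isfinite(guarded).all())   # True
```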

All 9 comments

I tried reshaping it to [batch_size, -1] myself. That no longer throws the error, but I get a NaN error in the loss tensor after around 200 steps.
The current code works when using focal loss instead of the hard miner, but with this configuration I also get a NaN error.

InvalidArgumentError (see above for traceback): LossTensor is inf or nan. : Tensor had NaN values
         [[Node: CheckNumerics = CheckNumerics[T=DT_FLOAT, message="LossTensor is inf or nan.", _device="/job:localhost/replica:0/task:0/device:CPU:0"](total_loss)]]

kilsenp on 23 Apr 2018

This question is better asked on StackOverflow since it is not a bug or feature request. There is also a larger community that reads questions there.

If you think we've misinterpreted a bug, please comment again with a clear explanation, as well as all of the information requested in the issue template. Thanks!

robieta on 23 Apr 2018

@kilsenp
I got the same problem.
Have you solved it?

liu09114 on 20 Aug 2018

@liu09114 No, I did not follow up on this.

kilsenp on 23 Aug 2018

❤2

Thank you for the suggestion @lernerbruno. I am getting an "Unresolved reference num_anchors" error. Can you shed some light on how to get num_anchors? Should I just take it from the prediction_tensor or target_tensor?

tispratik on 20 Mar 2020

Add num_anchors = prediction_tensor.get_shape().as_list()[1] to the _compute_loss function?

tispratik on 4 Apr 2020
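A minimal sketch of that idea, with NumPy arrays standing in for the tensors: both leading dimensions are read from the prediction's shape and the flat per-box loss is reshaped back to [batch_size, num_anchors]. The helper name compute_weighted_loss and the per_box_loss callable are hypothetical placeholders (the latter stands in for 1 - matched_iou). In TensorFlow 1.x, get_shape().as_list() returns None for dimensions that are not statically known, in which case tf.shape would be needed instead.

```python
import numpy as np

def compute_weighted_loss(prediction, target, weights, per_box_loss):
    """Hypothetical helper mirroring the structure of _compute_loss.

    prediction, target: [batch_size, num_anchors, 4]
    weights: [batch_size, num_anchors]
    per_box_loss: callable mapping two [N, 4] arrays to an [N] array
      (placeholder for 1 - matched_iou).
    Returns an array of shape [batch_size, num_anchors].
    """
    batch_size, num_anchors = prediction.shape[:2]
    flat_loss = per_box_loss(prediction.reshape(-1, 4), target.reshape(-1, 4))
    return weights * flat_loss.reshape(batch_size, num_anchors)

# Toy inputs: 2 images, 3 anchors each; L1 distance as the placeholder loss.
prediction = np.zeros((2, 3, 4))
target = np.ones((2, 3, 4))
weights = np.ones((2, 3))
loss = compute_weighted_loss(
    prediction, target, weights,
    per_box_loss=lambda p, t: np.abs(p - t).sum(axis=1))
print(loss.shape)  # (2, 3)
```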

Were you able to get this to work, @tispratik? I defined num_anchors as you suggested above, but when I try to train I then get: ValueError: Shape must be rank 1 but is rank 2 for 'Loss/non_max_suppression/NonMaxSuppressionV3' (op: 'NonMaxSuppressionV3') with input shapes: [1917,4], [1917,1], [], [], [].
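
One possible explanation, not confirmed anywhere in this thread: reshaping to [batch_size, num_anchors, -1] leaves a trailing dimension of size 1, so each per-image slice of the loss has shape [1917, 1] rather than [1917]. That would match the second input shape in the error, since non-max suppression expects rank-1 scores. The NumPy sketch below only demonstrates the shape difference; the batch size of 2 is an arbitrary assumption, and 1917 is taken from the error message.

```python
import numpy as np

# Assumed sizes: batch size is arbitrary; 1917 comes from the error message.
batch_size, num_anchors = 2, 1917

weights = np.ones((batch_size, num_anchors))
per_anchor_iou_loss = np.zeros(batch_size * num_anchors)

# Reshape with a trailing -1, as in the posted snippet: result is rank 3.
with_trailing = (weights.reshape(batch_size, num_anchors, -1) *
                 per_anchor_iou_loss.reshape(batch_size, num_anchors, -1))

# Reshape to exactly [batch_size, num_anchors]: result is rank 2.
without_trailing = weights * per_anchor_iou_loss.reshape(batch_size, num_anchors)

# Per-image slices, i.e. what a per-image consumer of the loss would see.
print(with_trailing[0].shape)      # (1917, 1)
print(without_trailing[0].shape)   # (1917,)
```

If this is the cause, dropping the trailing -1 from both reshapes would give the [batch_size, num_anchors] shape that the docstring describes.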