When using weighted_iou as the localization loss:
localization_loss {
  weighted_iou {
  }
}
I get the following error:
File "/research/object_detection/meta_architectures/ssd_meta_arch.py", line 513, in loss
location_losses, cls_losses, prediction_dict, match_list)
File "/research/object_detection/meta_architectures/ssd_meta_arch.py", line 683, in _apply_hard_mining
match_list=match_list)
File "/research/object_detection/core/losses.py", line 487, in __call__
'do not have compatible shapes.', len(location_losses), len(decoded_boxlist_list), len(cls_losses))
ValueError: ('location_losses, cls_losses and decoded_boxlist_list do not have compatible shapes.', 46008, 24, 24)
I adapted an SSD_mobilenet config.
In core/losses.py:
return tf.reshape(weights, [-1]) * per_anchor_iou_loss
This seems to cause the issue, because it does not take the batch size into account.
I tried reshaping it to [batch_size, -1] myself, which no longer throws the error. However, I then get a NaN error in the loss tensor after around 200 steps.
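The numbers in the ValueError fit this explanation: 46008 is exactly 24 * 1917, that is, batch size times anchors per image. A minimal pure-Python sketch of the mismatch, using plain lists as stand-ins for tensors (the variable names are illustrative and not taken from the library):

```python
# Values taken from the ValueError above.
batch_size = 24
flat_length = 46008

# Anchors per image implied by the flattened length.
num_anchors = flat_length // batch_size
assert batch_size * num_anchors == flat_length

# A flattened loss has one entry per (image, anchor) pair ...
flat_losses = [0.0] * flat_length

# ... while the hard miner compares lengths per image, so it needs one row per image.
per_image_losses = [
    flat_losses[i * num_anchors:(i + 1) * num_anchors]
    for i in range(batch_size)
]

print(len(flat_losses), len(per_image_losses), len(per_image_losses[0]))
```

Grouping the flat values into batch_size rows gives a leading length of 24, which is what the length check in the traceback compares against.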
The unmodified code runs when I use focal loss instead of the hard example miner, but with that configuration I also get a NaN error:
InvalidArgumentError (see above for traceback): LossTensor is inf or nan. : Tensor had NaN values
[[Node: CheckNumerics = CheckNumerics[T=DT_FLOAT, message="LossTensor is inf or nan.", _device="/job:localhost/replica:0/task:0/device:CPU:0"](total_loss)]]
This question is better asked on StackOverflow since it is not a bug or feature request. There is also a larger community that reads questions there.
If you think we've misinterpreted a bug, please comment again with a clear explanation, as well as all of the information requested in the issue template. Thanks!
@kilsenp
I got the same problem.
Have you solved it?
@liu09114 No, I did not follow up on this.
To overcome this problem, I changed the code in object_detection/core/losses.py, under the IOU location loss class, to something like this:
def _compute_loss(self, prediction_tensor, target_tensor, weights):
  """Compute loss function.

  Args:
    prediction_tensor: A float tensor of shape [batch_size, num_anchors, 4]
      representing the decoded predicted boxes
    target_tensor: A float tensor of shape [batch_size, num_anchors, 4]
      representing the decoded target boxes
    weights: a float tensor of shape [batch_size, num_anchors]

  Returns:
    loss: a float tensor of shape [batch_size, num_anchors] tensor
      representing the value of the loss function.
  """
  batch_size = prediction_tensor.get_shape().as_list()[0]
  predicted_boxes = box_list.BoxList(tf.reshape(prediction_tensor, [-1, 4]))
  target_boxes = box_list.BoxList(tf.reshape(target_tensor, [-1, 4]))
  per_anchor_iou_loss = 1.0 - box_list_ops.matched_iou(predicted_boxes,
                                                       target_boxes)
  return (tf.reshape(weights, [batch_size, num_anchors, -1]) *
          tf.reshape(per_anchor_iou_loss, [batch_size, num_anchors, -1]))
After doing that, you also need to change the matched_iou function in object_detection/core/box_list_ops.py to something like this:
def matched_iou(boxlist1, boxlist2, scope=None):
  """Compute intersection-over-union between corresponding boxes in boxlists.

  Args:
    boxlist1: BoxList holding N boxes
    boxlist2: BoxList holding N boxes
    scope: name scope.

  Returns:
    a tensor with shape [N] representing pairwise iou scores.
  """
  with tf.name_scope(scope, 'MatchedIOU'):
    intersections = matched_intersection(boxlist1, boxlist2)
    areas1 = area(boxlist1)
    areas2 = area(boxlist2)
    unions = areas1 + areas2 - intersections
    epsilon = 1e-10
    return tf.where(
        tf.equal(intersections, 0.0),
        tf.zeros_like(intersections),
        tf.truediv(intersections, unions + epsilon))
Otherwise you will get NaN.
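To illustrate where the NaN can come from: if two matched boxes both have zero area, the union is zero and the ratio becomes 0/0, which TensorFlow evaluates to NaN for floats. The following is a scalar pure-Python sketch of the same guarded computation, not the library function itself. The name matched_iou_scalar and the (ymin, xmin, ymax, xmax) tuple layout are assumptions made for this example.

```python
def matched_iou_scalar(box1, box2, epsilon=1e-10):
    """IoU of two boxes given as (ymin, xmin, ymax, xmax) tuples.

    Hypothetical scalar stand-in for the tensor version above.
    """
    ymin1, xmin1, ymax1, xmax1 = box1
    ymin2, xmin2, ymax2, xmax2 = box2

    # Overlap along each axis, clamped at zero for disjoint boxes.
    inter_h = max(0.0, min(ymax1, ymax2) - max(ymin1, ymin2))
    inter_w = max(0.0, min(xmax1, xmax2) - max(xmin1, xmin2))
    intersection = inter_h * inter_w

    area1 = (ymax1 - ymin1) * (xmax1 - xmin1)
    area2 = (ymax2 - ymin2) * (xmax2 - xmin2)
    union = area1 + area2 - intersection

    # Mirror the tf.where guard: no overlap means IoU is 0.
    if intersection == 0.0:
        return 0.0
    # epsilon keeps the denominator strictly positive.
    return intersection / (union + epsilon)


# Two degenerate (zero-area) boxes: union is 0, but the result stays finite.
print(matched_iou_scalar((0.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0)))
```

In the tensor version the division is evaluated for every element before tf.where selects a branch, so the epsilon matters even for entries that end up masked to zero. It may also be what keeps the gradients finite, although the thread does not confirm this.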
Thank you for the suggestion @lernerbruno. I am getting an "Unresolved reference num_anchors" error. Can you shed some light on how to get num_anchors? Should I just take it from prediction_tensor or target_tensor?
Add num_anchors = prediction_tensor.get_shape().as_list()[1] to the _compute_loss function?
Were you able to get this to work, @tispratik? I defined num_anchors as suggested above, but when I try to train I get: ValueError: Shape must be rank 1 but is rank 2 for 'Loss/non_max_suppression/NonMaxSuppressionV3' (op: 'NonMaxSuppressionV3') with input shapes: [1917,4], [1917,1], [], [], [].
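One possible reading of that rank error: reshaping to [batch_size, num_anchors, -1] leaves a trailing dimension of size 1, so each per-image loss has shape [1917, 1] rather than [1917], which matches the second input shape in the message. Reshaping to [batch_size, num_anchors] instead would match the shape documented in the docstring. The pure-Python sketch below only resolves shapes. resolve_reshape is a hypothetical helper, and the batch size of 24 is borrowed from the first error in this thread as an assumption.

```python
def resolve_reshape(num_elements, target_shape):
    """Resolve at most one -1 in target_shape, the way a reshape op infers it."""
    if target_shape.count(-1) > 1:
        raise ValueError("at most one dimension may be -1")
    known = 1
    for dim in target_shape:
        if dim != -1:
            known *= dim
    if -1 in target_shape:
        if known == 0 or num_elements % known != 0:
            raise ValueError("shape is incompatible with element count")
        inferred = num_elements // known
        return [inferred if dim == -1 else dim for dim in target_shape]
    if known != num_elements:
        raise ValueError("shape is incompatible with element count")
    return list(target_shape)


batch_size, num_anchors = 24, 1917  # illustrative values, see the note above
num_elements = batch_size * num_anchors

# Shape produced by the snippet posted earlier in the thread.
rank3 = resolve_reshape(num_elements, [batch_size, num_anchors, -1])
# Shape without the trailing dimension.
rank2 = resolve_reshape(num_elements, [batch_size, num_anchors])

# Dropping the batch axis gives the per-image shape seen by later ops.
print(rank3[1:], rank2[1:])
```

With the rank-2 shape, each per-image loss is a rank-1 vector with one score per anchor, which is the rank the error message says NonMaxSuppressionV3 requires.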
It worked, but I got worse training results, so I moved back to weighted_l2.