I wrote my own loss function, and during training the loss went negative, which should be impossible since the loss only contains squared terms. So I suspect an overflow, but Theano's NanGuardMode doesn't complain about NaNs or Infs. The rest of my model isn't causing the negative loss, since it works fine with another loss function. Here is my loss function:
```python
import theano.tensor as T

def overlap(x1, w1, x2, w2):
    """
    Args:
        x1: x for the first box
        w1: width for the first box
        x2: x for the second box
        w2: width for the second box
    Note: widths and heights are stored as square roots, so w*w is the actual width.
    """
    l1 = x1 - w1*w1/2
    l2 = x2 - w2*w2/2
    left = T.switch(T.lt(l1, l2), l2, l1)
    r1 = x1 + w1*w1/2
    r2 = x2 + w2*w2/2
    right = T.switch(T.lt(r1, r2), r1, r2)
    return right - left

def box_intersection(a, b):
    """
    Args:
        a: the first box, a batch*49*4 tensor
        b: the second box, another batch*49*4 tensor
    Returns:
        area: batch*49 tensor, the intersection area of each pair of boxes
    """
    w = overlap(a[:, :, 0], a[:, :, 3], b[:, :, 0], b[:, :, 3])
    h = overlap(a[:, :, 1], a[:, :, 2], b[:, :, 1], b[:, :, 2])
    w = T.switch(T.lt(w, 0), 0, w)
    h = T.switch(T.lt(h, 0), 0, h)
    area = w * h
    return area

def box_union(a, b):
    i = box_intersection(a, b)
    # a[:,:,2] and a[:,:,3] hold sqrt(h) and sqrt(w), so area_a*area_a is the true area
    area_a = a[:, :, 2] * a[:, :, 3]
    area_b = b[:, :, 2] * b[:, :, 3]
    u = area_a * area_a + area_b * area_b - i
    return u

def box_iou(a, b):
    # the net and ground truth store the square roots of height and width
    u = box_union(a, b)
    i = box_intersection(a, b)
    u = T.switch(T.le(u, 0), 10000, u)
    return i / u

def custom_loss_2(y_pred, y_true):
    loss = 0.0
    y_pred = y_pred.reshape((y_pred.shape[0], 49, 30))
    y_true = y_true.reshape((y_true.shape[0], 49, 30))
    a = y_pred[:, :, 0:4]
    b = y_pred[:, :, 5:9]
    gt = y_true[:, :, 0:4]
    # iou between box a and gt
    iou_a_gt = box_iou(a, gt)
    # iou between box b and gt
    iou_b_gt = box_iou(b, gt)
    # mask is either 0 or 1; 1 means box b has a higher iou with gt than box a
    mask = T.switch(T.lt(iou_a_gt, iou_b_gt), 1, 0)
    # coordinate loss between box a and gt
    loss_a_gt = T.sum(T.square(a - gt), axis=2) * 5
    # coordinate loss between box b and gt
    loss_b_gt = T.sum(T.square(b - gt), axis=2) * 5
    loss = loss + y_true[:, :, 4] * (1 - mask) * loss_a_gt
    loss = loss + y_true[:, :, 4] * mask * loss_b_gt
    # confidence loss between a and gt
    closs_a_gt = T.square(y_pred[:, :, 4] - y_true[:, :, 4])
    # confidence loss between b and gt
    closs_b_gt = T.square(y_pred[:, :, 9] - y_true[:, :, 4])
    loss = loss + closs_a_gt * (1 - mask) * y_true[:, :, 4]
    loss = loss + closs_b_gt * mask * y_true[:, :, 4]
    # if the cell has no object
    loss = loss + closs_a_gt * (1 - y_true[:, :, 4]) * 0.5
    loss = loss + closs_b_gt * (1 - y_true[:, :, 4]) * 0.5
    # conditioned classification error
    loss = loss + T.sum(T.square(y_pred[:, :, 10:30] - y_true[:, :, 10:30]), axis=2) * y_true[:, :, 4]
    loss = T.sum(loss, axis=1)
    loss = T.mean(loss)
    return loss
```
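As a side note, one way to sanity-check the IoU helpers in isolation is to evaluate them on a couple of hand-made boxes (a minimal sketch, assuming the functions above are defined; boxes are laid out as [x, y, sqrt(h), sqrt(w)]):

```python
import numpy as np
import theano
import theano.tensor as T

a = T.tensor3('a')
b = T.tensor3('b')
iou = theano.function([a, b], box_iou(a, b))

# a single box of width 1 and height 1 centred at (0.5, 0.5), shape (batch, cells, 4)
box = np.array([[[0.5, 0.5, 1.0, 1.0]]], dtype=theano.config.floatX)
print(iou(box, box))      # identical boxes -> IoU of 1.0

shifted = box.copy()
shifted[..., 0] += 0.5    # shift x by half a width
print(iou(box, shifted))  # partial overlap -> IoU strictly between 0 and 1
```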
y_pred and y_true are batch_size*49*30 tensors; the values in y_true are either 1 or 0.
loss = loss + T.sum(T.square(y_pred[:,:,10:30] - y_true[:,:,10:30]), axis=2) * y_true[:,:,4]
If y_true has negative values, this whole term can become negative, since the y_true[:,:,4] factor sits outside the square.
@EderSantana Thanks for replying, but I'm pretty sure y_true is either 1 or 0; I've already double-checked :). It looks more like an overflow: when I set learning_rate=1e-4 the loss is negative right at the start of training, but with learning_rate=1e-8 the loss starts out positive and then decreases below zero during training.
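In case it matters, the NaN/Inf check was done by compiling with NanGuardMode, roughly like this (a sketch; the tiny function and input here are placeholders, the real check wraps the training function):

```python
import numpy as np
import theano
import theano.tensor as T
from theano.compile.nanguardmode import NanGuardMode

x = T.vector('x')
f = theano.function(
    [x], T.sum(T.square(x)),
    mode=NanGuardMode(nan_is_error=True, inf_is_error=True, big_is_error=True),
)
f(np.array([1.0, 2.0, 3.0], dtype=theano.config.floatX))  # raises if any value is NaN, Inf, or very large
```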
I found the problem: y_true has to be the first parameter and y_pred the second. I accidentally passed them to my loss function in the wrong order, such a stupid mistake.
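For anyone else who hits this: Keras calls a custom objective as loss(y_true, y_pred), so a function declared as custom_loss_2(y_pred, y_true) silently receives the targets in its y_pred slot and vice versa. A quick way to see the symptom on dummy data (this sketch assumes the functions above are in scope):

```python
import numpy as np
import theano
import theano.tensor as T

pred = T.matrix('pred')
true = T.matrix('true')
loss_fn = theano.function([pred, true], custom_loss_2(pred, true))

# dummy batch: raw network outputs vs. 0/1 targets, flattened to batch x (49*30)
p = np.random.uniform(-1.0, 1.0, size=(2, 49 * 30)).astype(theano.config.floatX)
t = np.random.randint(0, 2, size=(2, 49 * 30)).astype(theano.config.floatX)

print(loss_fn(p, t))  # arguments in the order the function expects: loss is non-negative
print(loss_fn(t, p))  # the order Keras actually delivered them in: the 0/1 indicator
                      # slot now holds raw predictions, so the loss can go negative
```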
Is this a custom bounding-box loss that works? Will it work with the TF backend?
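Not as written: the functions call theano.tensor ops directly, so they only run on the Theano backend. A backend-agnostic version would be written against keras.backend instead; a rough sketch of how the overlap helper could translate (the remaining ops map the same way, e.g. K.square, K.sum, K.mean):

```python
from keras import backend as K

def overlap(x1, w1, x2, w2):
    # same logic as the Theano version; the element-wise switches become
    # K.maximum / K.minimum, which behave the same on both backends
    l1 = x1 - w1 * w1 / 2
    l2 = x2 - w2 * w2 / 2
    r1 = x1 + w1 * w1 / 2
    r2 = x2 + w2 * w2 / 2
    return K.minimum(r1, r2) - K.maximum(l1, l2)
```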