Darkflow: yolov2 loss inconsistent with paper

Created on 26 Mar 2017 · 22 Comments · Source: thtrieu/darkflow

In the original YOLO9000 paper, the network predicts P(Object) * IOU(Object, bounding box) as its objectness score. When I scanned through your implementation in net/yolov2/train.py, I found that the objectness loss is defined using only the binary P(Object). Is this consistent with the original darknet? (I haven't read the original darknet source code.)

bug


zhisong

Most helpful comment

I read darknet's C source code. The author did implement the objectness-score loss the same way he describes it in the paper, so I think this inconsistency in the TensorFlow implementation could be a problem.

ldf921 on 29 Mar 2017

👍3

All 22 comments

@ryansun1900

thtrieu on 2 Apr 2017

I implemented YOLOv2 training by referring to darkflow's YOLOv1 code and the YOLO9000 paper,
so it's possible that it isn't consistent with the original darknet C source code.
Thank you for pointing out this problem, but I'm too busy at the moment.
It would be great if anyone could translate the training code from the C source directly and open a pull request!

ryansun1900 on 3 Apr 2017

Looking at darknet, and assuming the loss code (edit: this is the YOLOv1 loss) is https://github.com/pjreddie/darknet/blob/master/src/detection_layer.c#L66, the code in https://github.com/thtrieu/darkflow/blob/master/net/yolov2/train.py looks like a pretty good vectorization of it. I did notice two things:

There is no equivalent of float rmse = box_rmse(out, truth) (line 121) in darkflow's training code. Darknet uses it when best_iou is 0.
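For reference, here is a minimal Python sketch of how the responsible box appears to be chosen around that line in detection_layer.c: the highest IoU wins, and box_rmse only decides while no predicted box overlaps the truth at all. The names mirror darknet's, but this is a hypothetical re-implementation for illustration, not darknet or darkflow code; boxes are assumed to be (x_centre, y_centre, w, h) tuples in the same units.

```python
import math
from typing import List, Tuple

# Hypothetical box format: (x_centre, y_centre, w, h), all in the same units.
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two centre-format boxes."""
    def overlap(c1: float, w1: float, c2: float, w2: float) -> float:
        left = max(c1 - w1 / 2, c2 - w2 / 2)
        right = min(c1 + w1 / 2, c2 + w2 / 2)
        return max(0.0, right - left)

    inter = overlap(a[0], a[2], b[0], b[2]) * overlap(a[1], a[3], b[1], b[3])
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def box_rmse(a: Box, b: Box) -> float:
    """Euclidean distance between the (x, y, w, h) vectors of two boxes."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def pick_responsible(preds: List[Box], truth: Box) -> int:
    """Index of the predicted box made responsible for `truth`.

    Prefer the highest IoU; only while no prediction has overlapped the
    truth yet, fall back to the smallest RMSE.
    """
    best_index, best_iou, best_rmse = 0, 0.0, float("inf")
    for j, out in enumerate(preds):
        iou = box_iou(out, truth)
        rmse = box_rmse(out, truth)
        if best_iou > 0 or iou > 0:
            if iou > best_iou:
                best_iou, best_index = iou, j
        elif rmse < best_rmse:
            best_rmse, best_index = rmse, j
    return best_index
```

Without the RMSE branch, a cell whose predicted boxes all miss the truth would always assign it to the first box, whatever the predictions look like.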
Assuming I'm understanding it correctly, the loss in darknet is (abbreviated):

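l.delta[class_index+j] = l.class_scale * (net.truth[truth_index+1+j] - l.output[class_index+j]);  /* class-probability residual */
l.delta[p_index] = l.object_scale * (1.-l.output[p_index]);  /* objectness residual, target 1 */
l.delta[box_index+0] = l.coord_scale*(net.truth[tbox_index + 0] - l.output[box_index + 0]);  /* x residual; y, w, h are analogous */
*(l.cost) = pow(mag_array(l.delta, l.outputs * l.batch), 2);  /* cost = sum of squared deltas over the batch */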

but the loss in darkflow is:

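import tensorflow as tf

loss = tf.pow(adjusted_net_out - true, 2)  # squared error per output entry
loss = tf.multiply(loss, wght)  # per-entry loss weights
loss = tf.reshape(loss, [-1, H*W*B*(4 + 1 + C)])  # one row per image
loss = tf.reduce_sum(loss, 1)  # sum over all entries of each image
self.loss = .5 * tf.reduce_mean(loss)  # halve and average over the batch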

which is close, but slightly out of order.
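To make the comparison concrete, here's a toy sketch in plain Python of the two formulas as quoted above, with made-up (output, target, scale) triples standing in for the real tensors. The helper names are hypothetical. Darknet's reported cost squares the scale and sums over the whole batch, while darkflow's loss uses the weight linearly, halves the result and averages over images.

```python
from typing import List, Tuple

# One loss term: (network output, target, scale/weight). Toy values only.
Term = Tuple[float, float, float]
Batch = List[List[Term]]  # a list of images, each a flat list of terms


def darknet_delta(o: float, t: float, s: float) -> float:
    # Mirrors l.delta[...] = scale * (truth - output)
    return s * (t - o)


def darknet_style_cost(batch: Batch) -> float:
    # Mirrors *(l.cost) = pow(mag_array(l.delta, ...), 2): the squared L2 norm
    # of every delta in the batch, so each scale ends up squared.
    return sum(darknet_delta(o, t, s) ** 2 for image in batch for (o, t, s) in image)


def darkflow_style_loss(batch: Batch) -> float:
    # Mirrors the TF snippet: weight * (out - true)^2, summed per image,
    # averaged over images, then multiplied by 0.5.
    per_image = [sum(s * (o - t) ** 2 for (o, t, s) in image) for image in batch]
    return 0.5 * sum(per_image) / len(per_image)


def darkflow_term_grad(o: float, t: float, s: float) -> float:
    # d/do of 0.5 * s * (o - t)^2 for a single term of a single image.
    return s * (o - t)
```

With the toy batch [[(0.2, 1.0, 5.0), (0.1, 0.0, 1.0)], [(0.5, 0.0, 0.5)]], darknet_style_cost gives 16.0725 and darkflow_style_loss gives 0.83375. Darknet's layers appear to backpropagate l.delta itself rather than differentiating the reported cost, and scale * (target - output) is exactly the negative of darkflow_term_grad. If that reading is right, the weighting and reduction mostly change the reported number; the objectness target (1 versus IoU) is a separate question.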

jcarletgo on 22 Apr 2017

@jcarletgo There is an additional line:
if(l.rescore){ l.delta[p_index] = l.object_scale * (iou - l.output[p_index]); }
which shows that the IoU is used as the objectness target in the loss layer when rescore is enabled.
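A minimal sketch of what that rescore line changes, assuming the responsible predictor has already been picked: with rescore on, the objectness target becomes the IoU between the predicted box and the truth (the paper's P(Object) * IOU with P(Object) = 1); with it off, the target is the binary 1 described at the top of this issue. box_iou and objectness_delta are hypothetical helpers written for illustration, and boxes are assumed to be (x_centre, y_centre, w, h).

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # hypothetical: (x_centre, y_centre, w, h)


def box_iou(a: Box, b: Box) -> float:
    """IoU of two centre-format boxes given in the same units."""
    def overlap(c1: float, w1: float, c2: float, w2: float) -> float:
        left = max(c1 - w1 / 2, c2 - w2 / 2)
        right = min(c1 + w1 / 2, c2 + w2 / 2)
        return max(0.0, right - left)

    inter = overlap(a[0], a[2], b[0], b[2]) * overlap(a[1], a[3], b[1], b[3])
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def objectness_delta(output: float, pred: Box, truth: Box,
                     object_scale: float = 1.0, rescore: bool = True) -> float:
    """Error signal for the responsible predictor's objectness score.

    rescore=True:  target = IoU(pred, truth), i.e. P(Object) * IOU.
    rescore=False: target = 1.0, the binary P(Object) only.
    """
    target = box_iou(pred, truth) if rescore else 1.0
    return object_scale * (target - output)
```

For a prediction that overlaps the truth at IoU 0.6, an objectness output of 0.6 already gets zero error with rescore, but is still pushed toward 1 without it.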

zhisong on 24 Apr 2017

👍1

Looking again, the loss code for YOLOv2 is https://github.com/pjreddie/darknet/blob/master/src/region_layer.c#L174,
whereas https://github.com/pjreddie/darknet/blob/master/src/detection_layer.c#L66 is the loss for YOLOv1.

jcarletgo on 25 Apr 2017

Do we have any update on this matter?

EmmanouelP on 29 May 2017

No, nothing.
Can I ask you a question?
What programming language do you usually use?

87216LiaoXin on 29 May 2017

I usually use Python and R. I'm still busy with some personal projects and don't have time to check the C source code. I hope someone can help check it. Thanks.

ryansun1900 on 29 May 2017

R for big data?
I still mostly use C,
and my Python is still at a low level.
When I look at your program, I feel so frustrated,
and I'm sorry that I can't help you.
But you've given me a goal.

87216LiaoXin on 29 May 2017

Thank you.

87216LiaoXin on 29 May 2017

I just have a question regarding this bug: will it affect inference? If I fine-tune/train with the darknet C code to get the weights and then use this repository for inference, will it work?

minhnhat93 on 11 Jul 2017

👍2

@ryansun1900, @minhnhat93 raises a good point. If the training script in this repo doesn't work because of implementation differences, then the next best thing is to train with darknet and simply use the weights here.

zacharynevin on 4 Aug 2017

@minhnhat93 My instinct is that it should work fine. A bug in the training function should not affect the predictions for pretrained models. In fact, I tried darkflow on the pretrained darknet models and they appeared to work just fine.
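To illustrate why a training bug shouldn't touch inference, here's a sketch of decoding one anchor's raw outputs at test time, following the parameterisation in the YOLO9000 paper (b_x = σ(t_x) + c_x, b_y = σ(t_y) + c_y, b_w = p_w e^{t_w}, b_h = p_h e^{t_h}, objectness σ(t_o)). No loss term appears anywhere. The function name, the softmax over class logits, and dividing by the grid size to get image-relative coordinates are assumptions about a typical decoder, not darkflow's actual code; the anchor size p_w, p_h is assumed to be in grid-cell units.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_anchor(t: Sequence[float], cx: int, cy: int, pw: float, ph: float,
                  grid_w: int, grid_h: int
                  ) -> Tuple[Tuple[float, float, float, float], float, List[float]]:
    """Decode raw outputs t = [tx, ty, tw, th, to, class logits...] of one anchor.

    Returns (box, objectness, class scores) where box is (x_centre, y_centre, w, h)
    relative to the image, and class scores = objectness * class probability.
    """
    tx, ty, tw, th, to = t[:5]
    bx = (sigmoid(tx) + cx) / grid_w
    by = (sigmoid(ty) + cy) / grid_h
    bw = pw * math.exp(tw) / grid_w
    bh = ph * math.exp(th) / grid_h
    conf = sigmoid(to)  # trained to approximate P(Object) * IOU
    scores = [conf * p for p in softmax(t[5:])]
    return (bx, by, bw, bh), conf, scores
```

Because only the forward pass and this kind of decoding run at test time, pretrained darknet weights give the same predictions regardless of how the loss is written, as long as the forward pass and decoder match darknet's.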

zacharynevin on 4 Aug 2017

@minhnhat93 But the problem is that when you compare person detection with the original YOLO, darkflow does worse: it fails to detect some people in an image. I didn't dig too deeply into the code, so I'm not sure why this inconsistency happens, since we are using the pretrained model from the original YOLO, right?

ouceduxzk on 6 Dec 2017

@ryansun1900 @thtrieu was this fixed?

Ridhwanluthra on 31 Dec 2017

@ryansun1900 @thtrieu was this bug fixed?

wjd92 on 10 Jan 2018

@jcarletgo @ryansun1900 @thtrieu Nice thread to follow. There's another "inconsistency" compared with the paper. In the YOLOv1 paper, YOLO should output a feature map of shape (S^2, B*5 + C), but the implementation here uses shape [S^2, B*(5+C)]. If I understand correctly, for the class-probability regression, the paper penalizes classification error for all anchor boxes as long as the grid cell matches a true box, but the implementation here penalizes only the "best_match" anchor box and ignores the non-matching ones. I didn't look at the darknet C code to compare. Is this done on purpose?
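Quick arithmetic for the two layouts, plus the two class-loss masks described above for a single grid cell. The function names are hypothetical; the counts assume B boxes per cell, 4 coordinates plus 1 confidence per box, and C classes. For example, S = 7, B = 2, C = 20 gives the 7x7x30 output of the YOLOv1 paper, while B = 5, C = 20 in the per-anchor layout gives 125 channels per cell.

```python
from typing import List


def yolov1_channels(B: int, C: int) -> int:
    # YOLOv1 paper layout: B boxes x (x, y, w, h, confidence) per cell,
    # plus ONE set of C class probabilities shared by the whole cell.
    return B * 5 + C


def per_anchor_channels(B: int, C: int) -> int:
    # Per-anchor layout: every box/anchor carries its own 4 coordinates,
    # 1 objectness score and C class scores.
    return B * (5 + C)


def class_loss_mask(B: int, cell_has_object: bool, best_anchor: int,
                    only_best: bool) -> List[int]:
    """Which of the B anchors in one cell receive a class-probability penalty.

    only_best=False: every anchor in a cell that contains an object
                     (the paper-style reading described above).
    only_best=True:  only the best-matching anchor (what the implementation
                     here is described as doing).
    """
    if not cell_has_object:
        return [0] * B
    if only_best:
        return [1 if j == best_anchor else 0 for j in range(B)]
    return [1] * B
```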

tianyu-tristan on 6 Apr 2018

👍2

@tianyu-tristan I was wondering the same thing and checked the darknet code. It seems like B*(5+C) is the right way to go, looking at https://github.com/pjreddie/darknet/blob/508381b37fe75e0e1a01bcb2941cb0b31eb0e4c9/src/region_layer.c#L22

wns349 on 16 Apr 2018

👍1

@jcarletgo Could you elaborate on what you mean by "slightly out of order"? I'm trying hard to match darknet's loss implementation against darkflow's, but I'm only slowly starting to recognise the components of the formula (https://stats.stackexchange.com/questions/287486/yolo-loss-function-explanation) in each implementation, and I don't yet see any differences.