Darknet: Help to understand what happened in the detection layer

Created on 10 Nov 2017 · 4 Comments · Source: AlexeyAB/darknet

Hello guys, I have followed this project for a long time and successfully trained a model on my own dataset. Right now I am trying to implement YOLO on the Snapdragon Neural Processing Engine (SNPE). SNPE doesn't support the "detection" layer, so I have to implement it myself. I read the YOLOv2 paper and understand the concept of how the "detection" layer works, but when I dug into the code in the yolo src folder, I could not find where the trained model predicts the bounding boxes from the anchor boxes and their offsets, or where it computes the confidence score for each detected object.

Take a look at this example: at layer 14 the output is 20x20x30. My question is what exactly is done with this 20x20x30 tensor to make predictions. Please help me clarify this. Thanks!

phongnhhn92

Most helpful comment

@phongnhhn92 Hi,

First, look at the forward() function in OpenCV: https://github.com/AlexeyAB/opencv/blob/ecc34dc5219bf70cf9ede89cf7bac8f895938da1/modules/dnn/src/layers/region_layer.cpp#L117

Region_layer consists of:

- flatten - in the OpenCV version it is implemented as a separate Permute layer: https://github.com/AlexeyAB/darknet/blob/master/src/region_layer.c#L150
- region_layer - the layer itself:
  - Darknet version: https://github.com/AlexeyAB/darknet/blob/master/src/region_layer.c#L144
  - OpenCV version: https://github.com/AlexeyAB/opencv/blob/ecc34dc5219bf70cf9ede89cf7bac8f895938da1/modules/dnn/src/layers/region_layer.cpp#L117

For example, if:

network size = 96x96
classes = 3
anchors = 5

Then the output tensor for yolo-voc.cfg = 3 x 3 x 40 (96 / 32 = 3 cells per side; 5 anchors x (4 box values + 1 objectness + 3 classes) = 40)

Each grid cell holds these 40 values, laid out as shown in the images:

[Image: region_layer]
[Image: grid_cell]
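
The shape arithmetic from the example can be sketched in a few lines. This is only an illustration, not code from Darknet or OpenCV: the function names are hypothetical, the downsampling factor of 32 is an assumption inferred from 96 -> 3 and 416 -> 13, and the per-anchor order (x, y, w, h, objectness t0, then class scores) is assumed as well.

```python
def region_output_shape(net_w, net_h, classes, anchors, stride=32, coords=4):
    """Return (grid_h, grid_w, depth) of the tensor fed to the region layer.

    stride=32 is an assumption inferred from the examples in this thread.
    """
    # Per anchor: x, y, w, h, objectness (t0), then one score per class.
    values_per_anchor = coords + 1 + classes
    return net_h // stride, net_w // stride, anchors * values_per_anchor


def value_names(classes, anchors):
    """Hypothetical labels for the depth axis of a single grid cell."""
    names = []
    for a in range(anchors):
        names += [f"a{a}_{k}" for k in ("x", "y", "w", "h", "t0")]
        names += [f"a{a}_class{c}" for c in range(classes)]
    return names


print(region_output_shape(96, 96, classes=3, anchors=5))  # (3, 3, 40)
```

With 1 class and a 416x416 input the same function gives 13 x 13 x 30, which matches the tiny-yolo case discussed later in the thread.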

Then look at the YOLOv2 usage example, which shows how to get the coordinates of an object: https://github.com/AlexeyAB/opencv/blob/ecc34dc5219bf70cf9ede89cf7bac8f895938da1/samples/dnn/yolo_object_detection.cpp#L77
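
For a hand-written replacement of this layer, the coordinate step can be sketched as below. This is not the OpenCV or Darknet code itself, only a minimal illustration of the box equations from the YOLOv2 paper. The names `logistic` and `decode_box` are hypothetical, and the anchors are assumed to be given in grid-cell units.

```python
import math


def logistic(x):
    """Logistic (sigmoid) activation: squashes a raw value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(tx, ty, tw, th, col, row, grid_w, grid_h, anchor_w, anchor_h):
    """Turn raw outputs of one anchor into a box relative to the image (0..1).

    (col, row) is the grid cell; anchor_w/anchor_h are assumed to be in
    grid-cell units.
    """
    bx = (col + logistic(tx)) / grid_w      # center x, offset stays inside the cell
    by = (row + logistic(ty)) / grid_h      # center y
    bw = anchor_w * math.exp(tw) / grid_w   # width scales the anchor
    bh = anchor_h * math.exp(th) / grid_h   # height scales the anchor
    return bx, by, bw, bh
```

The logistic activation keeps the predicted center inside its own grid cell, while the exponential lets the network stretch or shrink the anchor without producing negative sizes.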

AlexeyAB on 10 Nov 2017

All 4 comments

Thanks for your reply !

Using the latest version of OpenCV, I ran yolo_object_detection.cpp as you suggested and I can see how OpenCV returns the detected bounding boxes. In my case I am using tiny-yolo trained on 1 class; the input is 416x416 and the output is 13x13x30. With the code above, the returned detectionMat has 6 columns and 845 rows, which matches 13x13x30 (845 x 6 = 5070 values).

My question: in the loop where we look for rows whose prob_obj is higher than the threshold (0.5 for example), what is "in" in "logistic_activate(in)", and how does it relate to the logistic activation?

[Image: yolo]

phongnhhn92 on 21 Nov 2017

@phongnhhn92 Hi,
I updated the image in my previous post. Yes, I made a typo; the correct value is SINGLE CELL SIZE = 40.

So if your input is 416x416 and the output is 13x13x40, the output in OpenCV-Darknet will be 845x8 (where 845 = 13 x 13 x 5 = output_width x output_height x anchors, and 8 = 40 / 5 values per anchor).
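
The regrouping into rows can be sketched like this. It is an illustration only: `to_detection_rows` is a hypothetical helper, and it assumes the values are already stored cell by cell, then anchor by anchor, then value by value (that is, after the flatten/Permute step).

```python
def to_detection_rows(flat, grid_h, grid_w, anchors, classes):
    """Split a flat list of layer outputs into one row per (cell, anchor).

    Assumes cell-major, then anchor, then value ordering.
    Each row has 5 + classes entries: x, y, w, h, t0, class scores.
    """
    row_len = 5 + classes
    expected = grid_h * grid_w * anchors * row_len
    if len(flat) != expected:
        raise ValueError(f"expected {expected} values, got {len(flat)}")
    return [flat[i:i + row_len] for i in range(0, expected, row_len)]


rows = to_detection_rows(list(range(13 * 13 * 40)), 13, 13, anchors=5, classes=3)
print(len(rows), len(rows[0]))  # 845 8
```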

Then we iterate over each row and find the maximum probability for each ANCHOR in each CELL, where final_probability_obj_X = t0 * prob_obj_X. If final_probability_obj_MAX > threshold, there is an object.

[Image: detection_output]

If threshold < MAX(t0*prob_obj_1, t0*prob_obj_2, t0*prob_obj_3), then there is an object.
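
The thresholding rule can be sketched as follows. This is a minimal illustration rather than the code from the OpenCV sample: `detect` is a hypothetical helper, and each row is assumed to hold already-activated values in the order x, y, w, h, t0, prob_obj_1 ... prob_obj_C, with the class probabilities not yet multiplied by t0.

```python
def detect(rows, threshold=0.5):
    """Return (row_index, class_index, score, box) for rows above threshold.

    Each row is assumed to be [x, y, w, h, t0, prob_1, ..., prob_C].
    """
    hits = []
    for i, row in enumerate(rows):
        box, t0, probs = row[:4], row[4], row[5:]
        # final_probability_obj_X = t0 * prob_obj_X
        scores = [t0 * p for p in probs]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] > threshold:
            hits.append((i, best, scores[best], tuple(box)))
    return hits


example_rows = [
    [0.5, 0.5, 0.2, 0.2, 0.9, 0.1, 0.8, 0.1],    # 0.9 * 0.8 passes the threshold
    [0.1, 0.1, 0.1, 0.1, 0.3, 0.9, 0.05, 0.05],  # 0.3 * 0.9 does not
]
print(detect(example_rows, threshold=0.5))
```

If the implementation in use already multiplies the class probabilities by t0 inside the layer, the multiplication here would have to be dropped.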

AlexeyAB on 21 Nov 2017

Hi @AlexeyAB,
I need to modify the region layer too.
In the get_region_detections function of the original Darknet (pjreddie/darknet), I cannot find where logistic_activation is executed.
Could you please tell me where it is executed?