Darknet: Cross Entropy for YOLOv3

Created on 2 Jan 2019 · 4 comments · Source: pjreddie/darknet

Hello All,

In the YOLOv3 paper, it is clearly stated that the loss function is the same as in the previous versions of YOLO, with the exception of the last component, which uses cross entropy.

I have gone through the code and there is no sign of cross entropy, i.e. pi(c)*log(pi^(c)).

Does anyone have a clear understanding of the cross entropy in YOLOv3?

Thanks

Most helpful comment

@yasserkhalil93 Hi,

You can read it here: https://github.com/AlexeyAB/darknet/issues/1695#issuecomment-450995001

The binary cross-entropy loss is loss = −(t*ln(y) + (1−t)*ln(1−y)), and we should minimize it.
Its derivative with respect to the logit x (where y = sigmoid(x)) is d(loss)/d(x) = loss_derivative = y − t: https://peterroelants.github.io/posts/cross-entropy-logistic/

  • y - predicted probability in [0, 1]
  • t - 1 if the class is correct, 0 otherwise

We do it here: https://github.com/pjreddie/darknet/blob/61c9d02ec461e30d55762ec7669d6a1d3c356fb2/src/yolo_layer.c#L120
The same thing, written out per case (the stored delta is the negated gradient):

  • t==1: i.e. if (detected_class == truth_class), delta = -loss_derivative = -(y-t) = 1-y
  • t==0: i.e. if (detected_class != truth_class), delta = -loss_derivative = -(y-t) = -y
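
These two cases can be sketched as a single loop, loosely modeled on darknet's delta_yolo_class (simplified here: no stride indexing, illustrative names):

```c
/* Simplified sketch of the class-delta computation in yolo_layer.c:
 * 'output' holds already-sigmoid-activated class scores y, and 'delta'
 * receives the negated gradient of the binary cross-entropy loss. */
static void class_delta(const float *output, float *delta,
                        int classes, int truth_class) {
    for (int c = 0; c < classes; ++c) {
        float y = output[c];                       /* probability in [0, 1] */
        float t = (c == truth_class) ? 1.0f : 0.0f;
        delta[c] = t - y;    /* = 1-y for the correct class, -y otherwise */
    }
}
```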



Free-form reasoning; in general, in YOLOv3:


  • we use binary cross-entropy for multi-label classification: loss = −(y*ln(p) + (1−y)*ln(1−p)), and we should minimize it

    • p - probability in [0, 1]

    • y - 1 if the class is correct, 0 otherwise

  • so we should minimize the cost: loss = −ln(p) if y==1, or loss = −ln(1−p) if y==0, thus

    • if y==1, we want -ln(p)=0, thus p=1

    • if y==0, we want -ln(1-p)=0, thus p=0

  • we can achieve this by maximizing p if y==1, or maximizing 1-p if y==0,

    • if y==1, we should maximize logistic_activation(x + delta), so delta > 0
    • if y==0, we should minimize logistic_activation(x + delta), so delta < 0
  • we do it here: https://github.com/pjreddie/darknet/blob/61c9d02ec461e30d55762ec7669d6a1d3c356fb2/src/yolo_layer.c#L120
    The same:

  • if (detected_class == truth_class), delta = 1-p > 0
  • if (detected_class != truth_class), delta = -p < 0

where p = logistic_activation(x) = output[index + stride*n]


In general, there are two types of classification:

  • multi-label classification - each bounding box (each anchor) can have several classes, and the model as a whole has >= 1 classes. Binary cross-entropy with logistic activation (sigmoid) is used. This is what YOLOv3 uses.

  • multi-class classification - each bounding box (each anchor) can have only one class, and the model as a whole has >= 1 classes. Categorical cross-entropy with softmax activation is used. This is what YOLOv2 uses.


YOLOv3 uses binary cross-entropy with logistic activation (sigmoid) for multi-label classification, so each bounding box (each anchor) can have several classes. For example, one bounding box can be both Animal and Cat, or both Truck and Car, or even Cat and Dog if the two objects are close to each other.

So:

  1. Logistic activation (sigmoid) = 1./(1. + exp(-x)) is used:

    For neural networks, the result states that the neuron activation function must be nonlinear - and nothing else. Whatever this nonlinearity is, the network of connections can be constructed, and the coefficients of the linear connections between neurons can be adjusted, in such a way that the neural network computes any continuous function of its input signals to any given accuracy.

    • its derivative is very simple: in terms of the activated output p = sigmoid(x), it is (1-p)*p
  2. Binary classification is used; binary means that we look at each class separately and treat each class as 2 classes (present or not present). So we use this formula: https://ml-cheatsheet.readthedocs.io/en/latest/loss_functions.html
    loss = −(y*log(p) + (1−y)*log(1−p))

    • log - the natural log (ln)
    • y - binary indicator (0 or 1): whether class label c is the correct classification for observation o
    • p - predicted probability that observation o is of class c

So:

  • if (detected_class == truth_class) loss = −log(p)
  • if (detected_class != truth_class) loss = −log(1−p)

where p = logistic_activation(x); this is output[index + stride*n] in the yolo_layer.c source code.
And we should minimize the cost: loss = −log(p) or loss = −log(1−p).

As said in the MXNET doc: https://gluon.mxnet.io/chapter02_supervised-learning/logistic-regression-gluon.html

  • if (detected_class == truth_class) we should maximize log(p), i.e. we should maximize p
  • if (detected_class != truth_class) we should maximize log(1−p), i.e. we should minimize p


https://peterroelants.github.io/posts/cross-entropy-logistic/
https://en.wikipedia.org/wiki/Cross_entropy#Cross-entropy_error_function_and_logistic_regression
https://gluon.mxnet.io/chapter02_supervised-learning/logistic-regression-gluon.html
https://ml-cheatsheet.readthedocs.io/en/latest/logistic_regression.html
https://en.wikipedia.org/wiki/Logistic_regression

All 4 comments


@AlexeyAB @pjreddie Sorry to bother you! It seems the gradient is already calculated and saved to l.delta during the forward pass in yolo_layer.c, and l.delta doesn't seem to use MSE. Is l.cost not the true loss, i.e. is it only used for display? And when updating the parameters, are the no-object, objectness, coordinate, and class losses backpropagated separately?

Thanks a lot!

Thanks for the awesome repo. My dataset contains objects that are mutually exclusive, so instead of using sigmoid, how can I use softmax for classification?

Thanks in advance.
