Hello All,
In the Yolov3 paper, it is clearly stated that the loss function is the same as in the previous versions of YOLO, with the exception of the last component, which uses cross-entropy.
I have gone through the code and there is no sign of cross-entropy, i.e. p_i(c)*log(p̂_i(c)).
Does anyone have a clear understanding of the cross entropy in YOLOv3?
Thanks
@yasserkhalil93 Hi,
You can read it here: https://github.com/AlexeyAB/darknet/issues/1695#issuecomment-450995001
There is Binary cross-entropy: loss = −(t*ln(y) + (1−t)*ln(1−y)) - we should minimize it.
Also d(loss)/d(y) = loss_derivative = y−t: https://peterroelants.github.io/posts/cross-entropy-logistic/
- y - probability [0 - 1]
- t - is the class correct (1) or not (0)
We do it here: https://github.com/pjreddie/darknet/blob/61c9d02ec461e30d55762ec7669d6a1d3c356fb2/src/yolo_layer.c#L120
The same:
- t==1, i.e. if (detected_class == truth_class): delta = -loss_derivative = -(y-t) = 1-y
- t==0, i.e. if (detected_class != truth_class): delta = -loss_derivative = -(y-t) = -y
Free-form reasoning - in general, in Yolo v3:
loss = −(y*ln(p) + (1−y)*ln(1−p)) - and we should minimize it.
- p - probability [0 - 1]
- y - is the class correct (1) or not (0)
So loss = −ln(p) if (y==1), or loss = −ln(1−p) if (y==0). Thus:
- if (y==1), then we need -ln(p) = 0, thus p = 1
- if (y==0), then we need -ln(1-p) = 0, thus p = 0
We can achieve this by maximizing p if (y==1), or maximizing 1−p if (y==0):
- if (y==1), we should maximize logistic_activation(x + delta), so delta > 0
- if (y==0), we should minimize logistic_activation(x + delta), so delta < 0
We do it here: https://github.com/pjreddie/darknet/blob/61c9d02ec461e30d55762ec7669d6a1d3c356fb2/src/yolo_layer.c#L120
The same:
- delta = 1−p > 0 if (y==1)
- delta = −p < 0 if (y==0)
where p = logistic_activation(x) = output[index + stride*n]
In general, there are two types of classification:
- multi-label classification - each bounding box (each anchor) can have several classes, and in total the model has >= 1 classes. Binary cross-entropy with Logistic activation (sigmoid) is used. This is the case in Yolo v3.
- multi-class classification - each bounding box (each anchor) can have only one class, and in total the model has >= 1 classes. Categorical cross-entropy with Softmax activation is used. This is the case in Yolo v2.
Binary cross-entropy with Logistic activation (sigmoid) is used for multi-label classification in Yolo v3, so each bounding box (each anchor) can have several classes. For example, one bounding box can be Animal, Cat or Truck, Car. Or even Cat, Dog if they are close to each other.
So:
Logistic activation (sigmoid) = 1./(1. + exp(-x)) is used, so:
- each class probability is computed independently, and several of them can be ~=1 at once - so we can detect Cat, Dog in one box (multi-label). For neural networks in general, the activation function only needs to be nonlinear - and nothing else: whatever this nonlinearity is, the network of connections can be constructed, and the coefficients of the linear connections between neurons adjusted, in such a way that the neural network computes any continuous function of its input signals with any given accuracy.
- its gradient, expressed in terms of the already-activated output x, is logistic_gradient(x) = (1-x)*x
Binary classification is used - binary means that we look at each class separately and consider each class as 2 classes (There is or There is no). So we use this formula: https://ml-cheatsheet.readthedocs.io/en/latest/loss_functions.html
loss = −(y*log(p) + (1−y)*log(1−p))
(log here means ln.) So:
- loss = −log(p) if (y==1)
- loss = −log(1−p) if (y==0)
where p = logistic_activation(x); this is output[index + stride*n] in the yolo_layer.c source code.
And we should minimize the cost: loss = −log(p) or loss = −log(1−p).
As said in the MXNET doc: https://gluon.mxnet.io/chapter02_supervised-learning/logistic-regression-gluon.html - to minimize −log(p) we should maximize p; likewise, to minimize −log(1−p) we should maximize 1−p.
https://peterroelants.github.io/posts/cross-entropy-logistic/
https://en.wikipedia.org/wiki/Cross_entropy#Cross-entropy_error_function_and_logistic_regression
https://gluon.mxnet.io/chapter02_supervised-learning/logistic-regression-gluon.html
https://ml-cheatsheet.readthedocs.io/en/latest/logistic_regression.html
https://en.wikipedia.org/wiki/Logistic_regression
@AlexeyAB @pjreddie Sorry to bother you! It seems the gradient is already calculated and saved to l.delta during the forward pass in yolo_layer.c, and l.delta doesn't seem to use MSE. Is l.cost not the true loss - is it just used for display? And when updating parameters, are the no_obj, obj, coord, and class losses backpropagated separately?
Thanks a lot!
Thanks for the awesome repo. My dataset contains objects which are mutually exclusive, so instead of using sigmoid, how can I use softmax for classification?
Thanks in advance.