Darknet: Cross Entropy for YOLOv3

Created on 2 Jan 2019 · 4 comments · Source: pjreddie/darknet

Hello All,

In the YOLOv3 paper, it is clearly stated that the loss function is the same as in the previous versions of YOLO, with the exception of the last component, which uses cross entropy.

I have gone through the code and there is no sign of cross entropy, i.e. pi(c)*log(pi^(c)).

Does anyone have a clear understanding of the cross entropy in YOLOv3?

Thanks

Most helpful comment

@yasserkhalil93 Hi,

You can read it here: https://github.com/AlexeyAB/darknet/issues/1695#issuecomment-450995001

The binary cross-entropy loss is loss = −(t*ln(y) + (1−t)*ln(1−y)), and we should minimize it.
Its derivative with respect to the logit x (where y = sigmoid(x)) is d(loss)/d(x) = loss_derivative = y − t: https://peterroelants.github.io/posts/cross-entropy-logistic/

  • y - predicted probability in [0, 1]
  • t - 1 if the class is correct, 0 otherwise

We do it here: https://github.com/pjreddie/darknet/blob/61c9d02ec461e30d55762ec7669d6a1d3c356fb2/src/yolo_layer.c#L120
The same thing, written out per case (the stored delta is the negated gradient):

  • t==1: i.e. if (detected_class == truth_class), delta = -loss_derivative = -(y-t) = 1-y
  • t==0: i.e. if (detected_class != truth_class), delta = -loss_derivative = -(y-t) = -y
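
These two cases can be sketched as a single loop, loosely modeled on darknet's delta_yolo_class (simplified here: no stride indexing, illustrative names):

```c
/* Simplified sketch of the class-delta computation in yolo_layer.c:
 * 'output' holds already-sigmoid-activated class scores y, and 'delta'
 * receives the negated gradient of the binary cross-entropy loss. */
static void class_delta(const float *output, float *delta,
                        int classes, int truth_class) {
    for (int c = 0; c < classes; ++c) {
        float y = output[c];                       /* probability in [0, 1] */
        float t = (c == truth_class) ? 1.0f : 0.0f;
        delta[c] = t - y;    /* = 1-y for the correct class, -y otherwise */
    }
}
```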



Free-form reasoning; in general, in YOLOv3:


  • we use binary cross-entropy for multi-label classification: loss = −(y*ln(p) + (1−y)*ln(1−p)), and we should minimize it

    • p - probability in [0, 1]

    • y - 1 if the class is correct, 0 otherwise

  • so we should minimize the cost: loss = −ln(p) if y==1, or loss = −ln(1−p) if y==0, thus

    • if y==1, we want -ln(p)=0, thus p=1

    • if y==0, we want -ln(1-p)=0, thus p=0

  • we can achieve this by maximizing p if y==1, or maximizing 1-p if y==0,

    • if y==1, we should maximize logistic_activation(x + delta), so delta > 0
    • if y==0, we should minimize logistic_activation(x + delta), so delta < 0
  • we do it here: https://github.com/pjreddie/darknet/blob/61c9d02ec461e30d55762ec7669d6a1d3c356fb2/src/yolo_layer.c#L120
    The same:

  • if (detected_class == truth_class), delta = 1-p > 0
  • if (detected_class != truth_class), delta = -p < 0

where p = logistic_activation(x) = output[index + stride*n]


In general, there are two types of classification:

  • multi-label classification - each bounding box (each anchor) can have several classes, and the model as a whole has >= 1 classes. Binary cross-entropy with logistic activation (sigmoid) is used. This is what YOLOv3 uses.

  • multi-class classification - each bounding box (each anchor) can have only one class, and the model as a whole has >= 1 classes. Categorical cross-entropy with softmax activation is used. This is what YOLOv2 uses.


YOLOv3 uses binary cross-entropy with logistic activation (sigmoid) for multi-label classification, so each bounding box (each anchor) can have several classes. For example, one bounding box can be both Animal and Cat, or both Truck and Car, or even Cat and Dog if the two objects are close to each other.

So:

  1. Logistic activation (sigmoid) = 1./(1. + exp(-x)) is used:

    For neural networks, the result states that the neuron activation function must be nonlinear - and nothing else. Whatever this nonlinearity is, the network of connections can be constructed, and the coefficients of the linear connections between neurons can be adjusted, in such a way that the neural network computes any continuous function of its input signals to any given accuracy.

    • its derivative is very simple: in terms of the activated output p = sigmoid(x), it is (1-p)*p
  2. Binary classification is used; binary means that we look at each class separately and treat each class as 2 classes (present or not present). So we use this formula: https://ml-cheatsheet.readthedocs.io/en/latest/loss_functions.html
    loss = −(y*log(p) + (1−y)*log(1−p))

    • log - the natural log (ln)
    • y - binary indicator (0 or 1): whether class label c is the correct classification for observation o
    • p - predicted probability that observation o is of class c

So:

  • if (detected_class == truth_class) loss = −log(p)
  • if (detected_class != truth_class) loss = −log(1−p)

where p = logistic_activation(x); this is output[index + stride*n] in the yolo_layer.c source code.
And we should minimize the cost: loss = −log(p) or loss = −log(1−p).

As said in the MXNET doc: https://gluon.mxnet.io/chapter02_supervised-learning/logistic-regression-gluon.html

  • if (detected_class == truth_class) we should maximize log(p), i.e. we should maximize p
  • if (detected_class != truth_class) we should maximize log(1−p), i.e. we should minimize p


https://peterroelants.github.io/posts/cross-entropy-logistic/
https://en.wikipedia.org/wiki/Cross_entropy#Cross-entropy_error_function_and_logistic_regression
https://gluon.mxnet.io/chapter02_supervised-learning/logistic-regression-gluon.html
https://ml-cheatsheet.readthedocs.io/en/latest/logistic_regression.html
https://en.wikipedia.org/wiki/Logistic_regression

All 4 comments


@AlexeyAB @pjreddie Sorry to bother you! It seems the gradient is already calculated and saved to l.delta during the forward pass in yolo_layer.c, and l.delta doesn't seem to use MSE. Is l.cost not the true loss, i.e. is it only used for display? And when updating the parameters, are the no-object, objectness, coordinate, and class losses backpropagated separately?

Thanks a lot!

Thanks for the awesome repo. My dataset contains objects that are mutually exclusive, so instead of using sigmoid, how can I use softmax for classification?

Thanks in advance.
