The loss function in the original paper is defined as:
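Assuming "the original paper" means the first YOLO paper (Redmon et al., "You Only Look Once"), which is what the darknet v2/v3 mention suggests, the sum-squared-error loss there can be written as below. Here S x S is the grid size, B the number of boxes per cell, the indicator 1_{ij}^{obj} is 1 when box j in cell i is responsible for an object, and the paper sets \lambda_{coord} = 5 and \lambda_{noobj} = 0.5. This is a transcription for reference, not the darknet v2/v3 variant.

```latex
\[
\begin{aligned}
\mathcal{L} ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbf{1}_{ij}^{\text{obj}}
   \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
 &+ \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbf{1}_{ij}^{\text{obj}}
   \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
 &+ \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbf{1}_{ij}^{\text{obj}} \left( C_i - \hat{C}_i \right)^2 \\
 &+ \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbf{1}_{ij}^{\text{noobj}} \left( C_i - \hat{C}_i \right)^2 \\
 &+ \sum_{i=0}^{S^2} \mathbf{1}_{i}^{\text{obj}} \sum_{c \in \text{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
\]
```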
I've seen comments saying that it is implemented differently in darknet for v2 and v3. Does anyone have the mathematical formula for either or both?
see this page