Hi, I noticed that in the joint-training version, the RPN's SmoothL1Loss layer uses
smooth_l1_loss_param { sigma: 3.0 }
This setting of sigma changes the SmoothL1Loss function slightly. Is there a reason for it? I'd really appreciate any references or explanations.
As sigma -> inf, the loss approaches L1 (abs) loss. Setting sigma = 3 moves the transition point from quadratic to linear to |x| = 1 / 3**2 (closer to the origin). The reason for doing this is that the RPN bbox regression targets are not normalized by their stdev (unlike in Fast R-CNN), since the statistics of the targets change constantly throughout learning. In a future update I may simply replace smooth L1 with (hard) L1, which I believe will likely work as well and be simpler (no sigma, etc.).
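For readers who want to see the piecewise form described above, here is a minimal NumPy sketch. It assumes the usual sigma-parameterized definition (quadratic branch 0.5 * sigma^2 * x^2 for |x| < 1/sigma^2, linear branch |x| - 0.5/sigma^2 otherwise); the function name `smooth_l1` is made up for illustration and this is not the Caffe layer itself.

```python
import numpy as np

def smooth_l1(x, sigma=3.0):
    """Element-wise smooth L1 with a sigma parameter (illustrative sketch)."""
    x = np.asarray(x, dtype=np.float64)
    sigma2 = sigma ** 2
    abs_x = np.abs(x)
    quadratic = 0.5 * sigma2 * x ** 2   # used close to the origin
    linear = abs_x - 0.5 / sigma2       # used everywhere else
    # The two branches meet at |x| = 1 / sigma^2, so the loss is continuous.
    return np.where(abs_x < 1.0 / sigma2, quadratic, linear)

if __name__ == "__main__":
    for sigma in (1.0, 3.0):
        print("sigma =", sigma, "transition at |x| =", 1.0 / sigma ** 2)
        print(smooth_l1([0.05, 0.2, 1.0], sigma))
```

With sigma = 3 the quadratic region shrinks to |x| < 1/9, so most residuals fall on the linear branch.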
Hi guys, just to confirm: I created this PDF in Maple while trying to get my head around the Smooth L1 loss. With sigma=3, the output of Smooth L1 starts to behave like a normal L1 loss.
So does this mean that we could use L1 loss instead? Other algorithms also use Smooth L1, and I think they also have non-normalized targets during training.
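One way to put a number on "starts to behave like L1" is to measure the gap between |x| and the smooth L1 value. Under the same assumed sigma-parameterized definition (quadratic below |x| = 1/sigma^2, linear above), that gap never exceeds 0.5 / sigma^2. The sketch below checks this on a grid; `smooth_l1` and `max_gap_to_l1` are hypothetical helper names, not part of any library.

```python
import numpy as np

def smooth_l1(x, sigma):
    """Element-wise smooth L1 with a sigma parameter (illustrative sketch)."""
    x = np.asarray(x, dtype=np.float64)
    sigma2 = sigma ** 2
    abs_x = np.abs(x)
    return np.where(abs_x < 1.0 / sigma2,
                    0.5 * sigma2 * x ** 2,
                    abs_x - 0.5 / sigma2)

def max_gap_to_l1(sigma, lo=-2.0, hi=2.0, n=4001):
    """Largest value of |x| - smooth_l1(x) over an evenly spaced grid."""
    x = np.linspace(lo, hi, n)
    return float(np.max(np.abs(x) - smooth_l1(x, sigma)))

if __name__ == "__main__":
    for sigma in (1.0, 3.0, 10.0):
        # Measured gap on the grid next to the analytic bound 0.5 / sigma^2.
        print(sigma, max_gap_to_l1(sigma), 0.5 / sigma ** 2)
```

The values differ from plain L1 only by a small offset for larger sigma, but the gradient near zero still differs (it goes to zero smoothly instead of jumping between -1 and +1), so whether hard L1 trains equally well remains an empirical question.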
SmoothL1Loss (1).pdf
@leonardoaraujosantos I tested sigma=1 with p=[1, 2, 3] and q=[1.1, 2, 3.3], so the L1 distance = [0.1, 0, 0.3], L1 loss = 0.4, and smooth L1 loss = 0.05. In the PDF the smooth L1 loss is 0.08, so I think the example is wrong.
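import tensorflow as tf  # TensorFlow 1.x API (uses tf.Session)

deltas = [0.1, 0.0, 0.3]  # |p - q| for p=[1, 2, 3], q=[1.1, 2, 3.3]
sigma2 = 1.0  # sigma squared
deltas_abs = tf.abs(deltas)
# 1.0 where |x| < 1/sigma^2 (quadratic branch), 0.0 otherwise (linear branch)
smoothL1_sign = tf.cast(tf.less(deltas_abs, 1.0 / sigma2), tf.float32)
x = tf.reduce_sum(tf.square(deltas) * 0.5 * sigma2 * smoothL1_sign +
                  (deltas_abs - 0.5 / sigma2) * tf.abs(smoothL1_sign - 1))
with tf.Session() as sess:
    print(sess.run(x))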