In utils/utils.py, line 610:
c = x[:, 5:6] * (0 if agnostic else max_wh)
From the code before this line, x[:, 5:6] holds the class indices of the bboxes. When agnostic is False, this line reduces to c = x[:, 5:6] * max_wh. Why multiply the class index by max_wh? What does c mean?
@seekFire c is an offset that depends on the class of each predicted box. It shifts the box coordinates of each class into a separate region, so boxes of different classes never overlap and are not suppressed by one another when we run NMS. max_wh is a coefficient large enough to keep the box coordinates of each class apart.
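The offset trick described above can be sketched in plain Python. This is a minimal illustration, not the code from utils/utils.py: the helpers `iou`, `nms`, and `class_aware_nms` are hypothetical names, the value `max_wh=4096` is an assumed default, and boxes are assumed to be in (x1, y1, x2, y2) pixel format.

```python
def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def nms(boxes, scores, iou_thres):
    """Plain class-agnostic NMS; returns kept indices, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_thres for k in keep):
            keep.append(i)
    return keep


def class_aware_nms(boxes, scores, classes, iou_thres=0.5,
                    max_wh=4096, agnostic=False):
    """Shift each box by class_index * max_wh, then run ordinary NMS."""
    # Hypothetical helper: max_wh is assumed to exceed any box coordinate,
    # so boxes of different classes end up in disjoint regions.
    offsets = [0 if agnostic else c * max_wh for c in classes]
    shifted = [[v + o for v in box] for box, o in zip(boxes, offsets)]
    return nms(shifted, scores, iou_thres)


# Three nearly identical boxes: two of class 0 and one of class 1.
boxes = [[10, 10, 110, 110], [12, 12, 112, 112], [11, 11, 111, 111]]
scores = [0.9, 0.8, 0.7]
classes = [0, 1, 0]

keep_per_class = class_aware_nms(boxes, scores, classes)
keep_agnostic = class_aware_nms(boxes, scores, classes, agnostic=True)
print(keep_per_class)  # [0, 1]: the class-1 box survives
print(keep_agnostic)   # [0]: without the offset, only the top box survives
```

The offsets are used only to decide which indices to keep; the returned indices refer to the original, unshifted boxes.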
@Laughing-q Oh, I see... As you say, this operation prevents two highly overlapping bboxes that belong to different categories from suppressing each other!
@seekFire, in short, applying the offset is a trick to separate the boxes of each class in coordinate space, given that the NMS in PyTorch can only be done using one confidence score per box and takes no class input -> https://pytorch.org/docs/stable/torchvision/ops.html#torchvision.ops.nms
This is the same for the off-the-shelf TensorFlow NMS op, if I remember correctly.
In other frameworks, CoreML being one of them, the pre-packaged NMS operations can handle multiple classes: https://apple.github.io/coremltools/coremlspecification/sections/NeuralNetwork.html#nonmaximumsuppressionlayerparams
@dlawrences I learned a lot from your answer. Thank you very much!