I keep running into an issue where loss_rpn_box_reg is NaN whenever x0 > x1 (in the [xmin, ymin, xmax, ymax] format).
For my dataset, xmax and ymax represent the top-left corner, and xmin and ymin represent the width and height of the box. Training works fine when xmax > xmin; however, when there is an object in the bottom left-hand corner, xmax < xmin. For example, one of the objects in the dataset has the bounding box [ 53., 89., 7., 226.], which is correct, as I can use those coordinates to draw a bounding box on top of the corresponding image.

Any advice would be greatly appreciated.
This is expected, see https://github.com/pytorch/vision/issues/1120
Your boxes can't be degenerate. You should convert the x0, y0, w, h representation from your dataset into x0, y0, x1, y1 before returning the boxes.
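One way to catch degenerate boxes before training is to scan each image's boxes for zero or negative width or height. The NaN itself plausibly comes from the regression targets: torchvision's box encoding takes the log of the ratio between the ground-truth width (or height) and the anchor's, and the log of a negative ratio is NaN. A minimal sketch in plain Python, assuming the boxes are already in [x0, y0, x1, y1] order; the function name is hypothetical.

```python
def find_degenerate_boxes(boxes):
    """Hypothetical helper: return the indices of boxes whose width or
    height is zero or negative, assuming [x0, y0, x1, y1] order."""
    bad = []
    for i, (x0, y0, x1, y1) in enumerate(boxes):
        # A valid box needs x1 strictly greater than x0 and y1 strictly greater than y0.
        if x1 <= x0 or y1 <= y0:
            bad.append(i)
    return bad


boxes = [[53.0, 89.0, 7.0, 226.0], [53.0, 89.0, 60.0, 315.0]]
print(find_degenerate_boxes(boxes))  # [0]
```

Running a check like this in the dataset's `__getitem__` (and raising or dropping the box when the list is non-empty) points at the offending annotation directly, instead of surfacing later as a NaN loss.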
Thank you for your reply. I'm sorry, but I don't quite understand how to convert the format of the boxes. For example, the bounding box I have above [ 53., 89., 7., 226.], how would that look in the [x0, y0, x1, y1]?
It looks like the coordinates are [x0, y0, w, h]. To convert them to [x0, y0, x1, y1], you can do something like

```python
def to_xyxy(x0, y0, w, h):
    x1 = x0 + w  # right edge
    y1 = y0 + h  # bottom edge
    return [x0, y0, x1, y1]
```
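Applying that conversion to the example box is a quick sanity check. The sketch below assumes plain Python lists, and both helper names are hypothetical. It tries two readings of [53., 89., 7., 226.]: the [x0, y0, w, h] reading used in this reply, and the layout described in the question (width and height in the xmin/ymin slots, top-left corner in the xmax/ymax slots). Both readings give the same x1, y1 and differ only in the top-left corner, so drawing the result on the image shows which one is right. For tensors, `torchvision.ops.box_convert(boxes, in_fmt="xywh", out_fmt="xyxy")` performs the [x0, y0, w, h] conversion.

```python
def xywh_box_to_xyxy(box):
    """Hypothetical helper: [x0, y0, w, h] -> [x0, y0, x1, y1]."""
    x0, y0, w, h = box
    return [x0, y0, x0 + w, y0 + h]


def whxy_box_to_xyxy(box):
    """Hypothetical helper for the layout described in the question:
    [w, h, x0, y0] -> [x0, y0, x1, y1]."""
    w, h, x0, y0 = box
    return [x0, y0, x0 + w, y0 + h]


box = [53.0, 89.0, 7.0, 226.0]
print(xywh_box_to_xyxy(box))  # [53.0, 89.0, 60.0, 315.0]
print(whxy_box_to_xyxy(box))  # [7.0, 226.0, 60.0, 315.0]
```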
Oh, actually, there might just be a problem with your annotations: in some cases, x0 and x1 are flipped.
If that's the case, just always set x0_new = min(x0, x1) and x1_new = max(x0, x1), and do the same for y.
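If the annotations are already corner pairs that are sometimes swapped, the min/max fix can be applied per box. This is a minimal sketch in plain Python, and the function name is hypothetical. It only repairs swapped corners: a box stored as [x0, y0, w, h] still has to be converted first, because taking min/max of a width and a coordinate yields a non-degenerate but wrong box. A box with x0 == x1 (or y0 == y1) also stays degenerate after this step.

```python
def normalize_corners(box):
    """Hypothetical helper: reorder swapped corners so that x0 <= x1 and y0 <= y1."""
    xa, ya, xb, yb = box
    return [min(xa, xb), min(ya, yb), max(xa, xb), max(ya, yb)]


print(normalize_corners([53.0, 89.0, 7.0, 226.0]))  # [7.0, 89.0, 53.0, 226.0]
```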
Thank you very much. I will make the adjustments based on your suggestions.
Sorry, I just have one other question. How would I use negative samples (images with no annotations)? After running them through the dataloader, I get empty tensors for the bounding boxes.
This is currently not supported. If you think this is something that would be helpful, please create a new task for tracking this feature request.