Vision: Bounding Box Conversions

Created on 18 Sep 2020 · 4 comments · Source: pytorch/vision

🚀 Feature

Simple feature request: we already have a lot of useful bounding box operations in torchvision.ops.boxes.

Motivation

Most object detection data comes in one of two formats, VOC or COCO.
If we provide a utility to convert between them, it might ease things on the user's side, just as the IoU and NMS utilities do.
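
For example (illustrative numbers only, not from any dataset), here is the same box expressed in both conventions:

import torch

# A box with its top-left corner at (10, 20), width 30 and height 40:
box_voc  = torch.tensor([10., 20., 40., 60.])  # VOC-style (xmin, ymin, xmax, ymax)
box_coco = torch.tensor([10., 20., 30., 40.])  # COCO-style (x, y, w, h)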

Pitch

Provide two functions (please name them better):

import torch
from torch import Tensor

def xywh_from_xyxy(boxes: Tensor) -> Tensor:
    # boxes: Tensor[N, 4] of (x1, y1, x2, y2); replace (x2, y2) with (w, h) = (x2 - x1, y2 - y1).
    x1, y1, x2, y2 = boxes.unbind(-1)
    return torch.stack((x1, y1, x2 - x1, y2 - y1), dim=-1)

def xyxy_from_xywh(boxes: Tensor) -> Tensor:
    # boxes: Tensor[N, 4] of (x, y, w, h); replace (w, h) with (x2, y2) = (x + w, y + h).
    x, y, w, h = boxes.unbind(-1)
    return torch.stack((x, y, x + w, y + h), dim=-1)
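
A quick usage sketch of the functions above (the names are the ones proposed here, not an existing torchvision API):

boxes_xyxy = torch.tensor([[10., 20., 40., 60.]])
boxes_xywh = xywh_from_xyxy(boxes_xyxy)  # tensor([[10., 20., 30., 40.]])
roundtrip = xyxy_from_xywh(boxes_xywh)   # tensor([[10., 20., 40., 60.]])
assert torch.equal(roundtrip, boxes_xyxy)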

Alternatives

I am not aware of any. For now we could leave this to the user as well; this is just a utility, nothing is forced.

Additional context

These conversions are very commonly used across repositories. Supporting them natively, like NMS and IoU, would provide a standard.
Again, this is a utility rather than a big feature, but it might help simplify detection models. I'm unsure about segmentation models.

I can most probably submit a PR for this if it is good to go.

cc @pmeier

Labels: ops, object detection

All 4 comments

Hi,

I think this could be a useful addition. For reference, some related (but not exactly equal) functions can be found in DETR.

I saw the DETR implementation. It takes (center x, center y, w, h) and converts it to the (x1, y1, x2, y2) format, i.e. (xmin, ymin, xmax, ymax).

So maybe we can add that as well.

(x1, y1, x2, y2), i.e. (xmin, ymin, xmax, ymax), is the format that models such as Faster R-CNN and the upcoming RetinaNet take, so maybe provide utilities that can convert the following formats to it and vice versa. I'm re-using the names from DETR for consistency.

1. box_cxcywh_to_xyxy
2. box_xywh_to_xyxy

And the reverse utilities for the same:

1. box_xyxy_to_cxcywh
2. box_xyxy_to_xywh

The idea is not to provide utilities that might be redundant and not useful,
e.g. box_cxcywh_to_xywh. That can be done with a sequence of two operations and is not useful as input to detection models.

The function signature can be the same for all of them. This implementation differs from the DETR implementation but maintains consistency with torchvision.
E.g.

def box_xywh_to_xyxy(boxes: Tensor) -> Tensor:
    # Idea is to support all boxes (Tensor[N, 4]) and do the processing in batch, same as box_area.
    boxes = boxes.clone()  # do not modify the caller's tensor in place
    boxes[:, 2] = boxes[:, 0] + boxes[:, 2]  # x2 = x + w
    boxes[:, 3] = boxes[:, 1] + boxes[:, 3]  # y2 = y + h
    return boxes
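
For the center-coordinate variants, here is a minimal sketch based on the DETR convention mentioned above (the names follow the proposal in this thread; this is not an existing torchvision API):

import torch
from torch import Tensor

def box_cxcywh_to_xyxy(boxes: Tensor) -> Tensor:
    # (cx, cy, w, h) -> (x1, y1, x2, y2): each corner sits half a width/height away from the center.
    cx, cy, w, h = boxes.unbind(-1)
    return torch.stack((cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h), dim=-1)

def box_xyxy_to_cxcywh(boxes: Tensor) -> Tensor:
    # (x1, y1, x2, y2) -> (cx, cy, w, h): inverse of the above.
    x1, y1, x2, y2 = boxes.unbind(-1)
    return torch.stack(((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1), dim=-1)

With these, the redundant box_cxcywh_to_xywh can simply be composed as box_xyxy_to_xywh(box_cxcywh_to_xyxy(boxes)), which is why it is not proposed as a separate utility.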

Let me know your thoughts and if some other common formats are to be adopted.

@oke-aditya Go for it! Ping me when you are done.

Sure, I will make a PR on this very soon. 👍
