I'd like to easily be able to draw a bounding box onto an image represented as a torch tensor.
Using YOLO, I get a bunch of bounding boxes in an image. I want to be able to easily draw those onto a torch tensor (created via torchvision.transforms.ToTensor). This seems like a reasonable request because other bbox utilities such as NMS exist in torchvision.
tensor_image = torch.tensor(...)
new_img = torchvision.utils.draw_bounding_box(
    tensor_image,
    [x_min, y_min, x_max, y_max],
)
I think the following works right now, but the torch.Tensor > PIL.Image > torch.Tensor round-trip for a single operation per image makes me uncomfortable from an efficiency standpoint.
# entirely untested!
import torch
import cv2  # use PIL if opencv-python isn't a dependency
from torchvision.utils import make_grid

tensor_image = torch.tensor(...)

def draw_bounding_box(tensor, bbox, **kwargs):  # TODO: don't use a kwargs dict
    # adapted from https://pytorch.org/docs/stable/_modules/torchvision/utils.html#save_image
    # the make_grid kwargs (nrow, padding, pad_value, etc.) are forwarded via **kwargs
    grid = make_grid(tensor, **kwargs)
    # Add 0.5 after unnormalizing to [0, 255] to round to nearest integer
    ndarr = grid.mul(255).add_(0.5).clamp_(0, 255).permute(1, 2, 0).to('cpu', torch.uint8).numpy()
    # TODO: convert bbox to the appropriate type
    # cv2.rectangle expects two integer corner points and a color
    return cv2.rectangle(ndarr, (bbox[0], bbox[1]), (bbox[2], bbox[3]), color=(0, 255, 0))
See cv2.rectangle and PIL.ImageDraw.rectangle.
Hey @sumanthratna, this sounds like a reasonable request. Let's wait for @fmassa to approve this, but I think we can move forward with it. Would you like to send a PR?
Sure @pmeier!
Here are the solutions I'm thinking of:
1. keep the torch.Tensor > np.array (> PIL.Image) conversion and accept the overhead
2. port cv2.rectangle to torch

What do you think is the best option? I'm personally leaning towards the second option.
Hi,
We do want to have functions for drawing bounding boxes in images, but we need to come up with requirements and API. If we plot a rectangle, we would probably also want to support plotting text, selecting colors, line width, etc. Additionally, should the API support a single box at a time or a batch of boxes?
I'm not yet clear about all those questions, but they would be good to think through and discuss before we start implementing anything.
torchvision currently doesn't depend on OpenCV, and we would like to keep it that way. So the box-drawing function would need to either depend on PIL or matplotlib, or be natively implemented in torchvision.
I would not be very concerned about the overhead of this function (converting a Tensor to an ndarray is cheap, and the same goes for a PIL.Image), and I wouldn't expect this to be on a critical path, so it's OK if it's not super efficient.
We do want to have functions for drawing bounding boxes in images, but we need to come up with requirements and API
If we only go for drawing boxes, I think this can be done in a straightforward way in torchvision. We basically only need to calculate slices for the current box and set those pixels to the given color; see the sketch below.
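For concreteness, a minimal sketch of that slicing approach (the function name, in-place mutation, and the (C, H, W) layout are assumptions here, not a proposed API):

```python
import torch

def _draw_box_native(image: torch.Tensor, bbox, color, width: int = 1) -> torch.Tensor:
    # image: (C, H, W) tensor; bbox: (x_min, y_min, x_max, y_max); color: one value per channel
    x_min, y_min, x_max, y_max = bbox
    c = torch.tensor(color, dtype=image.dtype).view(-1, 1, 1)  # (C, 1, 1) broadcasts over H and W
    image[:, y_min:y_min + width, x_min:x_max] = c  # top edge
    image[:, y_max - width:y_max, x_min:x_max] = c  # bottom edge
    image[:, y_min:y_max, x_min:x_min + width] = c  # left edge
    image[:, y_min:y_max, x_max - width:x_max] = c  # right edge
    return image
```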
we would probably also want to support plotting text
This will be a problem in torchvision. If we also want this, I suggest we forget doing it natively and use ImageDraw from PIL.
Additionally, should the API support a single box at a time or a batch of boxes?
I'm voting for multiple boxes to avoid a color conflict between boxes. If we only plot one box at a time either the user has to manually specify a new color for each box or we need to somehow track which colors we have already used.
@fmassa those are good questions that I didn't consider, and I definitely agree with what @pmeier has suggested.
Additionally, should the API support a single box at a time or a batch of boxes?
I'm voting for multiple boxes to avoid a color conflict between boxes. If we only plot one box at a time either the user has to manually specify a new color for each box or we need to somehow track which colors we have already used.
Originally, I thought that we should leave it up to users to decide the color, to avoid any confusing behavior. However, I do think that batch drawing might make for cleaner code on the user end. All in all, I think we should go with batch drawing. I'm still unsure if we want to deal with color conflicts, though. Sometimes a user will want to draw multiple boxes of the same class:

[image: detection result with several boxes of the same class]
Here's what I've written as a workaround until this gets merged (I haven't tested it):
import torch
import numpy as np
from PIL import Image, ImageDraw
from torchvision.utils import make_grid

def draw_bounding_box(
    tensor,
    bbox,
    nrow=8,
    padding=2,
    normalize=False,
    range=None,
    scale_each=False,
    pad_value=0,
    fill=None,
    outline=None,
    width=1,
):
    grid = make_grid(tensor, nrow=nrow, padding=padding, pad_value=pad_value,
                     normalize=normalize, range=range, scale_each=scale_each)
    # Add 0.5 after unnormalizing to [0, 255] to round to nearest integer
    ndarr = grid.mul(255).add_(0.5).clamp_(0, 255).permute(
        1, 2, 0).to('cpu', torch.uint8).numpy()
    im = Image.fromarray(ndarr)
    ImageDraw.Draw(im).rectangle(bbox, fill=fill, outline=outline, width=width)
    return torch.from_numpy(np.array(im))
It should be simple to convert this to a batch-bbox function.
@sumanthratna I don't think we should couple this with make_grid. Can't we just have draw_bounding_box and, if you need a grid of images, pass the output to make_grid afterwards?
That seems reasonable. I initially just copied torchvision.utils.save_image without paying attention to what it does :)
Is this better? We might not need to unnormalize.
import torch
import numpy as np
from PIL import Image, ImageDraw

def draw_bounding_box(
    tensor,
    bbox,
    fill=None,
    outline=None,
    width=1,
):
    # Add 0.5 after unnormalizing to [0, 255] to round to nearest integer
    ndarr = tensor.mul(255).add_(0.5).clamp_(0, 255).permute(
        1, 2, 0).to('cpu', torch.uint8).numpy()
    im = Image.fromarray(ndarr)
    ImageDraw.Draw(im).rectangle(bbox, fill=fill, outline=outline, width=width)
    return torch.from_numpy(np.array(im))
I think we can remove the fill parameter, and I would rename outline to color to make it more expressive.

PIL.ImageDraw.rectangle accepts the fill kwarg, so why would we not want to use it? IMHO we don't need to limit the functionality of Pillow. As for multiple boxes, {detection_class: [bbox1, bbox2, ...]} seems the most intuitive way to deal with this. Alternatively, we could just make our bbox arg a sequence of bboxes and tell users to call draw_bounding_box multiple times (one call for each class). Why not?
As the name implies, fill will fill the inside of the rectangle, thus blocking everything within the bounding box from view. I can't think of a scenario where someone wants that.
Why not?
As the name implies, fill will fill the inside of the rectangle, thus blocking everything within the bounding box from view. I can't think of a scenario where someone wants that.
In my case, I want to be able to fill in bounding boxes because I want to treat bounding boxes as segmentation masks. That is, by filling in boxes onto a black background, I can use cv2.findContours to get the bounding boxes and locate their centers; a rough sketch of that workflow follows below. I will admit three things about this use-case, the last being that a user can easily replicate it with torch.from_numpy/torch.tensor.

Looking at these points against this use-case, I suppose we can remove fill (mostly due to the last point). Below (after the sketch) is an updated definition in which the bboxes arg is a sequence of boxes, since we haven't decided on that yet. Is this a little better?
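For context, a hedged sketch of the mask workflow just described (purely illustrative; the box coordinates and mask size are made up, and the two-value findContours return assumes OpenCV 4):

```python
import cv2
import numpy as np

boxes = [(10, 10, 50, 40), (60, 20, 90, 80)]  # (x_min, y_min, x_max, y_max)
mask = np.zeros((100, 100), dtype=np.uint8)
for x_min, y_min, x_max, y_max in boxes:
    cv2.rectangle(mask, (x_min, y_min), (x_max, y_max), color=255, thickness=-1)  # -1 fills the box

contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
centers = []
for contour in contours:
    m = cv2.moments(contour)
    centers.append((m["m10"] / m["m00"], m["m01"] / m["m00"]))  # centroid of each filled box
```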
(off the top of my head; untested)
import torch
import numpy as np
from PIL import Image, ImageDraw

def draw_bounding_box(
    tensor,
    bboxes,
    color=None,
    width=1,
):
    # Add 0.5 after unnormalizing to [0, 255] to round to nearest integer
    ndarr = tensor.mul(255).add_(0.5).clamp_(0, 255).permute(
        1, 2, 0).to('cpu', torch.uint8).numpy()
    im = Image.fromarray(ndarr)
    draw = ImageDraw.Draw(im)
    if color is None:
        # fall back to a single default color for every box
        colors = [(255, 0, 0)] * len(bboxes)
    elif len(color) == 3 and isinstance(color[0], int):
        # bboxes is a sequence and color is a single (R, G, B) color
        # TODO: check if Pillow accepts any other formats for the color kwarg
        # TODO: this if-statement looks ugly
        colors = [color] * len(bboxes)
    else:
        # bboxes and color are both sequences of the same length
        colors = color
    for bbox, box_color in zip(bboxes, colors):
        draw.rectangle(bbox, outline=box_color, width=width)
    return torch.from_numpy(np.array(im))
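For illustration, a hypothetical call to the sketch above (the coordinates are made up, and tensor_image is assumed to be a float (C, H, W) image in [0, 1], e.g. from torchvision.transforms.ToTensor):

```python
# hypothetical usage of the untested sketch above
boxes = [(10, 10, 50, 40), (60, 20, 90, 80)]  # (x_min, y_min, x_max, y_max)
out = draw_bounding_box(tensor_image, boxes, color=(255, 0, 0), width=2)
```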
Is this a little better?
Yeah, I think your use-case is an edge-case that justifies a custom implementation on the user side.
Two more remarks:
1. We could provide a default color palette so users don't have to pick colors manually.
2. We could add draw_bounding_box(image, bbox, color=None, width=None) that draws only a single bbox with the given color. In addition, draw_bounding_boxes(...) could handle multiple bboxes per class and, later, the class tags. Thoughts?

I don't think we need a separate single-bbox function; that's the same as calling draw_bounding_boxes and passing in a list of a single bbox. One thing to discuss about palettes is how we want to decide colors when a user wants to draw more than 10 classes. I'm tempted to make draw_bounding_boxes only accept one color and force users to call it once per class, but obviously that's not ideal. A (not great) solution would be to assert (or raise an error) that the user only draws 10 classes at once, and suggest they use custom palettes.
You are right, I didn't think this through properly. So we basically have two cases:
1. The user passes a plain sequence of bounding boxes. Here each box gets its own color.
2. The user passes a dict with the class as str key and a list of bounding boxes as value. Here we treat the sequence as one class and thus of the same color.

Do we agree on that?
The color palettes I proposed should only act as a sensible default. If a user needs special colors, they will need to define them manually. In the spirit of 1., I think the color kwarg should also either be a sequence or a dictionary. Thoughts?
A (not great) solution would be to assert (or raise an Error) that the user only draw 10 classes at once, and suggest they use custom palettes.
Why not? It sounds pretty sensible to me to raise an error if our defaults are not sufficient for the use case. Do you think users will hit this case often?
Do we agree on that?
Yup!
A (not great) solution would be to assert (or raise an Error) that the user only draw 10 classes at once, and suggest they use custom palettes.
Why not? It sounds pretty sensible to me to raise an error if our defaults are not sufficient for the use case. Do you think users will hit this case often?
I'm not a fan of this idea because it feels like 10 is an arbitrary length for the palette (even though it's really not). I think these are our options:
1. raise an error when the number of classes exceeds the palette length and suggest custom colors
2. generate extra colors (e.g. randomly) for the classes beyond the palette

What do you think? I'm fine with raising an error.
I'm also wondering if we should scrap the color kwarg in favor of the dict of classes. IMO this makes complete sense: {object_class: {bboxes: [bbox1, bbox2...], color: rgb_color}}. The only downside is that it's not really intuitive, as one wouldn't expect such a complex structure to represent a bunch of bounding boxes.
What do you think? I'm fine with raising an error.
From what we have now, I think raising an error is the best option: randomly generating colors might result in colors that are practically indistinguishable, which has the same problem as above.
The color palette given in my second link has up to 12 colors. I think if someone really needs more, this is out of scope for us. I even think having 12 classes of bboxes at the same time is confusing enough that most people won't go for that.
I'm also wondering if we should scrap the color kwarg in favor of the dict of classes. [...]
What do you think? I'm fine with raising an error.
The color palette given in my second link has up to 12 colors. I think if someone really needs more, this is out of scope for us. I even think having 12 classes of bboxes at the same time is confusing enough that most people won't go for that.
That's a valid point; let's go with raising an error if the number of classes exceeds the length of the palette and there are no custom colors.
I'm also wondering if we should scrap the color kwarg in favor of the dict of classes. [...]
Now we are stuck at manually defining the colors again. IMO color should have a sensible default, since most people (I believe > 99%) don't care about the colors as long as they can be easily distinguished from one another.
I agree with your statement about most people not needing custom colors. However, I'm unsure why the dict-with-colors forces us back into manually defining colors. If a user provides colors in the dict, then hooray, we use those. Otherwise, we simply choose the classes' colors based on the palettes you suggested earlier. Unfortunately, dicts don't preserve order, but we can simply draw classes in alphabetical order (so we have deterministic color-class combos) or use OrderedDict.
To me, using a dict for classes and bboxes but not colors doesn't totally make sense. If we want to separate arguments, I think we should have 3 args: classes, bboxes, and colors with assert len(classes) == len(bboxes) and assert len(colors) <= len(classes) if colors is not None (the asserts are pseudocode, not copy-pasteable).
The problem I see is that IMO most people will not touch the colors. Thus, if we go for your approach
{object_class: {bboxes: [bbox1, bbox2...], color: rgb_color}}
you are forcing the user to handle the colors manually. If we go for the bbox signature we previously agreed on, the user doesn't have to fiddle with the colors at all.
I think it's more convenient to pass
bboxes = {"class0": (bbox0, bbox1), "class1": (bbox2,)}
and maybe have to specify
colors = {"class0": "red", "class1": "blue"}
instead of always passing
colors = {"class0": {"bboxes": (bbox0, bbox1), "color": "red"}, "class1": {"bboxes": (bbox2,), "color": "blue"}}
That makes sense. I think we should go with a dict for colors then, because dicts don't preserve order so we can't exactly "align" a dict of {class: bbox} with a list of colors. I think we've agreed on a signature that looks something like this:
def draw_bounding_boxes(tensor, bboxes, colors=None, width=1):
    """Draws bounding boxes onto an image represented as a torch tensor.

    tensor: torch.Tensor
    bboxes: dict of object_class: list_of_bboxes_for_class
    colors: dict of object_class: color_of_bboxes_for_class, optional
    """
    pass
The docstring is pretty bad, but @pmeier, does this signature look okay?
Don't forget the option to pass the bboxes as a sequence.
from typing import Dict, Optional, Sequence, Tuple, Union

import torch

BBox = Tuple[int, int, int, int]
Color = Tuple[int, int, int]
DEFAULT_COLORS: Sequence[Color]

def draw_bounding_boxes(
    image: torch.Tensor,
    bboxes: Union[Sequence[BBox], Dict[str, Sequence[BBox]]],
    colors: Optional[Dict[str, Color]] = None,
    width: int = 1,
) -> torch.Tensor:
    pass
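For illustration, hypothetical calls under this signature (image is assumed to be an image tensor; coordinates and colors are made up):

```python
drawn = draw_bounding_boxes(image, [(10, 10, 50, 40), (60, 20, 90, 80)])  # sequence form
drawn = draw_bounding_boxes(
    image,
    {"cat": [(10, 10, 50, 40)], "dog": [(60, 20, 90, 80)]},  # dict form: one class per key
    colors={"cat": (255, 0, 0), "dog": (0, 0, 255)},
)
```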
I think the default behavior of colors should be:
- if bboxes is a sequence, align it with DEFAULT_COLORS
- if bboxes is a dict, sort it alphabetically along the classes (as you proposed earlier) and align this with DEFAULT_COLORS

I like that, but I think we should add an if-statement or assert to make sure users don't pass in a sequence for bboxes and a custom dict for colors, to avoid any potential confusion (a sketch follows below). Unfortunately, I don't know if torch.jit will be able to treat this if-statement as part of the type definition when it's type-checking.
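A minimal sketch of that default behavior, assuming a hypothetical helper _assign_colors and a stand-in palette truncated to three entries (the agreed palette would have 10-12):

```python
from typing import Dict, Optional, Sequence, Tuple

BBox = Tuple[int, int, int, int]
Color = Tuple[int, int, int]
# stand-in palette; the real DEFAULT_COLORS would have 10-12 distinguishable entries
DEFAULT_COLORS: Sequence[Color] = [(230, 25, 75), (60, 180, 75), (0, 130, 200)]

def _assign_colors(bboxes, colors: Optional[Dict[str, Color]]):
    if not isinstance(bboxes, dict):
        if colors is not None:
            raise ValueError("custom colors require bboxes to be passed as a dict")
        # sequence form: align the boxes with the default palette
        return list(DEFAULT_COLORS)[:len(bboxes)]
    # dict form: sort the classes alphabetically for a deterministic alignment
    classes = sorted(bboxes)
    assigned = dict(zip(classes, DEFAULT_COLORS))
    missing = set(classes) - set(assigned) - set(colors or {})
    if missing:
        raise ValueError("more classes than default colors; pass custom colors")
    if colors is not None:
        assigned.update(colors)
    return assigned
```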
Since we agree on the signature, let's move on to the PR for now! If we hit any blockers down the road, let's solve them there. Just ping me when you are ready.
Great! I'm actually very busy until August 29, but I'll definitely try to get a draft PR in after that. Ping me if I don't do it by September 3!
Oh, shouldn't we have a bool kwarg for whether class labels should be drawn on the bbox borders (see the image I sent earlier)? That would make the signature:
from typing import Dict, Optional, Sequence, Tuple, Union

import torch

BBox = Tuple[int, int, int, int]
Color = Tuple[int, int, int]
DEFAULT_COLORS: Sequence[Color]

def draw_bounding_boxes(
    image: torch.Tensor,
    bboxes: Union[Sequence[BBox], Dict[str, Sequence[BBox]]],
    colors: Optional[Dict[str, Color]] = None,
    draw_labels: bool = False,
    width: int = 1,
) -> torch.Tensor:
    pass
I think that is a fair point, although I would go with None as the default and then set it to True or False depending on the type of bboxes. Similar to colors, we should raise an error if bboxes is a sequence and draw_labels is True.
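A minimal sketch of that default resolution (the helper name _resolve_draw_labels is hypothetical):

```python
from typing import Optional

def _resolve_draw_labels(bboxes, draw_labels: Optional[bool]) -> bool:
    if draw_labels is None:
        # default: draw labels only when class information is available
        return isinstance(bboxes, dict)
    if draw_labels and not isinstance(bboxes, dict):
        raise ValueError("draw_labels=True requires bboxes to be a dict of classes")
    return draw_labels
```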
Hey @pmeier and @sumanthratna
I have slightly different thoughts about the API of the bounding box implementation.
Allow the user to pass bboxes that are outputs of our detection models. This will make them easier to plot and process, since most people are probably using this function with detection models in torchvision.
BBoxes can hence be a tensor, and this utility will come in handy when we are using detection models.
Also, the user can then use the recently added box_convert to convert boxes before passing them if needed; I don't think we should do it internally. Tensor[N, 4] hence keeps things quite simple, and operations can probably be more efficient (see the example below).
This stays consistent with other utilities such as make_grid, since they use tensors. It would be odd to have this utility take a list/sequence. Also, I really don't think users represent bounding boxes as lists/tuples.
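For reference, the box_convert flow mentioned above (torchvision.ops.box_convert; the coordinates here are made up):

```python
import torch
from torchvision.ops import box_convert

# convert (x, y, w, h) boxes to the (x_min, y_min, x_max, y_max) format
boxes_xywh = torch.tensor([[10.0, 10.0, 40.0, 30.0]])
boxes_xyxy = box_convert(boxes_xywh, in_fmt="xywh", out_fmt="xyxy")
```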
I'm slightly unsure about labels; I think it would be nicer if we can re-use the detection models' labels.
If the user passes the labels from the torchvision output, we can probably assign the colors easily, using a fixed color per label, whenever the colors dict is not passed.
Hence colors can be:
colors = {1: "red", 2: "blue"}  # note that 0 is background for torchvision models and cannot be passed
Hence I guess the API can be:
from typing import Dict, Optional

import torch
from torch import Tensor

def draw_bounding_boxes(
    image: Tensor,
    bboxes: Tensor,  # shape (N, 4)
    labels: Tensor,  # shape (N,)
    colors: Optional[Dict[int, str]] = None,
    draw_labels: bool = False,
    width: int = 1,
) -> Tensor:
    pass
I understand the point of the older API design, but this new one makes the utility more useful. As a utility that ships with torchvision, it should fit the torchvision API. These are just small tweaks: take tensors instead of lists, accept labels, and keep colors as an int-to-str dict to make it easier.
I'm still thinking about the text. Maybe allow the user to pass a dict mapping class indices to label strings? If it is passed, we can draw_labels (i.e., put the text as well).
This API is much simpler to use and makes it really intuitive to pass the outputs of the detection models directly.
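For illustration, a hedged sketch of how the tensor-based API above would be used with a torchvision detection model (draw_bounding_boxes refers to the proposed, not-yet-merged utility; the uint8 conversion is an assumption about the expected input):

```python
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True).eval()
image = torch.rand(3, 300, 400)  # float image in [0, 1]
with torch.no_grad():
    output = model([image])[0]  # dict with "boxes": Tensor[N, 4], "labels": Tensor[N], ...

drawn = draw_bounding_boxes(  # proposed utility from the signature above
    (image * 255).to(torch.uint8),
    bboxes=output["boxes"],
    labels=output["labels"],
)
```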
@oke-aditya Very good points! We didn't consider the detection API before, but I think you are right that draw_bounding_boxes should be compatible. You can go ahead and implement this.
Great then! I'll probably need a fresh PR for this, since I have to rewrite the logic :-) I will cite @sumanthratna in the code.