I'd like to easily be able to draw a bounding box onto an image represented as a torch tensor.
Using YOLO, I get a bunch of bounding boxes in an image. I want to be able to easily draw those onto a torch tensor (created via torchvision.transforms.ToTensor). This seems like a reasonable request because other bbox utilities such as NMS exist in torchvision.
tensor_image = torch.tensor(...)
new_img = torchvision.utils.draw_bounding_box(
    tensor_image,
    [x_min, y_min, x_max, y_max],
)
I think the following works right now, but the torch.Tensor > PIL.Image > torch.Tensor round-trip for a single operation per image makes me uncomfortable from an efficiency standpoint.
# entirely untested!
import torch
import cv2  # use PIL if opencv-python isn't a dependency
from torchvision.utils import make_grid

tensor_image = torch.tensor(...)

def draw_bounding_box(tensor, bbox, **kwargs):  # TODO: don't use a kwargs dict
    # adapted from https://pytorch.org/docs/stable/_modules/torchvision/utils.html#save_image
    # the make_grid kwargs (nrow, padding, pad_value, etc.) are forwarded via **kwargs
    grid = make_grid(tensor, **kwargs)
    # Add 0.5 after unnormalizing to [0, 255] to round to nearest integer
    ndarr = grid.mul(255).add_(0.5).clamp_(0, 255).permute(1, 2, 0).to('cpu', torch.uint8).numpy()
    # TODO: convert bbox to the appropriate type
    # cv2.rectangle expects two integer corner points and a color
    return cv2.rectangle(ndarr, (bbox[0], bbox[1]), (bbox[2], bbox[3]), color=(0, 255, 0))
See cv2.rectangle and PIL.ImageDraw.rectangle.
Hey @sumanthratna, this sounds like a reasonable request. Let's wait for @fmassa to approve this, but I think we can move forward with it. Would you like to send a PR?
Sure @pmeier!
Here are the solutions I'm thinking of:
1. keep the torch.Tensor > np.array (> PIL.Image) conversion and accept the overhead
2. port cv2.rectangle to torch

What do you think is the best option? I'm personally leaning towards the second option.
Hi,
We do want to have functions for drawing bounding boxes in images, but we need to come up with requirements and API. If we plot a rectangle, we would probably also want to support plotting text, selecting colors, line width, etc. Additionally, should the API support a single box at a time or a batch of boxes?
I'm not yet clear about all those questions, but they would be good to think through and discuss before we start implementing anything.
torchvision currently doesn't depend on OpenCV, and we would like to keep it that way. So the box-drawing function would need to either depend on PIL or matplotlib, or be natively implemented in torchvision.
I would not be very concerned about the overhead of this function (converting a Tensor to an ndarray is cheap, and the same goes for a PIL.Image), and I wouldn't expect this to be on a critical path, so it's OK if it's not super efficient.
We do want to have functions for drawing bounding boxes in images, but we need to come up with requirements and API
If we only go for drawing boxes, I think this can be done in a straightforward way in torchvision. We basically only need to calculate slices for the current box and set those pixels to the given color; see the sketch below.
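For concreteness, a minimal sketch of that slicing approach (the function name, in-place mutation, and the (C, H, W) layout are assumptions here, not a proposed API):

```python
import torch

def _draw_box_native(image: torch.Tensor, bbox, color, width: int = 1) -> torch.Tensor:
    # image: (C, H, W) tensor; bbox: (x_min, y_min, x_max, y_max); color: one value per channel
    x_min, y_min, x_max, y_max = bbox
    c = torch.tensor(color, dtype=image.dtype).view(-1, 1, 1)  # (C, 1, 1) broadcasts over H and W
    image[:, y_min:y_min + width, x_min:x_max] = c  # top edge
    image[:, y_max - width:y_max, x_min:x_max] = c  # bottom edge
    image[:, y_min:y_max, x_min:x_min + width] = c  # left edge
    image[:, y_min:y_max, x_max - width:x_max] = c  # right edge
    return image
```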
we would probably also want to support plotting text
This will be a problem in torchvision. If we also want this, I suggest we forget doing it natively and use ImageDraw from PIL.
Additionally, should the API support a single box at a time or a batch of boxes?
I'm voting for multiple boxes to avoid a color conflict between boxes. If we only plot one box at a time either the user has to manually specify a new color for each box or we need to somehow track which colors we have already used.
@fmassa those are good questions that I didn't consider, and I definitely agree with what @pmeier has suggested.
Additionally, should the API support a single box at a time or a batch of boxes?
I'm voting for multiple boxes to avoid a color conflict between boxes. If we only plot one box at a time either the user has to manually specify a new color for each box or we need to somehow track which colors we have already used.
Originally, I thought that we should leave it up to users to decide the color, to avoid any confusing behavior. However, I do think that batch drawing might make for cleaner code on the user end. All in all, I think we should go with batch drawing. I'm still unsure if we want to deal with color conflicts, though. Sometimes a user will want to draw multiple boxes of the same class:

[image: detection result with several boxes of the same class]
Here's what I've written as a workaround until this gets merged (I haven't tested it):
import torch
import numpy as np
from PIL import Image, ImageDraw
from torchvision.utils import make_grid

def draw_bounding_box(
    tensor,
    bbox,
    nrow=8,
    padding=2,
    normalize=False,
    range=None,
    scale_each=False,
    pad_value=0,
    fill=None,
    outline=None,
    width=1,
):
    grid = make_grid(tensor, nrow=nrow, padding=padding, pad_value=pad_value,
                     normalize=normalize, range=range, scale_each=scale_each)
    # Add 0.5 after unnormalizing to [0, 255] to round to nearest integer
    ndarr = grid.mul(255).add_(0.5).clamp_(0, 255).permute(
        1, 2, 0).to('cpu', torch.uint8).numpy()
    im = Image.fromarray(ndarr)
    ImageDraw.Draw(im).rectangle(bbox, fill=fill, outline=outline, width=width)
    return torch.from_numpy(np.array(im))
It should be simple to convert this to a batch-bbox function.
@sumanthratna I don't think we should couple this with make_grid. Can't we just have draw_bounding_box and, if you need a grid of images, pass the output to make_grid afterwards?
That seems reasonable. I initially just copied torchvision.utils.save_image without paying attention to what it does :)
Is this better? We might not need to unnormalize.
import torch
import numpy as np
from PIL import Image, ImageDraw

def draw_bounding_box(
    tensor,
    bbox,
    fill=None,
    outline=None,
    width=1,
):
    # Add 0.5 after unnormalizing to [0, 255] to round to nearest integer
    ndarr = tensor.mul(255).add_(0.5).clamp_(0, 255).permute(
        1, 2, 0).to('cpu', torch.uint8).numpy()
    im = Image.fromarray(ndarr)
    ImageDraw.Draw(im).rectangle(bbox, fill=fill, outline=outline, width=width)
    return torch.from_numpy(np.array(im))
I think we can remove the fill parameter, and I would rename outline to color to make it more expressive.

PIL.ImageDraw.rectangle accepts the fill kwarg, so why would we not want to use it? IMHO we don't need to limit the functionality of Pillow. As for multiple boxes, {detection_class: [bbox1, bbox2, ...]} seems the most intuitive way to deal with this. Alternatively, we could just make our bbox arg a sequence of bboxes and tell users to call draw_bounding_box multiple times (one call for each class). Why not?
As the name implies, fill will fill the inside of the rectangle, thus blocking everything within the bounding box from view. I can't think of a scenario where someone wants that.
Why not?
As the name implies, fill will fill the inside of the rectangle, thus blocking everything within the bounding box from view. I can't think of a scenario where someone wants that.
In my case, I want to be able to fill in bounding boxes because I want to treat bounding boxes as segmentation masks. That is, by filling in boxes onto a black background, I can use cv2.findContours to get the bounding boxes and locate their centers; a rough sketch of that workflow follows below. I will admit three things about this use-case, the last being that a user can easily replicate it with torch.from_numpy/torch.tensor.

Looking at these points against this use-case, I suppose we can remove fill (mostly due to the last point). Below (after the sketch) is an updated definition in which the bboxes arg is a sequence of boxes, since we haven't decided on that yet. Is this a little better?
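For context, a hedged sketch of the mask workflow just described (purely illustrative; the box coordinates and mask size are made up, and the two-value findContours return assumes OpenCV 4):

```python
import cv2
import numpy as np

boxes = [(10, 10, 50, 40), (60, 20, 90, 80)]  # (x_min, y_min, x_max, y_max)
mask = np.zeros((100, 100), dtype=np.uint8)
for x_min, y_min, x_max, y_max in boxes:
    cv2.rectangle(mask, (x_min, y_min), (x_max, y_max), color=255, thickness=-1)  # -1 fills the box

contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
centers = []
for contour in contours:
    m = cv2.moments(contour)
    centers.append((m["m10"] / m["m00"], m["m01"] / m["m00"]))  # centroid of each filled box
```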
(off the top of my head; untested)
import torch
import numpy as np
from PIL import Image, ImageDraw

def draw_bounding_box(
    tensor,
    bboxes,
    color=None,
    width=1,
):
    # Add 0.5 after unnormalizing to [0, 255] to round to nearest integer
    ndarr = tensor.mul(255).add_(0.5).clamp_(0, 255).permute(
        1, 2, 0).to('cpu', torch.uint8).numpy()
    im = Image.fromarray(ndarr)
    draw = ImageDraw.Draw(im)
    if color is None:
        # fall back to a single default color for every box
        colors = [(255, 0, 0)] * len(bboxes)
    elif len(color) == 3 and isinstance(color[0], int):
        # bboxes is a sequence and color is a single (R, G, B) color
        # TODO: check if Pillow accepts any other formats for the color kwarg
        # TODO: this if-statement looks ugly
        colors = [color] * len(bboxes)
    else:
        # bboxes and color are both sequences of the same length
        colors = color
    for bbox, box_color in zip(bboxes, colors):
        draw.rectangle(bbox, outline=box_color, width=width)
    return torch.from_numpy(np.array(im))
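For illustration, a hypothetical call to the sketch above (the coordinates are made up, and tensor_image is assumed to be a float (C, H, W) image in [0, 1], e.g. from torchvision.transforms.ToTensor):

```python
# hypothetical usage of the untested sketch above
boxes = [(10, 10, 50, 40), (60, 20, 90, 80)]  # (x_min, y_min, x_max, y_max)
out = draw_bounding_box(tensor_image, boxes, color=(255, 0, 0), width=2)
```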
Is this a little better?
Yeah, I think your use-case is an edge-case that justifies a custom implementation on the user side.
Two more remarks:
1. We could provide a default color palette so users don't have to pick colors manually.
2. We could add draw_bounding_box(image, bbox, color=None, width=None) that draws only a single bbox with the given color. In addition, draw_bounding_boxes(...) could handle multiple bboxes per class and, later, the class tags. Thoughts?

I don't think we need a separate single-bbox function; that's the same as calling draw_bounding_boxes and passing in a list of a single bbox. One thing to discuss about palettes is how we want to decide colors when a user wants to draw more than 10 classes. I'm tempted to make draw_bounding_boxes only accept one color and force users to call it once per class, but obviously that's not ideal. A (not great) solution would be to assert (or raise an error) that the user only draws 10 classes at once, and suggest they use custom palettes.
You are right, I didn't think this through properly. So we basically have two cases:
1. The user passes a plain sequence of bounding boxes. Here each box gets its own color.
2. The user passes a dict with the class as str key and a list of bounding boxes as value. Here we treat the sequence as one class and thus of the same color.

Do we agree on that?
The color palettes I proposed should only act as a sensible default. If a user needs special colors, they will need to define them manually. In the spirit of 1., I think the color kwarg should also either be a sequence or a dictionary. Thoughts?
A (not great) solution would be to assert (or raise an Error) that the user only draw 10 classes at once, and suggest they use custom palettes.
Why not? It sounds pretty sensible to me to raise an error if our defaults are not sufficient for the use case. Do you think users will hit this case often?
Do we agree on that?
Yup!
A (not great) solution would be to assert (or raise an Error) that the user only draw 10 classes at once, and suggest they use custom palettes.
Why not? It sounds pretty sensible to me to raise an error if our defaults are not sufficient for the use case. Do you think users will hit this case often?
I'm not a fan of this idea because it feels like 10 is an arbitrary length for the palette (even though it's really not). I think these are our options:
1. raise an error when the number of classes exceeds the palette length and suggest custom colors
2. generate extra colors (e.g. randomly) for the classes beyond the palette

What do you think? I'm fine with raising an error.
I'm also wondering if we should scrap the color kwarg in favor of the dict of classes. IMO this makes complete sense: {object_class: {bboxes: [bbox1, bbox2...], color: rgb_color}}. The only downside is that it's not really intuitive, as one wouldn't expect such a complex structure to represent a bunch of bounding boxes.
What do you think? I'm fine with raising an error.
From what we have now, I think raising an error is the best option: randomly generating colors might result in colors that are practically indistinguishable, which has the same problem as above.
The color palette given in my second link has up to 12 colors. I think if someone really needs more, this is out of scope for us. I even think having 12 classes of bboxes at the same time is confusing enough that most people won't go for that.
I'm also wondering if we should scrap the color kwarg in favor of the dict of classes. [...]
What do you think? I'm fine with raising an error.
The color palette given in my second link has up to 12 colors. I think if someone really needs more, this is out of scope for us. I even think having 12 classes of bboxes at the same time is confusing enough that most people won't go for that.
That's a valid point; let's go with raising an error if the number of classes exceeds the length of the palette and there are no custom colors.
I'm also wondering if we should scrap the color kwarg in favor of the dict of classes. [...]
Now we are stuck at manually defining the colors again. IMO color should have a sensible default, since most people (I believe > 99%) don't care about the colors as long as they can be easily distinguished from one another.
I agree with your statement about most people not needing custom colors. However, I'm unsure why the dict-with-colors forces us back into manually defining colors. If a user provides colors in the dict, then hooray, we use those. Otherwise, we simply choose the classes' colors based on the palettes you suggested earlier. Unfortunately, dicts don't preserve order, but we can simply draw classes in alphabetical order (so we have deterministic color-class combos) or use OrderedDict.
To me, using a dict for classes and bboxes but not colors doesn't totally make sense. If we want to separate arguments, I think we should have 3 args: classes, bboxes, and colors with assert len(classes) == len(bboxes) and assert len(colors) <= len(classes) if colors is not None (the asserts are pseudocode, not copy-pasteable).
The problem I see is that IMO most people will not touch the colors. Thus, if we go for your approach
{object_class: {bboxes: [bbox1, bbox2...], color: rgb_color}}
you are forcing the user to handle the colors manually. If we go for the bbox signature we previously agreed on, the user doesn't have to fiddle with the colors at all.
I think it's more convenient to pass
bboxes = {"class0": (bbox0, bbox1), "class1": (bbox2,)}
and maybe have to specify
colors = {"class0": "red", "class1": "blue"}
instead of always passing
colors = {"class0": {"bboxes": (bbox0, bbox1), "color": "red"}, "class1": {"bboxes": (bbox2,), "color": "blue"}}
That makes sense. I think we should go with a dict for colors then, because dicts don't preserve order so we can't exactly "align" a dict of {class: bbox} with a list of colors. I think we've agreed on a signature that looks something like this:
def draw_bounding_boxes(tensor, bboxes, colors=None, width=1):
    """Draws bounding boxes onto an image represented as a torch tensor.

    tensor: torch.Tensor
    bboxes: dict of object_class: list_of_bboxes_for_class
    colors: dict of object_class: color_of_bboxes_for_class, optional
    """
    pass
The docstring is pretty bad, but @pmeier, does this signature look okay?
Don't forget the option to pass the bboxes as a sequence.
from typing import Dict, Optional, Sequence, Tuple, Union

import torch

BBox = Tuple[int, int, int, int]
Color = Tuple[int, int, int]
DEFAULT_COLORS: Sequence[Color]

def draw_bounding_boxes(
    image: torch.Tensor,
    bboxes: Union[Sequence[BBox], Dict[str, Sequence[BBox]]],
    colors: Optional[Dict[str, Color]] = None,
    width: int = 1,
) -> torch.Tensor:
    pass
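For illustration, hypothetical calls under this signature (image is assumed to be an image tensor; coordinates and colors are made up):

```python
drawn = draw_bounding_boxes(image, [(10, 10, 50, 40), (60, 20, 90, 80)])  # sequence form
drawn = draw_bounding_boxes(
    image,
    {"cat": [(10, 10, 50, 40)], "dog": [(60, 20, 90, 80)]},  # dict form: one class per key
    colors={"cat": (255, 0, 0), "dog": (0, 0, 255)},
)
```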
I think the default behavior of colors should be:
- if bboxes is a sequence, align it with DEFAULT_COLORS
- if bboxes is a dict, sort it alphabetically along the classes (as you proposed earlier) and align this with DEFAULT_COLORS

I like that, but I think we should add an if-statement or assert to make sure users don't pass in a sequence for bboxes and a custom dict for colors, to avoid any potential confusion (a sketch follows below). Unfortunately, I don't know if torch.jit will be able to treat this if-statement as part of the type definition when it's type-checking.
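A minimal sketch of that default behavior, assuming a hypothetical helper _assign_colors and a stand-in palette truncated to three entries (the agreed palette would have 10-12):

```python
from typing import Dict, Optional, Sequence, Tuple

BBox = Tuple[int, int, int, int]
Color = Tuple[int, int, int]
# stand-in palette; the real DEFAULT_COLORS would have 10-12 distinguishable entries
DEFAULT_COLORS: Sequence[Color] = [(230, 25, 75), (60, 180, 75), (0, 130, 200)]

def _assign_colors(bboxes, colors: Optional[Dict[str, Color]]):
    if not isinstance(bboxes, dict):
        if colors is not None:
            raise ValueError("custom colors require bboxes to be passed as a dict")
        # sequence form: align the boxes with the default palette
        return list(DEFAULT_COLORS)[:len(bboxes)]
    # dict form: sort the classes alphabetically for a deterministic alignment
    classes = sorted(bboxes)
    assigned = dict(zip(classes, DEFAULT_COLORS))
    missing = set(classes) - set(assigned) - set(colors or {})
    if missing:
        raise ValueError("more classes than default colors; pass custom colors")
    if colors is not None:
        assigned.update(colors)
    return assigned
```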
Since we agree on the signature, let's move on to the PR for now! If we hit any blockers down the road, let's solve them there. Just ping me when you are ready.
Great! I'm actually very busy until August 29, but I'll definitely try to get a draft PR in after that. Ping me if I don't do it by September 3!
Oh, shouldn't we have a bool kwarg for whether class labels should be drawn on the bbox borders (see the image I sent earlier)? That would make the signature:
from typing import Dict, Optional, Sequence, Tuple, Union

import torch

BBox = Tuple[int, int, int, int]
Color = Tuple[int, int, int]
DEFAULT_COLORS: Sequence[Color]

def draw_bounding_boxes(
    image: torch.Tensor,
    bboxes: Union[Sequence[BBox], Dict[str, Sequence[BBox]]],
    colors: Optional[Dict[str, Color]] = None,
    draw_labels: bool = False,
    width: int = 1,
) -> torch.Tensor:
    pass
I think that is a fair point, although I would go with None as the default and then set it to True or False depending on the type of bboxes. Similar to colors, we should raise an error if bboxes is a sequence and draw_labels is True.
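A minimal sketch of that default resolution (the helper name _resolve_draw_labels is hypothetical):

```python
from typing import Optional

def _resolve_draw_labels(bboxes, draw_labels: Optional[bool]) -> bool:
    if draw_labels is None:
        # default: draw labels only when class information is available
        return isinstance(bboxes, dict)
    if draw_labels and not isinstance(bboxes, dict):
        raise ValueError("draw_labels=True requires bboxes to be a dict of classes")
    return draw_labels
```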
Hey @pmeier and @sumanthratna
I have slightly different thoughts about the API of the bounding box implementation.
Allow the user to pass bboxes that are outputs of our detection models. This will make them easier to plot and process, since most people are probably using this function with detection models in torchvision.
BBoxes can hence be a tensor, and this utility will come in handy when we are using detection models.
Also, the user can then use the recently added box_convert to convert boxes before passing them if needed; I don't think we should do it internally. Tensor[N, 4] hence keeps things quite simple, and operations can probably be more efficient (see the example below).
This stays consistent with other utilities such as make_grid, since they use tensors. It would be odd to have this utility take a list/sequence. Also, I really don't think users represent bounding boxes as lists/tuples.
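For reference, the box_convert flow mentioned above (torchvision.ops.box_convert; the coordinates here are made up):

```python
import torch
from torchvision.ops import box_convert

# convert (x, y, w, h) boxes to the (x_min, y_min, x_max, y_max) format
boxes_xywh = torch.tensor([[10.0, 10.0, 40.0, 30.0]])
boxes_xyxy = box_convert(boxes_xywh, in_fmt="xywh", out_fmt="xyxy")
```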
I'm slightly unsure about labels; I think it would be nicer if we can re-use the detection models' labels.
If the user passes the labels from the torchvision output, we can probably assign the colors easily, using a fixed color per label, whenever the colors dict is not passed.
Hence colors can be:
colors = {1: "red", 2: "blue"}  # note that 0 is background for torchvision models and cannot be passed
Hence I guess the API can be:
from typing import Dict, Optional

import torch
from torch import Tensor

def draw_bounding_boxes(
    image: Tensor,
    bboxes: Tensor,  # shape (N, 4)
    labels: Tensor,  # shape (N,)
    colors: Optional[Dict[int, str]] = None,
    draw_labels: bool = False,
    width: int = 1,
) -> Tensor:
    pass
I understand the point of the older API design, but this new one makes the utility more useful. As a utility that ships with torchvision, it should fit the torchvision API. These are just small tweaks: take tensors instead of lists, accept labels, and keep colors as an int-to-str dict to make it easier.
I'm still thinking about the text. Maybe allow the user to pass a dict mapping class indices to label strings? If it is passed, we can draw_labels (i.e., put the text as well).
This API is much simpler to use and makes it really intuitive to pass the outputs of the detection models directly.
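For illustration, a hedged sketch of how the tensor-based API above would be used with a torchvision detection model (draw_bounding_boxes refers to the proposed, not-yet-merged utility; the uint8 conversion is an assumption about the expected input):

```python
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True).eval()
image = torch.rand(3, 300, 400)  # float image in [0, 1]
with torch.no_grad():
    output = model([image])[0]  # dict with "boxes": Tensor[N, 4], "labels": Tensor[N], ...

drawn = draw_bounding_boxes(  # proposed utility from the signature above
    (image * 255).to(torch.uint8),
    bboxes=output["boxes"],
    labels=output["labels"],
)
```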
@oke-aditya Very good points! We didn't consider the detection API before, but I think you are right that draw_bounding_boxes should be compatible. You can go ahead and implement this.
Great then! I'll probably need a fresh PR for this, since I have to rewrite the logic :-) I will cite @sumanthratna in the code.