Yolov3: RECTANGULAR INFERENCE

Created on 22 Apr 2019 · 37 Comments · Source: ultralytics/yolov3

Rectangular inference is implemented by default in detect.py. It reduces inference time in proportion to the letterbox padding saved when an image is padded to the smallest rectangle whose sides are multiples of 32 instead of to a full square. On zidane.jpg, for example, CPU inference time (on a 2018 MacBook Pro) drops from 1.01s to 0.63s, a 37% reduction, corresponding to a 38% reduction in image area (416x416 to 256x416).

Square Inference

Letterboxes to 416x416 squares.

python3 detect.py  # 416 square inference
Namespace(cfg='cfg/yolov3-spp.cfg', conf_thres=0.5, data_cfg='data/coco.data', images='data/samples', img_size=416, nms_thres=0.5, weights='weights/yolov3-spp.weights')
Using CPU
image 1/2 data/samples/bus.jpg: 416x416 1 handbags, 3 persons, 1 buss, Done. (0.999s)
image 2/2 data/samples/zidane.jpg: 416x416 1 ties, 2 persons, Done. (1.008s)

Rectangular Inference

Letterboxes to 416 along longest image dimension, pads shorter dimension to minimum multiple of 32.

python3 detect.py  # 416 rectangular inference
Namespace(cfg='cfg/yolov3-spp.cfg', conf_thres=0.5, data_cfg='data/coco.data', images='data/samples', img_size=416, nms_thres=0.5, weights='weights/yolov3-spp.weights')
Using CPU
image 1/2 data/samples/bus.jpg: 416x320 1 handbags, 3 persons, 1 buss, Done. (0.767s)
image 2/2 data/samples/zidane.jpg: 256x416 1 ties, 2 persons, Done. (0.632s)

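Shape | zidane.jpg | bus.jpg
--- | --- | ---
Square inference (height x width) | 416x416 | 416x416
Rectangular inference (height x width) | 256x416 | 416x320
Original image (width × height) | 1280 × 720 | 810 × 1080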

Stale tutorial

glenn-jocher

👍15 ❤8 😄2 👀1
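
As a rough illustration of the shape arithmetic described in the post above (this is not the code in detect.py), the sketch below scales the longest side to img_size and pads the shorter side up to the next multiple of 32. The function name `rect_shape` and the rounding choices are assumptions made for this example; the original image sizes are the ones listed in the table.

```python
import math


def rect_shape(height, width, img_size=416, stride=32):
    """Return the (height, width) of a rectangular letterbox.

    Hypothetical helper: scales the longest side to img_size, then pads
    each side up to the nearest multiple of stride.
    """
    ratio = img_size / max(height, width)
    new_h, new_w = round(height * ratio), round(width * ratio)
    pad_h = math.ceil(new_h / stride) * stride
    pad_w = math.ceil(new_w / stride) * stride
    return pad_h, pad_w


zidane = rect_shape(720, 1280)  # original image is 1280 wide, 720 high
bus = rect_shape(1080, 810)     # original image is 810 wide, 1080 high
print(zidane, bus)              # (256, 416) (416, 320)

# Fraction of the 416x416 square that no longer has to be processed
area_saving = 1 - (zidane[0] * zidane[1]) / (416 * 416)
print(f"{area_saving:.1%}")
```

The shapes match the 256x416 and 416x320 reported in the detect.py output, and the area saving for zidane.jpg is the roughly 38% quoted in the post.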

Most helpful comment

Rectangular inference is now working in our latest iDetection iOS App build! This is a screenshot recorded today at 192x320, inference on vertical 4k format 16:9 aspect ratio iPhone video. This pushes the performance to realtime 30 FPS!! This means that we now have YOLOv3-SPP running in realtime on an iPhone Xs using rectangular inference! This is a worldwide first as far as we know.

glenn-jocher on 29 Apr 2019

👍19 🎉3 🚀2

All 37 comments

Rectangular training example in the works, first batch of COCO. This is a bit complicated as we need to letterbox all images in the batch to the same size, and some of the images are being pulled simultaneously by parallel dataloader workers. So part of this process is determining a priori the batch index that each image belongs to (shuffle=False now), and then letterboxing it to the minimum viable 32 multiple for the most square image in that batch. This should be included in our upcoming v7 release, with enormous training speed improvements (about 1/3 faster on mixed aspect ratio datasets like COCO).

train_batch0

glenn-jocher on 24 Apr 2019

👍9
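
The batching idea in the comment above can be sketched as follows. This is a minimal pure-Python sketch written from the description, not the repository's dataloader: the function name `rect_batch_shapes`, its arguments, and the fallback to a square for mixed batches are assumptions for illustration.

```python
import math


def rect_batch_shapes(image_hw, batch_size, img_size=416, stride=32):
    """Assign images to batches by aspect ratio and pick one shape per batch.

    image_hw: list of (height, width) tuples for the original images.
    Returns (order, shapes): image indices sorted by aspect ratio, and one
    (height, width) letterbox shape per batch.
    """
    # Sort by aspect ratio (h / w) so similar shapes share a batch. The order
    # is fixed in advance, which is why shuffling has to be turned off.
    ratios = [h / w for h, w in image_hw]
    order = sorted(range(len(image_hw)), key=lambda i: ratios[i])

    shapes = []
    for start in range(0, len(order), batch_size):
        batch = [ratios[i] for i in order[start:start + batch_size]]
        lo, hi = min(batch), max(batch)
        if hi < 1:    # every image is wide: the most square one sets the height
            rel_h, rel_w = hi, 1.0
        elif lo > 1:  # every image is tall: the most square one sets the width
            rel_h, rel_w = 1.0, 1.0 / lo
        else:         # a square image or a wide/tall mix: use a full square
            rel_h, rel_w = 1.0, 1.0
        shapes.append((math.ceil(rel_h * img_size / stride) * stride,
                       math.ceil(rel_w * img_size / stride) * stride))
    return order, shapes


images = [(720, 1280), (1080, 810), (480, 640), (640, 480)]
order, shapes = rect_batch_shapes(images, batch_size=2)
print(order)   # [0, 2, 1, 3]
print(shapes)  # [(320, 416), (416, 320)]
```

With batch_size=2 the two wide images share a 320x416 batch and the two tall images share a 416x320 batch, so each batch can have its own shape.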

Rectangular training results on coco_100img.data. Speedup was not material in this case because CUDA was constantly optimizing at each batch due to benchmark set to True, and the dataset of 100 images had only 6 batches, each with a different shape. Speedup should be more impactful on larger training sets. Individual batches were timed as fast as 0.189 seconds here vs 0.240 seconds for 416 square training using a V100.
https://github.com/ultralytics/yolov3/blob/7e6e1897ac5514ab0e2ae3e3357da8a9c744cfed/train.py#L64

results

Rectangular training can be accessed here:
https://github.com/ultralytics/yolov3/blob/7e6e1897ac5514ab0e2ae3e3357da8a9c744cfed/utils/datasets.py#L146-L148

glenn-jocher on 29 Apr 2019

👍19 🎉3 🚀2

Hi @glenn-jocher,

I'm trying rectangular training with rect=True but the tensors during training are all square starting with the input torch.Size([16, 3, 416, 416]), what could be the problem?

I'd expect the shapes to be the nearest multiples of 32 for both image dimensions.

What should be img_size in the line:
self.batch_shapes = np.ceil(np.array(shapes) * img_size / 32.).astype(np.int) * 32

I also noticed that images look rectangular in the test_batch.jpg but square in train_batch.jpg, does this mean that rectangular training is unsupported?

MOHAMEDELDAKDOUKY on 5 Dec 2019

@MOHAMEDELDAKDOUKY training uses a mosaic loader, which loads 4 images at a time into a mosaic. You can disable this on this line:
https://github.com/ultralytics/yolov3/blob/e27b124828642198581512d42b14f0afe181ecd5/utils/datasets.py#L408

glenn-jocher on 5 Dec 2019

Yes, I disabled it but the images are still squares of 416x416.

MOHAMEDELDAKDOUKY on 5 Dec 2019

@MOHAMEDELDAKDOUKY your repo may be out of date. git clone a new version and try again.

glenn-jocher on 5 Dec 2019

Done, still getting this with rect=True, mosaic and augmentation disabled.
train_batch0

MOHAMEDELDAKDOUKY on 5 Dec 2019

@MOHAMEDELDAKDOUKY test.py's dataloader operates rectangular inference. Use the same settings in train.py.

glenn-jocher on 6 Dec 2019

Well, the issue was that I used batch_size=16, which is the whole 16-image set. The images were padded to the same size as the clock image in the third row, which is square.

Thanks for your reply!

MOHAMEDELDAKDOUKY on 7 Dec 2019

@MOHAMEDELDAKDOUKY ah of course. The batch is padded to the minimum rectangle of the entire group of images, so one square image may cause the batch to be square. Rectangular dataloading is also always in the same order, as the images are loaded in increasing aspect ratio.

glenn-jocher on 7 Dec 2019

It seems that letterbox already computes the ratio and padding, and scale_coords computes them again. Would it be faster to compute them only once?
https://github.com/ultralytics/yolov3/blob/5d73b190b053e8c3b87efb7fb1adc48496a04d01/utils/utils.py#L149

mozpp on 22 Jan 2020

@mozpp yes, the intention is that if ratio_pad is not passed to the function then the padding is computed automatically based on the same assumptions set forth when padding the image originally. Some speedup might be realized by passing this precomputed value, but in profiling this is not a significant hotspot.

glenn-jocher on 27 Jan 2020
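
To make the exchange above concrete, here is a small sketch of mapping a box from letterboxed pixels back to the original image, with an optional precomputed ratio and padding. It is not the repository's scale_coords: the names `letterbox_params` and `unletterbox_box` are hypothetical, and centered padding is an assumption.

```python
def letterbox_params(orig_hw, padded_hw):
    """Return (gain, (pad_x, pad_y)), assuming the padding is centered."""
    gain = min(padded_hw[0] / orig_hw[0], padded_hw[1] / orig_hw[1])
    pad_x = (padded_hw[1] - orig_hw[1] * gain) / 2
    pad_y = (padded_hw[0] - orig_hw[0] * gain) / 2
    return gain, (pad_x, pad_y)


def unletterbox_box(box, orig_hw, padded_hw, ratio_pad=None):
    """Map an (x1, y1, x2, y2) box from letterboxed to original pixels.

    If ratio_pad is None, the gain and padding are recomputed from the two
    shapes; otherwise the precomputed (gain, (pad_x, pad_y)) is used.
    """
    if ratio_pad is None:
        ratio_pad = letterbox_params(orig_hw, padded_hw)
    gain, (pad_x, pad_y) = ratio_pad
    x1, y1, x2, y2 = box
    h, w = orig_hw
    # Remove the padding, undo the scaling, then clip to the image bounds
    return (min(max((x1 - pad_x) / gain, 0), w),
            min(max((y1 - pad_y) / gain, 0), h),
            min(max((x2 - pad_x) / gain, 0), w),
            min(max((y2 - pad_y) / gain, 0), h))


orig, padded = (720, 1280), (256, 416)
box = (0, 11, 416, 245)  # the un-padded region of the letterboxed image
auto = unletterbox_box(box, orig, padded)
cached = unletterbox_box(box, orig, padded,
                         ratio_pad=letterbox_params(orig, padded))
print(auto, cached == auto)
```

Both calls give the same box, which covers approximately the full 1280 × 720 frame. Passing the precomputed value only skips a few arithmetic operations, consistent with the remark that this is not a significant hotspot.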

If I use this project to convert yolov3 or yolo-spp models to ONNX, does the converted ONNX model support rectangular inference?
@glenn-jocher Looking forward to your reply!

chouxianyu on 11 Mar 2020

@chouxianyu yes. iDetection on iOS runs with rectangular inference using the PyTorch > ONNX > CoreML export pipeline.

glenn-jocher on 11 Mar 2020

def letterbox(img, new_shape=(416, 416), color=(114, 114, 114), auto=False, scaleFill=False, scaleup=True):

416*416 inference is worse than auto=True ?

feixiangdekaka on 23 May 2020

@feixiangdekaka I don't understand your question.

glenn-jocher on 23 May 2020

hello,
I have a question: why use this color (114, 114, 114) to fill the border rather than black (0, 0, 0)?

WZMIAOMIAO on 25 Jul 2020

@WZMIAOMIAO imagenet mean.

glenn-jocher on 25 Jul 2020

@glenn-jocher Isn't the imagenet mean [123.68, 116.78, 103.94]?

WZMIAOMIAO on 25 Jul 2020

@WZMIAOMIAO sure, sum those numbers and divide by 3. We use this because some functions prepopulate with a scalar rather than a vector.

glenn-jocher on 25 Jul 2020

Sorry, I don't understand what that means. What if I use [0, 0, 0] to fill the border?

WZMIAOMIAO on 25 Jul 2020

@WZMIAOMIAO use whatever you want.

glenn-jocher on 25 Jul 2020

😄2
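
The arithmetic behind the scalar fill value discussed above can be checked in a few lines. The per-channel values are the ones quoted in the thread; truncating rather than rounding is an assumption that happens to reproduce 114.

```python
# Per-channel ImageNet mean values quoted in the thread
imagenet_mean = [123.68, 116.78, 103.94]

# Collapse the three channels into a single scalar, as suggested above
scalar_mean = sum(imagenet_mean) / len(imagenet_mean)

# Assumption: the value is truncated to an integer pixel intensity
fill_value = int(scalar_mean)

print(round(scalar_mean, 2), fill_value)  # 114.8 114
```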

@glenn-jocher It seems that if the new shape is the same as the original shape, the code will not reshape the image to a multiple of 32. How can this be solved?

Edwardmark on 22 Sep 2020

@Edwardmark dataloading and letterboxing work correctly under all circumstances. If you believe otherwise please submit a full bug report using the bug report template.

glenn-jocher on 22 Sep 2020

@glenn-jocher https://github.com/ultralytics/yolov3/blob/54722d00bbe6139ed8bf1fa1b43f4a7f88e0b539/utils/datasets.py#L637
Why take the mod of 64? I think it should be 32, as we are padding to the minimum multiple of 32.

menggui1993 on 16 Oct 2020

@menggui1993 yes 32 is all we need. Please submit a PR for this, thanks!

glenn-jocher on 16 Oct 2020

@glenn-jocher I've made a PR, please refer to #1524

menggui1993 on 19 Oct 2020

@menggui1993 yes I've merged your PR. Thank you for your contributions.

glenn-jocher on 19 Oct 2020
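
A small worked example of why 32 is the tighter choice, assuming the linked line keeps only the remainder of the padding modulo some number (the helper `padded_length` is hypothetical). The 234 px figure is the height of zidane.jpg after scaling 1280 × 720 so that the long side is 416.

```python
def padded_length(target, unpadded, modulus):
    """Length after keeping only the padding remainder modulo `modulus`."""
    pad = (target - unpadded) % modulus
    return unpadded + pad


# zidane.jpg at img_size=416: resized height before padding is 234 px
print(padded_length(416, 234, 32))  # 256
print(padded_length(416, 234, 64))  # 288
```

Both results are multiples of 32, so either would run, but mod 64 leaves an extra 32 rows of padding here compared with mod 32.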

Hello @glenn-jocher what if expand bbox following right or bottom? If I want to change behavior of input expand, where I should consider to change too. (Based on yolov5). Thank you

ledinhtri97 on 26 Oct 2020

@ledinhtri97 I don't understand what you are asking.

glenn-jocher on 26 Oct 2020

@glenn-jocher I mean, could the way the input image is padded be changed? In the image below, I calculate the scale to maintain the aspect ratio, then add padding to the bottom side.
pre_img

ledinhtri97 on 26 Oct 2020

@ledinhtri97 padding and letterboxing are handled in utils/datasets.py, you can make any modifications you're seeking there.

glenn-jocher on 26 Oct 2020

👍1

Hi @glenn-jocher, I find that the command lines for Square Inference and Rectangular Inference are the same:

python3 detect.py 
Namespace(cfg='cfg/yolov3-spp.cfg', conf_thres=0.5, data_cfg='data/coco.data', images='data/samples', img_size=416, nms_thres=0.5, weights='weights/yolov3-spp.weights')

So where is the difference behind the scene ?

wwdok on 30 Oct 2020

Hi @glenn-jocher, you say each image is letterboxed to the minimum viable 32 multiple for the most square image in that batch. If the most square image differs in size from batch to batch, does that mean each batch has one image size, and different batches have different image sizes?

wwdok on 31 Oct 2020

@wwdok yes of course, rectangular inference implies each batch may have a different shape.

glenn-jocher on 31 Oct 2020

❤1

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.