Rectangular inference is implemented by default in detect.py. This reduces inference time in proportion to the letterbox padding that a square input adds compared with a rectangular input padded only to the minimum multiple of 32. On zidane.jpg, for example, CPU inference time (on a 2018 MacBook Pro) drops from 1.01s to 0.63s, a 37% reduction, corresponding to a 38% reduction in image area (416x416 to 256x416).
Letterboxes to 416x416 squares.
python3 detect.py # 416 square inference
Namespace(cfg='cfg/yolov3-spp.cfg', conf_thres=0.5, data_cfg='data/coco.data', images='data/samples', img_size=416, nms_thres=0.5, weights='weights/yolov3-spp.weights')
Using CPU
image 1/2 data/samples/bus.jpg: 416x416 1 handbags, 3 persons, 1 buss, Done. (0.999s)
image 2/2 data/samples/zidane.jpg: 416x416 1 ties, 2 persons, Done. (1.008s)
Letterboxes to 416 along longest image dimension, pads shorter dimension to minimum multiple of 32.
python3 detect.py # 416 rectangular inference
Namespace(cfg='cfg/yolov3-spp.cfg', conf_thres=0.5, data_cfg='data/coco.data', images='data/samples', img_size=416, nms_thres=0.5, weights='weights/yolov3-spp.weights')
Using CPU
image 1/2 data/samples/bus.jpg: 416x320 1 handbags, 3 persons, 1 buss, Done. (0.767s)
image 2/2 data/samples/zidane.jpg: 256x416 1 ties, 2 persons, Done. (0.632s)
zidane.jpg | bus.jpg
--- | ---
416x416 | 416x416
256x416 | 416x320
1280×720 | 810×1080
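The shapes in the log output above can be reproduced with a few lines of arithmetic. This is a minimal sketch, not the code in detect.py: `rect_shape` is a hypothetical helper that assumes the longest side is scaled to `img_size` and each side is then padded up to the next multiple of 32.

```python
import math


def rect_shape(height, width, img_size=416, stride=32):
    """Return the (height, width) of the letterboxed network input.

    Hypothetical helper for illustration only.
    """
    gain = img_size / max(height, width)  # scale so the longest side equals img_size
    new_h, new_w = round(height * gain), round(width * gain)
    # Pad each side up to the next multiple of the stride.
    return math.ceil(new_h / stride) * stride, math.ceil(new_w / stride) * stride


zidane = rect_shape(720, 1280)  # original image is 1280 wide, 720 high
bus = rect_shape(1080, 810)     # original image is 810 wide, 1080 high

# Fraction of pixels saved relative to a 416x416 square input.
area_saving = 1 - (zidane[0] * zidane[1]) / (416 * 416)
print(zidane, bus, f"{area_saving:.0%}")
```

For zidane.jpg the scaled height is 234, which pads to 256, so the input is 256x416 and the area is about 38% smaller than 416x416.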
Rectangular training example in the works, first batch of COCO. This is a bit complicated as we need to letterbox all images in the batch to the same size, and some of the images are being pulled simultaneously by parallel dataloader workers. So part of this process is determining a priori the batch index that each image belongs to (shuffle=False now), and then letterboxing it to the minimum viable multiple of 32 for the most square image in that batch. This should be included in our upcoming v7 release, with large training speed improvements (about 1/3 faster on mixed aspect ratio datasets like COCO).
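A rough sketch of the batch-shape step described above, assuming images are sorted by aspect ratio and split into fixed batches. `batch_shapes` is a hypothetical function written for this illustration, not the code in utils/datasets.py. It uses `int` rather than `np.int`, since `np.int` was removed in NumPy 1.24.

```python
import numpy as np


def batch_shapes(hw, batch_size, img_size=416, stride=32):
    """Return (order, shapes) for rectangular batches.

    hw: sequence of (height, width) per image.
    order: image indices sorted by aspect ratio (fixed, so no shuffling).
    shapes: one (height, width) network input shape per batch.
    """
    hw = np.asarray(hw, dtype=float)
    ar = hw[:, 0] / hw[:, 1]          # aspect ratio as height / width
    order = ar.argsort()              # load images in increasing aspect ratio
    ar = ar[order]

    n_batches = int(np.ceil(len(ar) / batch_size))
    shapes = []
    for b in range(n_batches):
        ari = ar[b * batch_size:(b + 1) * batch_size]
        lo, hi = ari.min(), ari.max()
        if hi < 1:                    # every image is wide: shrink the height
            shapes.append((hi, 1.0))
        elif lo > 1:                  # every image is tall: shrink the width
            shapes.append((1.0, 1.0 / lo))
        else:                         # batch straddles square: stay square
            shapes.append((1.0, 1.0))

    # Scale to img_size and round up to the next multiple of the stride.
    shapes = np.ceil(np.array(shapes) * img_size / stride).astype(int) * stride
    return order, shapes


sizes = [(720, 1280), (1080, 810), (480, 640), (640, 480)]
print(batch_shapes(sizes, batch_size=2)[1])  # two rectangular batches
print(batch_shapes(sizes, batch_size=4)[1])  # one batch, forced square
```

With a batch size of 2 the wide and tall images land in separate batches and each gets its own rectangle. With a batch size of 4 the single batch mixes wide and tall images, so it falls back to 416x416.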

Rectangular training results on coco_100img.data. Speedup was not material in this case because CUDA was constantly optimizing at each batch due to benchmark set to True, and the dataset of 100 images had only 6 batches, each with a different shape. Speedup should be more impactful on larger training sets. Individual batches were timed as fast as 0.189 seconds here vs 0.240 seconds for 416 square training using a V100.
https://github.com/ultralytics/yolov3/blob/7e6e1897ac5514ab0e2ae3e3357da8a9c744cfed/train.py#L64

Rectangular training can be accessed here:
https://github.com/ultralytics/yolov3/blob/7e6e1897ac5514ab0e2ae3e3357da8a9c744cfed/utils/datasets.py#L146-L148
Rectangular inference is now working in our latest iDetection iOS App build! This is a screenshot recorded today at 192x320, inference on vertical 4k format 16:9 aspect ratio iPhone video. This pushes the performance to realtime 30 FPS!! This means that we now have YOLOv3-SPP running in realtime on an iPhone Xs using rectangular inference! This is a worldwide first as far as we know.

Hi @glenn-jocher,
I'm trying rectangular training with rect=True but the tensors during training are all square starting with the input torch.Size([16, 3, 416, 416]), what could be the problem?
I'd expect the shapes to be the nearest multiples of 32 for both image dimensions.
What should be img_size in the line:
self.batch_shapes = np.ceil(np.array(shapes) * img_size / 32.).astype(np.int) * 32
I also noticed that images look rectangular in the test_batch.jpg but square in train_batch.jpg, does this mean that rectangular training is unsupported?
@MOHAMEDELDAKDOUKY training uses a mosaic loader, which loads 4 images at a time into a mosaic. You can disable this on this line:
https://github.com/ultralytics/yolov3/blob/e27b124828642198581512d42b14f0afe181ecd5/utils/datasets.py#L408
Yes, I disabled it but the images are still squares of 416x416.
@MOHAMEDELDAKDOUKY your repo may be out of date. git clone a new version and try again.
Done, still getting this with rect=True, mosaic and augmentation disabled.

@MOHAMEDELDAKDOUKY test.py's dataloader operates rectangular inference. Use the same settings in train.py.
Well, the issue was that I used a batch_size=16 which is the whole 16 images set. Images were padded to the same size of the clock image in the third row which is square.
Thanks for your reply!
@MOHAMEDELDAKDOUKY ah of course. The batch is padded to the minimum rectangle of the entire group of images, so one square image may cause the batch to be square. Rectangular dataloading is also always in the same order, as the images are loaded in increasing aspect ratio.
It seems that letterbox already computes the ratio and padding, and scale_coords computes them again. Would it be faster to compute them only once?
https://github.com/ultralytics/yolov3/blob/5d73b190b053e8c3b87efb7fb1adc48496a04d01/utils/utils.py#L149
@mozpp yes, the intention is that if ratio_pad is not passed to the function then the padding is computed automatically based on the same assumptions set forth when padding the image originally. Some speedup might be realized by passing this precomputed value, but in profiling this is not a significant hotspot.
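To make the two paths concrete, here is a minimal NumPy sketch of mapping boxes from the letterboxed input back to the original image. `rescale_boxes` and the `(gain, (pad_x, pad_y))` layout of `ratio_pad` are assumptions made for this example, not the signature of scale_coords in the repo. It also assumes the image was centered in the letterbox with its longest side scaled to the network size.

```python
import numpy as np


def rescale_boxes(net_shape, boxes, orig_shape, ratio_pad=None):
    """Map xyxy boxes from a letterboxed input back to the original image.

    net_shape, orig_shape: (height, width).
    ratio_pad: optional precomputed (gain, (pad_x, pad_y)); hypothetical layout.
    """
    if ratio_pad is None:
        # Recompute gain and padding from the two shapes alone.
        gain = max(net_shape) / max(orig_shape)
        pad_x = (net_shape[1] - orig_shape[1] * gain) / 2
        pad_y = (net_shape[0] - orig_shape[0] * gain) / 2
    else:
        # Reuse the values already computed while letterboxing.
        gain, (pad_x, pad_y) = ratio_pad

    boxes = np.asarray(boxes, dtype=float).copy()
    boxes[:, [0, 2]] = (boxes[:, [0, 2]] - pad_x) / gain  # x1, x2
    boxes[:, [1, 3]] = (boxes[:, [1, 3]] - pad_y) / gain  # y1, y2
    # Clip to the original image bounds.
    boxes[:, [0, 2]] = boxes[:, [0, 2]].clip(0, orig_shape[1])
    boxes[:, [1, 3]] = boxes[:, [1, 3]].clip(0, orig_shape[0])
    return boxes


# zidane.jpg: 720x1280 original, 256x416 network input, 11 px padding top and bottom.
box = [[0, 11, 416, 245]]
auto = rescale_boxes((256, 416), box, (720, 1280))
given = rescale_boxes((256, 416), box, (720, 1280), ratio_pad=(0.325, (0.0, 11.0)))
print(auto, given)
```

Both calls map the box to the full 1280x720 frame. The recomputation is a handful of scalar operations per image, which fits the observation that it is not a hotspot.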
If I use this project to convert yolov3 or yolo-spp models to onnx, does the transferred onnx support rectangular inference?
@glenn-jocher Waiting for your early reply!
@chouxianyu yes. iDetection on iOS runs with rectangular inference using the PyTorch > ONNX > CoreML export pipeline.
def letterbox(img, new_shape=(416, 416), color=(114, 114, 114), auto=False, scaleFill=False, scaleup=True):
416*416 inference is worse than auto=True ?
@feixiangdekaka I don't understand your question.
hello,
I have a question. Why use this color (114, 114, 114) to fill the border rather than black (0, 0, 0)?
@WZMIAOMIAO imagenet mean.
@glenn-jocher Isn't the imagenet mean [123.68, 116.78, 103.94]?
@WZMIAOMIAO sure, sum those numbers and divide by 3. We use this because some functions prepopulate with a scalar rather than a vector.
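The arithmetic, spelled out. Truncating rather than rounding is an assumption here; the thread only says to sum the three values and divide by 3.

```python
# Per-channel ImageNet mean as quoted in the question above.
imagenet_mean_rgb = (123.68, 116.78, 103.94)

# Collapse the vector to one scalar so it can be used where a single fill value is expected.
scalar_mean = sum(imagenet_mean_rgb) / len(imagenet_mean_rgb)  # about 114.8

fill = int(scalar_mean)      # truncates to 114 (assumed; rounding would give 115)
color = (fill, fill, fill)
print(scalar_mean, color)
```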
Sorry, I don't understand. What if I use [0, 0, 0] to fill the border?
@WZMIAOMIAO use whatever you want.
@glenn-jocher It seems that if the new shape is same as original shape, the code will not reshape the img to multiple of 32, how to solve the problem?
@Edwardmark dataloading and letterboxing works correctly under all circumstances. If you believe otherwise please submit a full bug report using the bug report template.
@glenn-jocher https://github.com/ultralytics/yolov3/blob/54722d00bbe6139ed8bf1fa1b43f4a7f88e0b539/utils/datasets.py#L637
Why taking mod of 64? I think it should be 32 as we are padding to the minimum multiple of 32.
@menggui1993 yes 32 is all we need. Please submit a PR for this, thanks!
@glenn-jocher I've made a PR, please refer to #1524
@menggui1993 yes I've merged your PR. Thank you for your contributions.
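A small worked example of why the modulus matters. This is only an illustration of the arithmetic, not the code at the linked line; `pad_for_stride` is a hypothetical helper, and the numbers are the zidane.jpg case (scaled height 234 inside a 416 target).

```python
def pad_for_stride(target, scaled, modulus):
    """Total padding left after taking the gap to the target modulo `modulus`."""
    return (target - scaled) % modulus


scaled_h = 234  # zidane.jpg height after scaling the longest side to 416

pad32 = pad_for_stride(416, scaled_h, 32)  # 182 % 32
pad64 = pad_for_stride(416, scaled_h, 64)  # 182 % 64

print(scaled_h + pad32)  # padded height with modulus 32
print(scaled_h + pad64)  # padded height with modulus 64
```

With modulus 32 the padded height is 256, the minimum multiple of 32. With modulus 64 it is 288, which is still a multiple of 32 but carries an extra 32 rows of padding.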
Hello @glenn-jocher what if expand bbox following right or bottom? If I want to change behavior of input expand, where I should consider to change too. (Based on yolov5). Thank you
@ledinhtri97 I don't understand what you are asking.
@glenn-jocher I mean, could the way the input image is padded be changed? In the image below, I calculate the scale to maintain the aspect ratio and then add padding only to the bottom side.

@ledinhtri97 padding and letterboxing is handled in utils/dataloader.py, you can make any modifications you're seeking there.
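For reference, one way such a change could look. This is a sketch under assumptions, not code from the repo: `pad_bottom_right` is a hypothetical helper that takes an image already resized with its aspect ratio preserved and pads only the bottom and right edges up to a multiple of 32.

```python
import numpy as np


def pad_bottom_right(img, stride=32, color=114):
    """Pad an HxWxC image on the bottom and right only, up to a multiple of stride."""
    h, w = img.shape[:2]
    new_h = -(-h // stride) * stride  # ceiling division, then back to pixels
    new_w = -(-w // stride) * stride
    out = np.full((new_h, new_w, img.shape[2]), color, dtype=img.dtype)
    out[:h, :w] = img                 # image stays anchored at the top-left corner
    return out


resized = np.zeros((234, 416, 3), dtype=np.uint8)  # stand-in for a resized frame
padded = pad_bottom_right(resized)
print(padded.shape)
```

Because the image stays anchored at the top-left corner, predicted boxes map back to the original image by dividing by the scale alone, with no padding offset to subtract. Any code that assumes centered padding when rescaling boxes would need the same change.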
Hi @glenn-jocher, I find that the command lines for Square Inference and Rectangular Inference are the same:
python3 detect.py
Namespace(cfg='cfg/yolov3-spp.cfg', conf_thres=0.5, data_cfg='data/coco.data', images='data/samples', img_size=416, nms_thres=0.5, weights='weights/yolov3-spp.weights')
So where is the difference behind the scenes?
Hi @glenn-jocher, you say each image is letterboxed to the minimum viable multiple of 32 for the most square image in that batch. If the most square image differs between batches, does that mean each batch has a single image size, and different batches can have different image sizes?
@wwdok yes of course, rectangular inference implies each batch may have a different shape.
