Yolov3: Multi-scale incompatibility with Rectangular Training

Created on 3 Jul 2019  路  6Comments  路  Source: ultralytics/yolov3

Describe the bug
I used 1024*288 images as a training image, using four 1080ti gpus. I tried to enable both rectangular_training and multi-scale_learning but tensor size mismatch occurred on route modules. The error seems to occur because the input image doesn't have multiples of 32.
It seems the images are resized and padded to have multiples of 32, but they are changed by resizing part of multi-scale module. Switching the location of the two modules may fix the problem. Or using estimated width and height to scale the image instead of using resizing factor would help too.

bug

All 6 comments

@cmiller2air lets see. Rectangular training attempts to sort all of your training images by aspect ratio, and then groups similar aspect ratio images into a single batch. It will then pad all of the images in the batch as necessary to achieve a minimum pad given the constraint that the image dimensions must be multiples of 32.

I'm not sure if rectangular training and multi scale are mutually exclusive, though one important problem with rectangular training is that you need to be sure that the images are not shuffled by the data loader.

test.py uses rectangular images by default in this repo, so the rectangular functionality is operating properly by itself. Can you post your command and the screen outputs of the command?

I took a look at this over here. It seems like the interpolation operation in train.py is resizing the long size to a new multiple of 32, though the new image is no longer subject to the 32-multiple constraint on the shorter side. This is what is causing the errors.

https://github.com/ultralytics/yolov3/blob/ab141fcc1ff976fa9d1bd983e11fe43ba4628e2e/train.py#L199-L206

This will need some significant additional logic to correct. We will leave this issue open. I can't give you a timeline for a fix, but an immediate workaround is to use rectangular training with --single-scale.

I have experienced the same problem. I cannot use --rect flag together with --multi-scale when I try to do this I got belowing error:


Traceback (most recent call last):
  File "train.py", line 334, in <module>
    accumulate=opt.accumulate)
  File "train.py", line 207, in train
    pred = model(imgs)
  File "/home/tomekb/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/tomekb/pytorch_yolov3/models.py", line 189, in forward
    x = torch.cat([layer_outputs[i] for i in layer_i], 1)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 27 and 28 in dimension 2 at /pytorch/aten/src/THC/generic/THCTensorMath.cu:71

@Bienqq @cmiller2air since we now had multiple requests we elevated the issue status and have now implemented a fix in https://github.com/ultralytics/yolov3/commit/44b340321fef16ee21e9fba43caa7ddebef03c2f. Multiscale training should now be compatible with rectangular training.

Please git pull and try again. Thank you!

Everything appears to be operating correctly in our COCO tests:

python3 train.py --data data/coco.data --img-size 320 --rect --multi-scale

Namespace(accumulate=4, batch_size=16, bucket='', cfg='cfg/yolov3-spp.cfg', data='data/coco.data', epochs=100, evolve=False, img_size=320, multi_scale=True, nosave=False, notest=False, num_workers=4, rect=True, resume=False, transfer=False, var=0, xywh=False)
Using CUDA with Apex device0 _CudaDeviceProperties(name='Tesla T4', total_memory=15079MB)

Reading image shapes: 100% 117263/117263 [03:24<00:00, 572.30it/s]
Model Summary: 225 layers, 6.29987e+07 parameters, 6.29987e+07 gradients

     Epoch   gpu_mem   GIoU/xy        wh       obj       cls     total   targets  img_size
      0/99      3.7G     0.565         0      10.6      13.2      24.4        93       352:   4% 297/7329 [01:55<33:54,  3.46it/s]

Closing as issue should be resolved.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

JungminChung picture JungminChung  路  3Comments

Sibozhu picture Sibozhu  路  4Comments

MichaelCong picture MichaelCong  路  4Comments

cyberclone12 picture cyberclone12  路  4Comments

Rajasekhar06 picture Rajasekhar06  路  3Comments