Vision: Deformable convolution fails when batch size is more than 32?

Created on 10 Mar 2020 · 11 comments · Source: pytorch/vision

🐛 Bug


Deformable convolution fails when the batch size is larger than 32. To check whether this is expected behavior, it was tested against the implementation provided in mmdetection; that implementation works for any batch size.


Trace:

Traceback (most recent call last):

  File "/mnt/1260D2E260D2CC1B/Nutrition_Pytorch/Classification/debug/deform_test.py", line 42, in <module>
    output = model(img)

  File "/opt/anaconda/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)

  File "/mnt/1260D2E260D2CC1B/Nutrition_Pytorch/Classification/debug/deform_test.py", line 32, in forward
    return deform_conv2d(input = x, offset= offset, weight= self.weight, stride=_pair(self.stride),padding= _pair(self.padding), dilation=_pair(self.dilation))

  File "/opt/anaconda/anaconda3/lib/python3.7/site-packages/torchvision/ops/deform_conv.py", line 76, in deform_conv2d
    n_offset_grps)

RuntimeError: shape '[1, 1, 288]' is invalid for input of size 158957568

To Reproduce (the following is expected to run without error):

import torch
from torchvision.ops import DeformConv2d
import torch.nn as nn
from torch.nn.modules.utils import _pair
from torchvision.ops import deform_conv2d

class DeformConvPack(DeformConv2d):
    def __init__(self, *args, **kwargs):
        super(DeformConvPack, self).__init__(*args, **kwargs)

        # Offsets are predicted by a regular conv: 2 offset groups, each with
        # an (x, y) displacement per kernel position, hence 2 * 2 * kh * kw channels.
        self.conv_offset = nn.Conv2d(
            self.in_channels,
            2 * 2 * self.kernel_size[0] * self.kernel_size[1],
            kernel_size=self.kernel_size,
            stride=_pair(self.stride),
            padding=_pair(self.padding),
            bias=True)
        self.init_offset()

    def init_offset(self):
        # Zero-initialized offsets make the layer start out as a plain convolution.
        self.conv_offset.weight.data.zero_()
        self.conv_offset.bias.data.zero_()

    def forward(self, x):
        offset = self.conv_offset(x)
        return deform_conv2d(input=x, offset=offset, weight=self.weight,
                             stride=_pair(self.stride),
                             padding=_pair(self.padding),
                             dilation=_pair(self.dilation))



if __name__ == '__main__':
    img = torch.randn(32, 32, 224, 224).cuda()
    model = DeformConvPack(32, 64, 3, padding=1).cuda()
    output = model(img)
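
As a sanity check (a sketch added here, not part of the original report): since init_offset() zeroes the offset predictor, all offsets are zero and deform_conv2d should reduce to a plain convolution, so small batches can be validated against F.conv2d:

import torch.nn.functional as F

# With all-zero offsets, deformable convolution samples exactly at the
# regular grid positions, so it should match a standard convolution.
x = torch.randn(4, 32, 16, 16).cuda()  # small batch, well below the failing size
m = DeformConvPack(32, 64, 3, padding=1).cuda()
ref = F.conv2d(x, m.weight, stride=1, padding=1)
print(torch.allclose(m(x), ref, atol=1e-5))  # expected: True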

Environment Information:

PyTorch version: 1.4.0
Is debug build: No
CUDA used to build PyTorch: 10.1

OS: Ubuntu 19.04
GCC version: (Ubuntu 8.3.0-6ubuntu1) 8.3.0
CMake version: version 3.13.4

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 10.1.105
GPU models and configuration: GPU 0: GeForce RTX 2080 Ti
Nvidia driver version: 440.26
cuDNN version: /usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn.so.7.6.2

Versions of relevant libraries:
[pip] numpy==1.18.1
[pip] numpydoc==0.9.2
[pip] torch==1.4.0
[pip] torchtext==0.5.0
[pip] torchvision==0.5.0
[conda] blas                      1.0                         mkl  
[conda] mkl                       2019.4                      243  
[conda] mkl-service               2.3.0            py37he904b0f_0  
[conda] mkl_fft                   1.0.15           py37ha843d7b_0  
[conda] mkl_random                1.1.0            py37hd6b4f25_0  
[conda] pytorch                   1.4.0           py3.7_cuda10.1.243_cudnn7.6.3_0    pytorch
[conda] torchtext                 0.5.0                    pypi_0    pypi
[conda] torchvision               0.5.0                py37_cu101    pytorch

cc @ngimel @fmassa @vfdev-5

All 11 comments

Most likely bad 32-bit indexing in our kernel?

This is a problem with a torchvision operator, right? It might help to cross-post in the torchvision repo: https://github.com/pytorch/vision/

The CPU implementation also crashes once the batch size exceeds 32.
The OS doesn't matter: it happens on both Windows and Linux (both 64-bit).

@zou3519

Could you elaborate on the vision operator?

Hi guys, I have run into similar problems. After some experiments, I am pretty sure the problem is caused by the im2col_step setting and is not related to 32-bit indexing as mentioned by @albanD. Specifically, by dynamically setting im2col_step to the batch size, the code actually works. My guess is that 32 is exactly the default value of im2col_step, @shreyaskamathkm, which is why the problem only shows up around batch size 32.
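
A minimal workaround sketch based on that observation (an assumption-driven sketch, not from the thread): if a single call only misbehaves once it sees more than the default im2col_step of 32 samples, splitting the batch into chunks of at most 32 and concatenating the results should avoid the failure. The helper name chunked_deform_conv2d is hypothetical:

import torch
from torchvision.ops import deform_conv2d

def chunked_deform_conv2d(x, offset, weight, stride=(1, 1), padding=(1, 1),
                          dilation=(1, 1), max_chunk=32):
    # Run deform_conv2d on sub-batches of at most `max_chunk` samples so a
    # single call never exceeds the default im2col_step, then re-assemble.
    outs = [deform_conv2d(xc, oc, weight, stride=stride,
                          padding=padding, dilation=dilation)
            for xc, oc in zip(x.split(max_chunk), offset.split(max_chunk))]
    return torch.cat(outs, dim=0)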

I could reproduce the issue on pytorch==1.4.0 and torchvision==0.5.0 by changing the code above to use batch size 48: img = torch.randn(48, 32, 224, 224).cuda(). There is no issue with pytorch==1.5.0 and torchvision==0.6.0, nor with any later version up to the current nightlies (pytorch==1.8.0.dev20201011, torchvision==0.8.0a0).

@shreyaskamathkm could you please confirm whether you still see the issue after upgrading pytorch and torchvision?

Hi @vfdev-5 ,
By setting

img = torch.randn(24, 256, 128, 128).cuda()
model = DeformConvPack(256, 256, 3, padding =1).cuda()

raises RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling cublasCreate(handle).

Tested on pytorch==1.6.0, torchvision==0.7.0

@datumbox could you please try to reproduce this: https://github.com/pytorch/vision/issues/2817#issuecomment-708593829 and check on master? Most probably it was already fixed there.

We have fixed DeformConv2d support for batches larger than 32 in https://github.com/pytorch/vision/pull/2040, and the fix is present in torchvision 0.7.0.

Regarding @happywu's message: please open a new issue with a self-contained minimal example that reproduces the error. You might just be facing memory issues while trying to allocate that tensor.
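
A back-of-envelope check (illustrative only, assuming float32 and that the im2col column buffer for the whole batch is materialized at once) supports the memory explanation for that case:

# Failing case above: batch 24, 256 channels, 128x128 input, 3x3 kernel, padding 1.
batch, c_in, h_out, w_out, kh, kw = 24, 256, 128, 128, 3, 3
input_bytes = batch * c_in * h_out * w_out * 4
im2col_bytes = batch * (c_in * kh * kw) * (h_out * w_out) * 4
print(f"input:  {input_bytes / 2**30:.2f} GiB")   # ~0.38 GiB
print(f"im2col: {im2col_bytes / 2**30:.2f} GiB")  # ~3.38 GiB for the column buffer alone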

@vfdev-5 My laptop's GPU can't fit all the data. Nevertheless, the following runs fine on 0.7.0:

#...
img = torch.randn(32, 32, 112, 112).cuda()
#...

@datumbox thanks a lot!
