Deformable convolution fails when the batch size is larger than 32. To check whether this is expected behavior, I tested the same setup against the implementation provided in mmdetection, which works for any batch size.
Trace:
Traceback (most recent call last):
  File "/mnt/1260D2E260D2CC1B/Nutrition_Pytorch/Classification/debug/deform_test.py", line 42, in <module>
    output = model(img)
  File "/opt/anaconda/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/1260D2E260D2CC1B/Nutrition_Pytorch/Classification/debug/deform_test.py", line 32, in forward
    return deform_conv2d(input = x, offset= offset, weight= self.weight, stride=_pair(self.stride),padding= _pair(self.padding), dilation=_pair(self.dilation))
  File "/opt/anaconda/anaconda3/lib/python3.7/site-packages/torchvision/ops/deform_conv.py", line 76, in deform_conv2d
    n_offset_grps)
RuntimeError: shape '[1, 1, 288]' is invalid for input of size 158957568
import torch
import torch.nn as nn
from torch.nn.modules.utils import _pair
from torchvision.ops import DeformConv2d, deform_conv2d


class DeformConvPack(DeformConv2d):
    def __init__(self, *args, **kwargs):
        super(DeformConvPack, self).__init__(*args, **kwargs)
        # Regular convolution that predicts the sampling offsets.
        self.conv_offset = nn.Conv2d(
            self.in_channels,
            2 * 2 * self.kernel_size[0] * self.kernel_size[1],
            kernel_size=self.kernel_size,
            stride=_pair(self.stride),
            padding=_pair(self.padding),
            bias=True)
        self.init_offset()

    def init_offset(self):
        # Zero weights and bias, so the predicted offsets start at zero.
        self.conv_offset.weight.data.zero_()
        self.conv_offset.bias.data.zero_()

    def forward(self, x):
        offset = self.conv_offset(x)
        return deform_conv2d(input=x, offset=offset, weight=self.weight,
                             stride=_pair(self.stride),
                             padding=_pair(self.padding),
                             dilation=_pair(self.dilation))


if __name__ == '__main__':
    # Per the report above, the error appears once the batch
    # (first) dimension is larger than 32.
    img = torch.randn(32, 32, 224, 224).cuda()
    model = DeformConvPack(32, 64, 3, padding=1).cuda()
    output = model(img)
PyTorch version: 1.4.0
Is debug build: No
CUDA used to build PyTorch: 10.1
OS: Ubuntu 19.04
GCC version: (Ubuntu 8.3.0-6ubuntu1) 8.3.0
CMake version: version 3.13.4
Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 10.1.105
GPU models and configuration: GPU 0: GeForce RTX 2080 Ti
Nvidia driver version: 440.26
cuDNN version: /usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn.so.7.6.2
Versions of relevant libraries:
[pip] numpy==1.18.1
[pip] numpydoc==0.9.2
[pip] torch==1.4.0
[pip] torchtext==0.5.0
[pip] torchvision==0.5.0
[conda] blas 1.0 mkl
[conda] mkl 2019.4 243
[conda] mkl-service 2.3.0 py37he904b0f_0
[conda] mkl_fft 1.0.15 py37ha843d7b_0
[conda] mkl_random 1.1.0 py37hd6b4f25_0
[conda] pytorch 1.4.0 py3.7_cuda10.1.243_cudnn7.6.3_0 pytorch
[conda] torchtext 0.5.0 pypi_0 pypi
[conda] torchvision 0.5.0 py37_cu101 pytorch
cc @ngimel @fmassa @vfdev-5
Most likely bad 32-bit indexing in our kernel?
This is a problem with a torchvision operator, right? It might help to cross-post in the torchvision repo: https://github.com/pytorch/vision/
The CPU implementation also crashes once the batch size exceeds 32.
It happens on both Windows and Linux (both 64-bit).
@zou3519
can you elaborate more on the vision operator?
Hi guys, I have run into a similar problem. After some experiments, I am fairly sure it is caused by the im2col_step setting and is not related to 32-bit indexing, as mentioned by @albanD. Specifically, when I set im2col_step dynamically to the batch size, the code works. @shreyaskamathkm, my guess is that 32 happens to be the default value of im2col_step, which would explain why you only hit the problem around a batch size of 32.
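To make the im2col_step explanation concrete, here is a minimal sketch. It assumes (this is not confirmed in the thread) that the operator processes the batch in equally sized chunks, so the chunk size has to divide the batch size evenly. The helper name largest_step and the default cap of 32 are illustrative only and are not part of torchvision.

```python
def largest_step(batch_size, max_step=32):
    """Return the largest chunk size <= max_step that divides batch_size evenly.

    Hypothetical helper: it only illustrates the assumed constraint that
    the chunk size (im2col_step) must divide the batch size.
    """
    if batch_size < 1 or max_step < 1:
        raise ValueError("batch_size and max_step must be positive")
    # Walk down from the cap; 1 always divides, so the loop always returns.
    for step in range(min(batch_size, max_step), 0, -1):
        if batch_size % step == 0:
            return step


if __name__ == "__main__":
    for n in (32, 48, 64):
        print(n, largest_step(n))
```

Under that assumption, a batch of 48 would be handled as two chunks of 24 instead of one chunk of 32 plus a remainder that does not fit.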
I could reproduce the issue on pytorch==1.4.0 and torchvision==0.5.0 by changing the batch size in the code above to 48: img = torch.randn(48, 32, 224, 224).cuda(). There is no issue with pytorch==1.5.0 and torchvision==0.6.0, or with later versions up to the current nightlies: pytorch=1.8.0.dev20201011, torchvision=0.8.0a0.
@shreyaskamathkm could you please confirm whether you still have the issue after upgrading pytorch and torchvision?
Hi @vfdev-5 ,
With the following settings:
img = torch.randn(24, 256, 128, 128).cuda()
model = DeformConvPack(256, 256, 3, padding=1).cuda()
I get RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling cublasCreate(handle)
Tested on pytorch==1.6.0, torchvision==0.7.0.
@datumbox could you please try to reproduce this: https://github.com/pytorch/vision/issues/2817#issuecomment-708593829 and check against master? Most probably it has already been fixed in master...
We have fixed DeformableConv2d support for batches larger than 32 in https://github.com/pytorch/vision/pull/2040 , and the fix is present in torchvision 0.7.0
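For anyone who cannot upgrade yet, one possible workaround is to feed the layer in slices of at most 32 samples, since the reports in this thread say batches of that size work. The sketch below is an assumption and was not tested against the affected versions; run_in_chunks is a hypothetical helper name, and it only relies on the input supporting len() and slicing along the first dimension, which torch tensors do.

```python
def run_in_chunks(fn, batch, max_chunk=32):
    """Call fn on consecutive slices of batch, each at most max_chunk long.

    Hypothetical helper: returns the per-slice results in order, leaving
    it to the caller to combine them.
    """
    if max_chunk < 1:
        raise ValueError("max_chunk must be positive")
    outputs = []
    for start in range(0, len(batch), max_chunk):
        outputs.append(fn(batch[start:start + max_chunk]))
    return outputs


if __name__ == "__main__":
    # Plain-Python demonstration: report the size of each slice.
    print(run_in_chunks(len, list(range(88))))
```

With the model from the report this would be used as output = torch.cat(run_in_chunks(model, img), dim=0). The trade-off is several smaller operator calls instead of one large one.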
Regarding @happywu's message, please open a new issue with a self-contained minimal example that reproduces the error. You might just be running out of memory while trying to allocate that tensor.
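A rough back-of-the-envelope estimate supports the memory explanation. The sketch below computes the float32 size of the (24, 256, 128, 128) input from the earlier message and of a hypothetical im2col column buffer that holds one copy of the input per kernel tap. Whether torchvision allocates exactly such a buffer is an assumption. The output is taken to be 128 x 128, which follows from a 3x3 kernel with padding 1 and stride 1.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Size in bytes of a dense tensor with the given shape (float32 by default)."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


batch, channels, height, width = 24, 256, 128, 128
kernel_h = kernel_w = 3

input_bytes = tensor_bytes((batch, channels, height, width))

# Assumption: the column buffer stores channels * kernel_h * kernel_w values
# for every output position of every image in the batch.
columns_bytes = tensor_bytes((channels * kernel_h * kernel_w, batch, height, width))

if __name__ == "__main__":
    print(f"input:   {input_bytes / 2**20:.0f} MiB")
    print(f"columns: {columns_bytes / 2**30:.3f} GiB")
```

The input alone takes 384 MiB. Under the stated assumption the column buffer would be nine times larger, which is substantial on a GPU that also has to hold weights, offsets, and outputs.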
@vfdev-5 My laptop's GPU can't fit all the data. Nevertheless the following runs fine against 0.7.0:
#...
img = torch.randn(32, 32, 112, 112).cuda()
#...
@datumbox thanks a lot !