Incubator-mxnet: ConvTranspose2d giving incorrect output

Created on 8 Jun 2018 · 9 comments · Source: apache/incubator-mxnet

When applying ConvTranspose2d with stride and dilation, the output is incorrect.

Working through a simple example:

  • Input Shape: 1x3x3
  • Kernel size: 3x3
  • Padding: Same (i.e. 2x2 in this case)
  • Stride: 2x2
  • Dilation: 2x2

Manual Calculation

[Image "incorrect_trans2": manual calculation of the expected output]
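
Since the attached image is not reproduced here, the expected result can be worked out with a naive NumPy reference (an illustrative sketch, not part of the original report): each input element scatters a dilated copy of the kernel into the output at stride-spaced positions, and the padding is cropped at the end.

import numpy as np

def conv_transpose2d_ref(data, kernel, stride=2, padding=2, dilation=2):
    # Naive single-channel transposed convolution: every input element scatters
    # a dilated copy of the kernel at stride-spaced positions; padding is cropped last.
    h, w = data.shape
    kh, kw = kernel.shape
    full_h = (h - 1) * stride + dilation * (kh - 1) + 1
    full_w = (w - 1) * stride + dilation * (kw - 1) + 1
    out = np.zeros((full_h, full_w))
    for i in range(h):
        for j in range(w):
            for ki in range(kh):
                for kj in range(kw):
                    out[i * stride + ki * dilation,
                        j * stride + kj * dilation] += data[i, j] * kernel[ki, kj]
    return out[padding:full_h - padding, padding:full_w - padding]

data = np.zeros((3, 3)); data[1, 1] = 1
kernel = np.arange(1, 10).reshape(3, 3)
print(conv_transpose2d_ref(data, kernel))
# [[1. 0. 2. 0. 3.]
#  [0. 0. 0. 0. 0.]
#  [4. 0. 5. 0. 6.]
#  [0. 0. 0. 0. 0.]
#  [7. 0. 8. 0. 9.]]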

MXNet Output

MXNet gives a different output, which I believe is incorrect: the result is padded with zeros at the top and left, and the kernel values cluster toward the bottom and right.

...
conv = mx.gluon.nn.Conv2DTranspose(in_channels=1, channels=1,
                                   kernel_size=(3,3), padding=(2,2),
                                   strides=(2,2), dilation=(2,2))
...
[[[[ 0.  0.  0.  0.  0.]
   [ 0.  1.  0.  2.  3.]
   [ 0.  0.  0.  0.  0.]
   [ 0.  4.  0.  5.  6.]
   [ 0.  7.  0.  8.  9.]]]]
<NDArray 1x1x5x5 @cpu(0)>



PyTorch Output



...
conv = torch.nn.ConvTranspose2d(in_channels=1, out_channels=1,
                                kernel_size=(3,3), padding=(2,2),
                                stride=(2,2), dilation=(2,2))
...






tensor([[[[ 1.,  0.,  2.,  0.,  3.],
          [ 0.,  0.,  0.,  0.,  0.],
          [ 4.,  0.,  5.,  0.,  6.],
          [ 0.,  0.,  0.,  0.,  0.],
          [ 7.,  0.,  8.,  0.,  9.]]]])



Environment Info



(mxnet_p36) ubuntu@ip-172-31-68-231:~$ python diagnose.py
----------Python Info----------
Version      : 3.6.4
Compiler     : GCC 7.2.0
Build        : ('default', 'Jan 16 2018 18:10:19')
Arch         : ('64bit', '')
------------Pip Info-----------
Version      : 10.0.1
Directory    : /home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/pip
----------MXNet Info-----------
Version      : 1.2.0
Directory    : /home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet
Commit Hash   : 297c64fd2ee404612aa3ecc880b940fb2538039c
----------System Info----------
Platform     : Linux-4.4.0-1057-aws-x86_64-with-debian-stretch-sid
system       : Linux
node         : ip-172-31-68-231
release      : 4.4.0-1057-aws
version      : #66-Ubuntu SMP Thu May 3 12:49:47 UTC 2018
----------Hardware Info----------
machine      : x86_64
processor    : x86_64
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
Stepping:              1
CPU MHz:               2699.984
CPU max MHz:           3000.0000
CPU min MHz:           1200.0000
BogoMIPS:              4600.08
Hypervisor vendor:     Xen
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              46080K
NUMA node0 CPU(s):     0-7
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single retpoline kaiser fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx xsaveopt
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0040 sec, LOAD: 0.3429 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0385 sec, LOAD: 0.3615 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.4978 sec, LOAD: 0.3804 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0207 sec, LOAD: 0.1522 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0027 sec, LOAD: 0.0859 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0284 sec, LOAD: 0.0201 sec.



Minimum reproducible example



import mxnet as mx

data = mx.nd.array(((0,0,0),
                    (0,1,0),
                    (0,0,0)))
kernel = mx.nd.array(((1,2,3),
                      (4,5,6),
                      (7,8,9)))

data_batch = data.expand_dims(0).expand_dims(0)
weight = kernel.expand_dims(0).expand_dims(0)
# initialize and set weight
conv = mx.gluon.nn.Conv2DTranspose(in_channels=1, channels=1,
                                   kernel_size=(3,3), padding=(2,2),
                                   strides=(2,2), dilation=(2,2))
conv.initialize()
conv.weight.set_data(weight)
print(conv(data_batch))






[[[[ 0.  0.  0.  0.  0.]
   [ 0.  1.  0.  2.  3.]
   [ 0.  0.  0.  0.  0.]
   [ 0.  4.  0.  5.  6.]
   [ 0.  7.  0.  8.  9.]]]]
<NDArray 1x1x5x5 @cpu(0)>






import torch

data = torch.tensor(((0,0,0),
                     (0,1,0),
                     (0,0,0)), dtype=torch.float32)
kernel = torch.tensor(((1,2,3),
                       (4,5,6),
                       (7,8,9)), dtype=torch.float32)

data_batch = data.expand(1,1,-1,-1)
weight = kernel.expand(1,1,-1,-1)
conv = torch.nn.ConvTranspose2d(in_channels=1, out_channels=1,
                                kernel_size=(3,3), padding=(2,2),
                                stride=(2,2), dilation=(2,2))
conv.weight.data = weight
print(conv(data_batch).round())






tensor([[[[ 1.,  0.,  2.,  0.,  3.],
          [ 0.,  0.,  0.,  0.,  0.],
          [ 4.,  0.,  5.,  0.,  6.],
          [ 0.,  0.,  0.,  0.,  0.],
          [ 7.,  0.,  8.,  0.,  9.]]]])

Labels: Bug, Operator

Most helpful comment

Confirmed that the bug is in unpack_patch2col and pack_col2patch when dilation is specified. Replacing them with col2im and im2col resolved the bug. PR coming out soon.

All 9 comments

This does look like a bug. We should fix it with high priority

@piiswrong Agreed; this would be very hard to pick up with non-trivial examples.
It could also cause serious downstream effects in networks that use upsampling.

Since the transposed convolution is the backward pass of convolution, there may also be a bug in the convolution's backward pass with these settings.
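
One way to check this (an illustrative sketch, not from the thread): Conv2DTranspose's forward pass should equal the data gradient of a Conv2D with the same kernel, stride, padding, and dilation, so comparing the two would show whether the convolution's backward pass is affected as well.

import mxnet as mx
from mxnet import autograd

# Conv2D whose data gradient should reproduce the expected Conv2DTranspose output above.
conv = mx.gluon.nn.Conv2D(channels=1, in_channels=1, kernel_size=(3, 3),
                          strides=(2, 2), padding=(2, 2), dilation=(2, 2))
conv.initialize()
conv.weight.set_data(mx.nd.arange(1, 10).reshape((1, 1, 3, 3)))

x = mx.nd.random.uniform(shape=(1, 1, 5, 5))
x.attach_grad()
# Same delta pattern as the transposed-conv input, used here as the output gradient.
ograd = mx.nd.array([[[[0, 0, 0], [0, 1, 0], [0, 0, 0]]]])

with autograd.record():
    y = conv(x)                # shape (1, 1, 3, 3)
y.backward(ograd)
# If the backward pass is correct, x.grad matches the expected 1x1x5x5 pattern
# (kernel values at even offsets); the same offset bug would show up here too.
print(x.grad)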

I am looking into this.

@winstywang @wistone Looping in the original author and reviewer for dilate.
@tornadomeet @zheng-da

This happens only with the combination of stride and dilation. Would anyone like to give it a try?
This problem has been present since mxnet 1.0.

import mxnet as mx

data = mx.nd.array(((0,0,0),
                    (0,1,0),
                    (0,0,0)))
kernel = mx.nd.array(((1,2,3),
                      (4,5,6),
                      (7,8,9)))

data_batch = data.expand_dims(0).expand_dims(0)
weight = kernel.expand_dims(0).expand_dims(0)
# initialize and set weight
conv = mx.gluon.nn.Conv2DTranspose(in_channels=1, channels=1,
                                   kernel_size=(3,3),
                                   strides=(2,2),
                                   dilation=(2,2))
conv.initialize()
conv.weight.set_data(weight)
print(conv(data_batch))
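
For reference, the same padding-free case can be cross-checked against PyTorch (an illustrative addition, not part of the original comment):

import torch

conv = torch.nn.ConvTranspose2d(in_channels=1, out_channels=1, kernel_size=(3, 3),
                                stride=(2, 2), dilation=(2, 2), bias=False)
conv.weight.data = torch.arange(1., 10.).reshape(1, 1, 3, 3)
x = torch.zeros(1, 1, 3, 3)
x[0, 0, 1, 1] = 1
# Expected: a 1x1x9x9 output with the kernel values at the even offsets
# (2, 4, 6) in both dimensions, everything else zero.
print(conv(x))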

This is not happening with the mxnet-mkl release, and it is also fixed in the latest source, where MKL is the default.
However, macOS and architectures without AVX-512 support still have the issue.

Finally got some bandwidth. Looking into this issue again.

I cannot find any issue at the MXNet operator level. The cause is possibly that the mshadow library functions unpack_patch2col and pack_col2patch do not work correctly with dilation. I am replacing them with the native MXNet functions col2im and im2col.

Confirmed that the bug is in unpack_patch2col and pack_col2patch when dilation is specified. Replacing them with col2im and im2col resolved the bug. PR coming out soon.
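
For context, a minimal sketch of what a dilation-aware im2col has to do (illustrative NumPy only; not the mshadow or MXNet implementation): each kernel tap (ki, kj) reads from the input at an offset of (ki*dilation, kj*dilation) and then strides across the image. The transposed convolution's forward pass uses the inverse scatter (col2im), so dropping the dilation factor in that path would produce exactly the kind of offset seen above.

import numpy as np

def im2col_ref(img, kh, kw, stride=1, padding=0, dilation=1):
    # Naive dilation-aware im2col for a single-channel image.
    img = np.pad(img, padding)
    h, w = img.shape
    out_h = (h - dilation * (kh - 1) - 1) // stride + 1
    out_w = (w - dilation * (kw - 1) - 1) // stride + 1
    cols = np.zeros((kh * kw, out_h * out_w))
    for ki in range(kh):
        for kj in range(kw):
            # Each kernel tap reads from an offset of (ki*dilation, kj*dilation).
            patch = img[ki * dilation:ki * dilation + (out_h - 1) * stride + 1:stride,
                        kj * dilation:kj * dilation + (out_w - 1) * stride + 1:stride]
            cols[ki * kw + kj] = patch.reshape(-1)
    return cols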
