Cntk: Group convolution mismatch between 2.5.1 and 2.6-rc0

Created on 10 Jul 2018 · 6 Comments · Source: microsoft/CNTK

I am experiencing strange behaviour with group convolutions with 2.6-rc0.

With 2.5.1 release, we see the correct behaviour:

>>> import cntk as C
>>> import numpy as np
>>> print(C.__version__)
2.5.1
>>> x = C.input_variable((16, 64, 64), dtype=np.float32)
>>> conv_map = C.parameter((16, 1, 3, 3), init=C.glorot_uniform())
>>> y = C.convolution(conv_map, x, groups=16)
>>> print(y)
Convolution(Tensor[16,64,64]) -> Tensor[16,64,64]
>>> print(y.parameters)
(Parameter('Parameter207', [], [16 x 1 x 3 x 3]),)

In particular, y correctly maps an input shaped 16x64x64 onto a 16x64x64 output with 16 separate 3x3 filters.

But with the 2.6 nightly:

>>> import cntk as C
>>> import numpy as np
>>> print(C.__version__)
2.6-rc0.dev20180709
>>> x = C.input_variable((16, 64, 64), dtype=np.float32)
>>> conv_map = C.parameter((16, 1, 3, 3), init=C.glorot_uniform())
>>> y = C.convolution(conv_map, x, groups=16)
>>> print(y)
Convolution(Tensor[16,64,64]) -> Tensor[256,64,64]
>>> print(y.parameters)
(Parameter('Parameter9', [], [16 x 1 x 3 x 3]),)

We see y now strangely produces an output of 256x64x64. Perhaps it now applies the 16x1x3x3 block of weights as a single kernel onto each of the input channels?

So, I wondered if perhaps the definition of convolution_map in C.convolution has changed. Maybe we should give it the shape of each of the individual group kernels, i.e. [1, 1, 3, 3].

But if we do this:

>>> y2 = C.convolution(C.parameter((1, 1, 3, 3), init=C.glorot_uniform()), x, groups=16)
>>> print(y2)
Convolution(Tensor[16,64,64]) -> Tensor[16,64,64]
>>> print(y2.parameters)
(Parameter('Parameter13', [], [1 x 1 x 3 x 3]),)

Then I think it is clear that y2's parameters have the wrong shape. We expect to learn a 16x1x3x3 block of weights for this convolution with 16 groups, not 1x1x3x3.

Is this change in behaviour from 2.5.1 to 2.6 expected? Or is this a bug?

errollw
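On the shape question raised in the issue: the kernel layout used throughout this thread is (output channels, input channels per group, kernel height, kernel width), which matches the rule stated in the most helpful comment (kernel input channels × groups = input channels). The sketch below, assuming that layout, derives the expected `convolution_map` shape and counts the weights each variant learns. The helper names are hypothetical and are not CNTK APIs.

```python
from math import prod  # Python 3.8+


def expected_kernel_shape(in_channels, out_channels, groups, kernel_hw):
    """Return the (out, in_per_group, kh, kw) kernel shape for a grouped conv.

    Hypothetical helper; assumes the layout used in this thread:
    output channels first, then the input channels each group sees,
    then the spatial extent of the filter.
    """
    if in_channels % groups != 0:
        raise ValueError("in_channels must be divisible by groups")
    kh, kw = kernel_hw
    return (out_channels, in_channels // groups, kh, kw)


def num_weights(kernel_shape):
    """Number of learnable weights in a kernel of the given shape."""
    return prod(kernel_shape)


# Depthwise case from the issue: 16 input channels, 16 outputs, 16 groups.
depthwise = expected_kernel_shape(16, 16, groups=16, kernel_hw=(3, 3))
print(depthwise, num_weights(depthwise))          # (16, 1, 3, 3) 144
# The y2 variant holds only a single 3x3 filter's worth of weights.
print((1, 1, 3, 3), num_weights((1, 1, 3, 3)))    # (1, 1, 3, 3) 9
```

Under this layout, 16 groups over 16 channels needs 144 independent weights, whereas y2's 1x1x3x3 parameter holds only 9, which is why its parameter shape looks wrong.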

Most helpful comment

@errollw There's definitely some input validation missing here. The number of input channels in the kernel multiplied by the number of groups should equal the number of input channels in the input X, which is not the case here. This should be caught by input validation. When I change the number of input channels in the kernel to two (and use groups=8), the node comes out all right.

>>> x = C.input_variable((16, 64, 64), dtype=np.float32)
>>> conv_map = C.parameter((16, 2, 3, 3), init=C.glorot_uniform())
>>> y = C.convolution(conv_map, x, groups=8, auto_padding=[False, True, True])
>>> print(y)
Convolution(Tensor[16,64,64]) -> Tensor[16,64,64]

We will fix the input validation.

spandantiwari on 17 Jul 2018
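The validation rule described in this comment can be sketched in plain Python. This is a hypothetical helper, not the check CNTK actually added. It assumes channel-first shapes, (C, H, W) for the input and (out, in_per_group, kh, kw) for the kernel, and spatial-only "same" padding (the effect of auto_padding=[False, True, True]), so only the channel count can change.

```python
def group_conv_output_shape(input_shape, kernel_shape, groups):
    """Validate a grouped convolution and return its expected output shape.

    input_shape:  (in_channels, height, width)
    kernel_shape: (out_channels, in_channels_per_group, kh, kw)
    Assumes 'same' padding on the spatial axes and none on the channel
    axis, so height and width are preserved.
    """
    in_channels, height, width = input_shape
    out_channels, in_per_group, _, _ = kernel_shape
    # Rule from the comment: kernel input channels * groups == input channels.
    if in_per_group * groups != in_channels:
        raise ValueError(
            f"kernel input channels ({in_per_group}) x groups ({groups}) "
            f"= {in_per_group * groups}, expected {in_channels}"
        )
    return (out_channels, height, width)


print(group_conv_output_shape((16, 64, 64), (16, 1, 3, 3), groups=16))  # (16, 64, 64)
print(group_conv_output_shape((16, 64, 64), (16, 2, 3, 3), groups=8))   # (16, 64, 64)
try:
    # The groups=8 case reported in the thread: 1 x 8 != 16.
    group_conv_output_shape((16, 64, 64), (16, 1, 3, 3), groups=8)
except ValueError as err:
    print("rejected:", err)
```

A check like this rejects the (16, 1, 3, 3) kernel with groups=8 up front instead of silently producing a 144-channel output.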

All 6 comments

In your first snippet you have groups=py16 instead of groups=16.

kit1980 on 10 Jul 2018

Thanks for noticing; it’s a typo in my post. The issue remains.

errollw on 10 Jul 2018

Thanks for reporting! This is indeed unexpected. We are working on a fix. For the time being, you should be able to get the correct behavior by calling

>>> x = C.input_variable((16, 64, 64), dtype=np.float32)
>>> conv_map = C.parameter((16, 1, 3, 3), init=C.glorot_uniform())
>>> y = C.convolution(conv_map, x, groups=16, auto_padding=[False, True, True])

More details: for cntk.convolution, auto_padding defaults to [True], and this also applies to the input channel axis. In 2.5.1 we implemented group convolution by slicing the input and computing an individual convolution for each group. In your case each input slice has channel size 1, a special case that results in no padding. In 2.6 we compute group convolution directly through cuDNN, so the padding issue emerges.

BowenBao on 10 Jul 2018
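To make the 2.5.1 strategy described above concrete, here is a conceptual NumPy sketch of group convolution by slicing: the input channels are split into `groups` slices, each slice is convolved with its own block of filters, and the results are concatenated along the channel axis. It is a naive reference written for illustration, not CNTK's implementation. It pads only the spatial axes (like auto_padding=[False, True, True]), computes cross-correlation without flipping the kernel (the convention in most deep learning libraries), and additionally assumes the number of output channels is divisible by `groups`. The function names are hypothetical.

```python
import numpy as np


def conv2d_same(x, w):
    """Naive 'same' 2-D cross-correlation.

    x: (c_in, H, W) input; w: (c_out, c_in, kh, kw) filters with odd kh, kw.
    Zero-pads only the spatial axes, so the output is (c_out, H, W).
    """
    c_out, _, kh, kw = w.shape
    _, h, wd = x.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((0, 0), (ph, ph), (pw, pw)))  # no padding on channels
    out = np.zeros((c_out, h, wd), dtype=np.result_type(x, w))
    for i in range(h):
        for j in range(wd):
            patch = xp[:, i:i + kh, j:j + kw]               # (c_in, kh, kw)
            out[:, i, j] = np.tensordot(w, patch, axes=3)   # (c_out,)
    return out


def group_conv2d_by_slicing(x, w, groups):
    """Group convolution computed as `groups` independent convolutions."""
    c_in, c_out = x.shape[0], w.shape[0]
    if w.shape[1] * groups != c_in:
        raise ValueError("kernel input channels x groups != input channels")
    if c_out % groups != 0:  # assumption of this sketch
        raise ValueError("output channels must be divisible by groups")
    in_step, out_step = c_in // groups, c_out // groups
    outputs = []
    for g in range(groups):
        x_g = x[g * in_step:(g + 1) * in_step]     # this group's input channels
        w_g = w[g * out_step:(g + 1) * out_step]   # this group's filters
        outputs.append(conv2d_same(x_g, w_g))
    return np.concatenate(outputs, axis=0)


rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8, 8)).astype(np.float32)   # small H, W for speed
w = rng.standard_normal((16, 1, 3, 3)).astype(np.float32)
print(group_conv2d_by_slicing(x, w, groups=16).shape)    # (16, 8, 8)
```

The spatial size is reduced from 64x64 to 8x8 only to keep the naive loops fast; the channel bookkeeping is the same.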

Thanks @BowenBao for the workaround. It does work for groups=16, and you can save the model as an ONNX file that loads correctly in Netron. However, in case anyone else is trying the workaround above, it does not seem to work in the general case:

>>> import cntk as C
>>> print(C.__version__)
2.6-rc0.dev20180710
>>> import numpy as np
>>> x = C.input_variable((16, 64, 64), dtype=np.float32)
>>> conv_map = C.parameter((16, 1, 3, 3), init=C.glorot_uniform())
>>> y = C.convolution(conv_map, x, groups=8, auto_padding=[False, True, True])
>>> print(y)
Convolution(Tensor[16,64,64]) -> Tensor[144,64,64]

Note the output erroneously has 144 channels.

errollw on 17 Jul 2018

Hi, is there any reason not to allow group convolution in C.layers.Convolution? I'd be happy to raise a PR to make the necessary changes for all the layer functions. :)