Hello everybody,
I would like to implement the ResNet network with the shortcut connections that add zero entries when the feature/channel dimensions mismatch, as described in the original paper:
When the dimensions increase (dotted line shortcuts in Fig. 3), we consider two options: (A) The shortcut still performs identity mapping, with extra zero entries padded for increasing dimensions ...
http://arxiv.org/pdf/1512.03385v1.pdf
However, I wasn't able to implement it and I can't seem to find an answer on the web or in the source code. All the implementations that I found use the 1x1 convolution trick for shortcut connections when dimensions mismatch.
The layer I would like to implement would basically concatenate the input tensor with an all-zeros tensor to compensate for the dimension mismatch.
The idea would be something like this, but I could not get it working:
def zero_pad(x, shape):
    return K.concatenate([x, K.zeros(shape)], axis=1)
Does anyone have an idea how to implement such a layer?
Thanks a lot
Edit: I posted the issue as a comment, sorry.
All the implementations that I found use the 1x1 convolution trick for shortcut connections when dimensions mismatch.

They do this because it's what you should be doing. Residual connections with different shapes should be handled via a learned linear transformation between the two tensors, e.g. a 1x1 convolution with appropriate strides and border_mode, or, for Dense layers, just a matrix multiplication.
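The recommended projection shortcut can be sketched as follows. This is written in current tf.keras syntax rather than the Keras 1.x API used in this thread, and the shapes and names are illustrative, not from the thread:

```python
import tensorflow as tf
from tensorflow.keras import layers

def projection_shortcut(x, n_filters, strides=2):
    # A 1x1 convolution learns a linear map to the new channel count,
    # striding to match the spatial downsampling of the main path.
    return layers.Conv2D(n_filters, kernel_size=1, strides=strides,
                         padding='same')(x)

inp = layers.Input(shape=(32, 32, 16))
main = layers.Conv2D(32, 3, strides=2, padding='same')(inp)  # main path
short = projection_shortcut(inp, 32)  # now matches the main path's shape
out = layers.add([main, short])
model = tf.keras.Model(inp, out)
```

The 1x1 kernel costs n_in * n_out weights (plus biases) per mismatched shortcut, which is exactly the extra parameter count objected to later in the thread.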
Hi fchollet,
I am actually trying to implement the smallest possible network based on the SqueezeNet paper.
The problem is that these linear transformations add a lot of new parameters to my network, which I cannot afford. The number of parameters goes from 800,000 to 1,100,000, which is too big an increase for the accuracy improvement (which I already tested).
My two options when dimensions mismatch are either:
- No shortcut connections (which already works well)
- Zero padded shortcuts with no added parameters
I am pretty sure that zero-padded shortcuts are better than nothing, which is why I was asking.
Quoting the ResNet paper:
(A) zero-padding shortcuts are used for increasing dimensions, and all shortcuts are parameter-free (the same as Table 2 and Fig. 4 right); (B) projection shortcuts are used for increasing dimensions, and other shortcuts are identity; and (C) all shortcuts are projections.
Table 3 shows that all three options are considerably better than the plain counterpart. B is slightly better than A. We argue that this is because the zero-padded dimensions in A indeed have no residual learning.
They actually showed in Table 3 that zero-padded shortcuts are better than nothing:
model        top-1 err.   top-5 err.
plain-34     28.54        10.02
ResNet-34 A  25.03        7.76
ResNet-34 B  24.52        7.46
ResNet-34 C  24.19        7.40
You can use a Lambda layer to wrap your zero_pad function. It will do what you want.
I'm trying to do the same thing. This code runs on a recent nightly build of TensorFlow, but not in Theano.
from keras.layers.convolutional import MaxPooling2D
from keras.layers.core import Lambda
from keras import backend as K

def zeropad(x):
    # Concatenate an all-zeros copy of x along the channel axis,
    # doubling the number of channels.
    y = K.zeros_like(x)
    return K.concatenate([x, y], axis=1)

def zeropad_output_shape(input_shape):
    shape = list(input_shape)
    assert len(shape) == 4
    shape[1] *= 2
    return tuple(shape)

def shortcut(input_layer, nb_filters, output_shape, zeros_upsample=True):
    # TODO: Figure out why zeros_upsample doesn't work in Theano
    if zeros_upsample:
        # Halve the spatial dimensions without pooling over neighbours...
        x = MaxPooling2D(pool_size=(1, 1),
                         strides=(2, 2),
                         border_mode='same')(input_layer)
        # ...then double the channels with zeros.
        x = Lambda(zeropad, output_shape=zeropad_output_shape)(x)
    else:
        pass  # Options B, C in ResNet paper...
    return x
@roryhr I have read your code below, and I am wondering: is this right? First, the zeropad function constructs a zero tensor with the same shape as the input x, and then concatenates the two to get a new tensor with the channel count doubled (which may be more than needed). I am wondering what will happen in the Lambda layer. Will it cut off the redundant zeros?
def zeropad(x):
    y = K.zeros_like(x)
    return K.concatenate([x, y], axis=1)

def zeropad_output_shape(input_shape):
    shape = list(input_shape)
    assert len(shape) == 4
    shape[1] *= 2
    return tuple(shape)
@yuimo Yes, it doubles the size of dimension 1 (the channels, I believe). Lambda just applies the function I defined, so it does not cut off any zeros. I haven't tried running it lately, though; I think I used TF and moved on.
If anyone wants to have zero-padding on depth and knows how many channels they want to get to, you can try:
def pad_depth(x, desired_channels):
    # Note: indexes axis 2, so this assumes 3D tensors without a batch axis
    y = K.zeros_like(x, name='pad_depth1')
    new_channels = desired_channels - x.shape.as_list()[-1]
    y = y[:, :, :new_channels]
    return concatenate([x, y], name='pad_depth2')
If you have a layer that you're trying to add with, find out the number of channels in that layer with:
desired_channels = some_layer.shape.as_list()[-1]
Then you can call the Lambda function with:
earlier_layer_with_padding = Lambda(pad_depth, name='some_name', arguments={'desired_channels': desired_channels})(some_earlier_layer)
And finally, add earlier_layer_with_padding to some_layer via:
new_layer = add([earlier_layer_with_padding, some_layer], name='some_add')
Make sure you do from keras import backend as K. Only tested with the TensorFlow backend.
You can also trick Keras's ZeroPadding2D layer into padding your channels by passing the "wrong" argument in for data_format. It will think it is padding height or width when it is actually padding channels.
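A small sketch of that trick, under the assumption of a channels_last tensor: declaring channels_first makes ZeroPadding2D pad the last two axes, so the second padding pair lands on the channel axis.

```python
import tensorflow as tf
from tensorflow.keras import layers

# The tensor is really channels_last (N, H, W, C), but with the "wrong"
# data_format the layer pads axes 2 and 3 instead of the spatial dims.
x = tf.zeros((1, 8, 8, 16))   # 16 channels
extra = 16                    # grow to 32 channels
pad = layers.ZeroPadding2D(padding=((0, 0), (0, extra)),
                           data_format='channels_first')
y = pad(x)                    # shape (1, 8, 8, 32)
```

The first padding pair (0, 0) leaves the width axis alone; only the channel axis grows.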
Another possibility is to use a 1x1 Conv2D with fixed weights, as follows:
import numpy as np
from keras import initializers
from keras.layers import Conv2D

# The identity matrix copies each input channel through; any extra
# output channels stay zero because np.eye pads rectangular matrices
# with zeros.
identity_weights = np.eye(n_channels_in, n_channels_out, dtype=np.float32)
layer = Conv2D(
    n_channels_out, kernel_size=1, strides=strides, use_bias=False,
    kernel_initializer=initializers.Constant(value=identity_weights))
# Not learned!
layer.trainable = False
x = layer(x)
If you are still looking for this, I implemented it in my GitHub repository. Please take a look at https://github.com/nellopai/deepLearningModels and, in case you have questions, feel free to write to me.
Some minor change if you want to work with batches (which I think most people do):
def pad_depth(x, desired_channels):
    y = K.zeros_like(x)
    new_channels = desired_channels - x.shape.as_list()[-1]
    y = y[..., :new_channels]
    return concatenate([x, y], axis=-1)
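Putting the batch-aware version together with Lambda and add, here is an end-to-end sketch in current tf.keras syntax; the shapes and layer names are illustrative, and tf ops replace the thread's K.* calls:

```python
import tensorflow as tf
from tensorflow.keras import layers

def pad_depth(x, desired_channels):
    # Append zero channels until x has desired_channels channels.
    # tf.zeros_like / tf.concat stand in for the thread's K.* calls.
    new_channels = desired_channels - x.shape[-1]
    y = tf.zeros_like(x)[..., :new_channels]
    return tf.concat([x, y], axis=-1)

inp = layers.Input(shape=(8, 8, 16))
main = layers.Conv2D(32, 3, padding='same')(inp)   # 16 -> 32 channels
short = layers.Lambda(pad_depth,
                      arguments={'desired_channels': 32})(inp)
out = layers.add([main, short])                    # shapes now match
model = tf.keras.Model(inp, out)
```

The shortcut itself contributes no trainable parameters; all of the model's weights come from the 3x3 convolution.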