Addons: Unpooling layer in tensorflow

Created on 29 Apr 2016 · 127 Comments · Source: tensorflow/addons

It would be nice to also have the unpooling layer in TensorFlow, as it is described in the paper on deconvolution networks: http://cvlab.postech.ac.kr/research/deconvnet/

I was googling a bit and found that an unpooling layer would be handy for others as well:
http://stackoverflow.com/questions/36548736/tensorflow-unpooling

Labels: Feature Request, help wanted, layers


For deconv, you can use "conv2d_backprop_input" with a stride to achieve a similar effect. It is the gradient of the strided conv.
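A minimal NumPy sketch of that point, assuming a single channel, 'VALID' padding, and hypothetical helper names (conv2d_strided, conv2d_transpose_strided). The transposed convolution scatters each output value back through the kernel; the adjoint identity at the end shows it is the transpose, and therefore the input gradient, of the strided convolution. It illustrates the math only, not TensorFlow's kernels.

```python
import numpy as np

def conv2d_strided(x, k, stride):
    """Single-channel 'VALID' cross-correlation with a stride (what tf.nn.conv2d computes)."""
    kh, kw = k.shape
    oh = (x.shape[0] - kh) // stride + 1
    ow = (x.shape[1] - kw) // stride + 1
    y = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = x[i * stride:i * stride + kh, j * stride:j * stride + kw]
            y[i, j] = np.sum(patch * k)
    return y

def conv2d_transpose_strided(y, k, stride, in_shape):
    """Adjoint of conv2d_strided: scatter-add each output value times the kernel."""
    kh, kw = k.shape
    x = np.zeros(in_shape)
    for i in range(y.shape[0]):
        for j in range(y.shape[1]):
            x[i * stride:i * stride + kh, j * stride:j * stride + kw] += y[i, j] * k
    return x

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 6))
k = rng.standard_normal((2, 2))
y = rng.standard_normal((3, 3))  # same shape as conv2d_strided(x, k, 2)

# Adjoint identity <conv(x), y> == <x, conv_T(y)>: the transpose is the input gradient.
lhs = np.sum(conv2d_strided(x, k, 2) * y)
rhs = np.sum(x * conv2d_transpose_strided(y, k, 2, x.shape))
print(np.isclose(lhs, rhs))
```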

My implementation, using tf.reshape and tf.concat:

import math

import numpy as np
import tensorflow as tf


def unpool(value, name='unpool'):
    """N-dimensional version of the unpooling operation from
    https://www.robots.ox.ac.uk/~vgg/rg/papers/Dosovitskiy_Learning_to_Generate_2015_CVPR_paper.pdf

    :param value: A Tensor of shape [b, d0, d1, ..., dn, ch]
    :return: A Tensor of shape [b, 2*d0, 2*d1, ..., 2*dn, ch]
    """
    with tf.name_scope(name) as scope:
        sh = value.get_shape().as_list()
        dim = len(sh[1:-1])
        out = (tf.reshape(value, [-1] + sh[-dim:]))
        for i in range(dim, 0, -1):
            out = tf.concat([out, tf.zeros_like(out)], i)
        out_size = [-1] + [s * 2 for s in sh[1:-1]] + [sh[-1]]
        out = tf.reshape(out, out_size, name=scope)
    return out


def pool(value, name='pool'):
    """Downsampling operation.
    :param value: A Tensor of shape [b, d0, d1, ..., dn, ch]
    :return: A Tensor of shape [b, d0/2, d1/2, ..., dn/2, ch]
    """
    with tf.name_scope(name) as scope:
        sh = value.get_shape().as_list()
        out = value
        for sh_i in sh[1:-1]:
            assert sh_i % 2 == 0
        for i in range(len(sh[1:-1])):
            out = tf.reshape(out, (-1, 2, np.prod(sh[i + 2:])))
            out = out[:, 0, :]
        out_size = [-1] + [math.ceil(s / 2) for s in sh[1:-1]] + [sh[-1]]
        out = tf.reshape(out, out_size, name=scope)
    return out

I've been interested in this as well; I'm currently working on 'what-where' / convolutional autoencoders (à la Zhao et al.).

Thanks @daeyun for the code, I've been trying to figure this out myself. Dosovitskiy uses a Kronecker product with a block mask (same shape as the pooling window, all zeros with a 1 in the upper left) to unpool. However, as observed in the paper (fig. 9), this fails to reconstruct meaningful structure in deeper feature maps. An alternative proposed by Zeiler uses 'switches' (essentially the argmax of the max pooling operation) to reconstruct using the exact location of the maxima.
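The Kronecker-product unpooling can be sketched in NumPy for a single 2-D feature map (kron_unpool is a hypothetical helper name). Each value lands in the upper-left cell of its block, which, as far as we can tell, is the same placement @daeyun's unpool above produces.

```python
import numpy as np

def kron_unpool(fmap, factor=2):
    """Fixed-position unpooling: each value goes to the upper-left cell of its block."""
    mask = np.zeros((factor, factor), dtype=fmap.dtype)
    mask[0, 0] = 1  # block mask: all zeros with a 1 in the upper-left corner
    return np.kron(fmap, mask)

fmap = np.array([[1.0, 2.0],
                 [3.0, 4.0]])
print(kron_unpool(fmap))
# [[1. 0. 2. 0.]
#  [0. 0. 0. 0.]
#  [3. 0. 4. 0.]
#  [0. 0. 0. 0.]]
```

Because the position is fixed regardless of where the maximum actually was, this variant discards the 'where' information that switch-based unpooling keeps.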

I've been playing around with tf.nn.max_pool_with_argmax in an attempt to reproduce the 'switched' unpooling experiments first explored by Zeiler and extended by Zhao.

Any thoughts on how this could be implemented?

What's the mathematical definition of unpooling?

The unpooling that I had in mind is described here: http://www.matthewzeiler.com/pubs/iccv2011/iccv2011.pdf
and the corresponding implementation in Caffe can be found here: https://github.com/HyeonwooNoh/caffe/blob/master/src/caffe/layers/unpooling_layer.cpp
A more formal description is also available in the Torch documentation:
https://github.com/torch/nn/blob/master/doc/convolution.md#spatialmaxunpooling

@ziky90 That's the gradient of max pooling, which we already have as an op.

@girving Thank you for pointing me to the gradient of max pooling. However, it's really difficult to discover unpooling as the gradient of max pooling, and it's also not well documented.
Is there a plan to create a separate "layer", for example tf.nn.max_unpool? From my point of view it'd be much more intuitive, and together with documentation it would make it super easy to use.

Btw, it seems that this confuses people and leads them to build custom solutions instead of simply using something like tf.nn.max_unpool, e.g. @ppwwyyxx:
https://github.com/ppwwyyxx/tensorpack/blob/master/tensorpack/models/pool.py#L66

Yes, giving it a name like tf.nn.max_unpool with good documentation might be good, and we'd be happy to accept PRs.

As a tip for the future, though: this is one advantage of trying to understand the mathematical relationship between different operations. Once you know that unpooling is just the gradient of pooling, it's clear that TensorFlow already implements it, even if the name is different from what one might expect.

Could you share a code example of how to implement unpooling using the gradient of max pooling?

It's currently hidden as gen_nn_ops._max_pool_grad, and is used only from the gradient of max_pool:

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/nn_grad.py#L353

There's also gen_nn_ops._max_pool_with_argmax_grad. Unfortunately, both of them take the original input, which means they'd have to be tweaked to serve as unpooling.
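To make "unpooling is the gradient of max pooling" concrete, here is a single-channel NumPy sketch with 2x2 windows, stride 2, and even input sizes (all helper names are hypothetical). Feeding the pooled values back in as the upstream gradient puts each maximum at its argmax position and zeros everywhere else, which is max unpooling. Note that max_pool_grad needs only the input's shape, not the input itself.

```python
import numpy as np

def max_pool_2x2_with_argmax(x):
    """2x2 / stride-2 max pooling of a 2-D map; returns pooled values and flat argmax."""
    h, w = x.shape
    pooled = np.zeros((h // 2, w // 2), dtype=x.dtype)
    argmax = np.zeros((h // 2, w // 2), dtype=np.int64)
    for i in range(h // 2):
        for j in range(w // 2):
            block = x[2 * i:2 * i + 2, 2 * j:2 * j + 2]
            di, dj = np.unravel_index(np.argmax(block), block.shape)
            pooled[i, j] = block[di, dj]
            argmax[i, j] = (2 * i + di) * w + (2 * j + dj)  # flat index into x
    return pooled, argmax

def max_pool_grad(x_shape, upstream, argmax):
    """Gradient of max pooling w.r.t. its input: route each upstream value to its argmax."""
    grad = np.zeros(int(np.prod(x_shape)), dtype=upstream.dtype)
    np.add.at(grad, argmax.ravel(), upstream.ravel())
    return grad.reshape(x_shape)

x = np.array([[1., 5., 2., 0.],
              [3., 4., 8., 1.],
              [0., 2., 1., 1.],
              [9., 1., 3., 7.]])
pooled, argmax = max_pool_2x2_with_argmax(x)
# Using the pooled values as the "upstream gradient" yields max unpooling:
unpooled = max_pool_grad(x.shape, pooled, argmax)
print(unpooled)
# [[0. 5. 0. 0.]
#  [0. 0. 8. 0.]
#  [0. 0. 0. 0.]
#  [9. 0. 0. 7.]]
```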

Any plans to add an unpool layer to TensorFlow? @girving, as you point out, if the gradient operation already exists, then it doesn't seem like much work to get it working?

@LeavesBreathe I was initially wrong about how easy it would be, since the gradient operators as written take the original input. Thus, we probably do need a new exposed op, though it may be able to use the same underlying compute kernels (I'm not sure).

Is there any performance gain or loss if one uses the second output of tf.nn.max_pool_with_argmax (the indices of the max pool) along with tf.map_fn to achieve max unpooling?

@syed-ahmed That doesn't work: if you are doing unpooling, you don't start out with an input that you could pass to tf.nn.max_pool_with_argmax.

@girving Can we not just save the indices from tf.nn.max_pool_with_argmax during downsampling for reuse during upsampling? We would use the saved argmax indices to inform us where we want the input to the corresponding upsample layer to go.

@syed-ahmed To clarify, it will work but it's a bit awkward. You can certainly store the indices, but the current MaxPoolGradWithArgmax op also wants the _values_ that you originally passed to max pooling. It should use only the shape from these values, but you still need to pass them in. That's not too horrible when it's used as a gradient (though it's still a memory usage bug), but it is not clean enough to give it a nice name.

The same bug occurred in the initial version of conv_3d, so if someone wants to fix this they can look at https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/conv_grad_ops_3d.cc. The code defines a new op that takes the original input's shape rather than the whole original input, and uses the same C++ kernel to implement both of them (with a conditional based on name).

If anyone does this, the new op can be given a nicer name like max_unpool.

@girving Thanks for clarifying! I totally forgot the case about the gradient. I'll try to fix this issue.

Hi @girving, could you please tell me what error the memory usage bug would cause? Just to clarify: is it a bug because it's not best practice, or did you encounter an error in that initial version of conv_3d? I get the following error for the implementation described above with MaxPoolWithArgmax and was wondering whether anybody has encountered it before:

E tensorflow/stream_executor/cuda/cuda_driver.cc:1110] failed to synchronize the stop event: CUDA_ERROR_ILLEGAL_ADDRESS
E tensorflow/stream_executor/cuda/cuda_timer.cc:54] Internal: error destroying CUDA event in context 0x69951c0: CUDA_ERROR_ILLEGAL_ADDRESS
E tensorflow/stream_executor/cuda/cuda_timer.cc:59] Internal: error destroying CUDA event in context 0x69951c0: CUDA_ERROR_ILLEGAL_ADDRESS
F tensorflow/stream_executor/cuda/cuda_timer.cc:64] Check failed: start_event_ != nullptr && stop_event_ != nullptr

@syed-ahmed It's not an actual error unless you run out of memory. The issue is that if the gradient takes the original input tensor rather than the shape, the original input must be stored for the remainder of the forward pass and the backward pass up to that point. If only the shape is needed, that's a long time to hold onto otherwise unneeded memory.

@girving Thanks for your reply. I am defining a MaxUnpoolGrad for the corresponding MaxUnpool operation that I have implemented. Following is what I declare as top_offset and bottom_offset for MaxUnpoolGrad:

const int top_offset = params.tensor_in_rows * params.tensor_in_cols * params.depth; 
const int bottom_offset = params.out_height * params.out_width * params.depth;

The corresponding CUDA kernel declared in maxpooling_op_gpu.cu.cc is:

template <typename dtype>
__global__ void MaxUnpoolBackward(const int nthreads, const dtype* top_diff,
                                  const int64* mask, const int top_offset,
                                  const int bottom_offset, dtype* bottom_diff) {
  CUDA_1D_KERNEL_LOOP(index, nthreads) {
    int image_id = (index / bottom_offset);
    CudaAtomicAdd(bottom_diff + index, top_diff[mask[index] + image_id * top_offset]);
  }
}

My graph builds but it is when the session runs that I get the following error:

E tensorflow/stream_executor/cuda/cuda_driver.cc:1110] failed to synchronize the stop event: CUDA_ERROR_ILLEGAL_ADDRESS
E tensorflow/stream_executor/cuda/cuda_timer.cc:54] Internal: error destroying CUDA event in context 0x69951c0: CUDA_ERROR_ILLEGAL_ADDRESS
E tensorflow/stream_executor/cuda/cuda_timer.cc:59] Internal: error destroying CUDA event in context 0x69951c0: CUDA_ERROR_ILLEGAL_ADDRESS
F tensorflow/stream_executor/cuda/cuda_timer.cc:64] Check failed: start_event_ != nullptr && stop_event_ != nullptr 

I also return the following from the gradient function in nn_grad.py:

[None, gen_nn_ops._max_unpool_grad(array_ops.shape(op.inputs[1]),
                                   grad,
                                   op.inputs[2],
                                   op.get_attr("ksize"),
                                   op.get_attr("strides"),
                                   padding=op.get_attr("padding")), None]

where:

MaxUnpool takes:
- input0: input_shape
- input1: grad_in
- input2: argmax

I have made sure that the max unpooling op and its gradient op take an input shape rather than a 4D input tensor. Do you know how to debug these CUDA errors, or any tool that can help find where they originate? What do these errors indicate? I read a comment in maxpooling_op_gpu.cu.cc about race conditions. Is it related to this?

@syed-ahmed Is it possible to use cuDNN for these operations? Writing them yourself will result in very slow code. The same goes for CPU: it would be better to use existing Eigen code if possible.

@girving Thank you for your reply. I will try implementing the cuDNN version once I get this CUDA one running. I was able to use cuda-gdb to get a trace of where my error originates. Here's the output from cuda-gdb:

CUDA Exception: Warp Out-of-range Address
The exception was triggered at PC 0x7ffe9976c1d0

Program received signal CUDA_EXCEPTION_5, Warp Out-of-range Address.
[Switching focus to CUDA kernel 0, grid 4660, block (172,0,0), thread (256,0,0), device 0, sm 0, warp 40, lane 0]
0x00007ffe9976c218 in void tensorflow::(anonymous namespace)::MaxUnpoolForward<float>(int, float const*, long long const*, int, int, float*)<<<(662,1,1),(1024,1,1)>>> ()

Here's how it is defined in the cu.cc file:

...
template <typename dtype>
__global__ void MaxUnpoolForward(const int nthreads, const dtype* top_diff,
                                const int64* mask, const int top_offset,
                                const int bottom_offset, dtype* bottom_diff) {
  CUDA_1D_KERNEL_LOOP(index, nthreads) {
    int image_id = (index / top_offset);
    CudaAtomicAdd(bottom_diff + image_id * bottom_offset + mask[index],
                  top_diff[index]);
  }
}

template <typename dtype>
__global__ void MaxUnpoolBackward(const int nthreads, const dtype* top_diff,
                                  const int64* mask, const int top_offset,
                                  const int bottom_offset, dtype* bottom_diff) {
  CUDA_1D_KERNEL_LOOP(index, nthreads) {
    int image_id = (index / bottom_offset);
    CudaAtomicAdd(bottom_diff, top_diff[mask[index] + image_id * top_offset]);
  }
}

#undef CUDA_1D_KERNEL_LOOP
...

I am kind of lost since I'm a beginner with CUDA. Does anybody have any idea what might be going wrong?

It's impossible to debug this without seeing your code. As a wild guess: maybe you are running GPU kernels on Tensor objects stored on the CPU?

Hi @girving. Sorry for not posting the full code. I didn't want to lengthen this issue by posting all the code. You can review the changes in this link.

I am calling the max unpool like this:

 return gen_nn_ops._max_unpool(array_ops.shape(origin_input_tensor), grad,
                                     argmax_tensor,
                                     ksize=[1, 2, 2, 1], strides=[1,1,1,1],
                                     padding="VALID", name=name)

I am not sure whether the origin_input_tensor and argmax_tensor objects are on the CPU or the GPU. For the exception raised in MaxUnpoolForward, the cuda-gdb documentation on GPU error reporting says: "This occurs when any thread within a warp accesses an address that is outside the valid range of local or shared memory regions."

Also, there is a lot of code duplication in my changes. I can make the unpool op use the same compute kernel; I was just testing whether sharing the compute kernel was causing the CUDA error in the version I posted here.

In the TensorFlow implementation (https://github.com/MarvinTeichmann/tensorflow-fcn/blob/master/fcn32_vgg.py) of the fully convolutional model (https://people.eecs.berkeley.edu/~jonlong/long_shelhamer_fcn.pdf), the author defines the following function:

def _upscore_layer(self, bottom, shape,
                   num_classes, name, debug,
                   ksize=4, stride=2):
    strides = [1, stride, stride, 1]
    with tf.variable_scope(name):
        in_features = bottom.get_shape()[3].value

        if shape is None:
            # Compute shape out of Bottom
            in_shape = tf.shape(bottom)

            h = ((in_shape[1] - 1) * stride) + 1
            w = ((in_shape[2] - 1) * stride) + 1
            new_shape = [in_shape[0], h, w, num_classes]
        else:
            new_shape = [shape[0], shape[1], shape[2], num_classes]
        output_shape = tf.pack(new_shape)

        logging.debug("Layer: %s, Fan-in: %d" % (name, in_features))
        f_shape = [ksize, ksize, num_classes, in_features]

        # create
        num_input = ksize * ksize * in_features / stride
        stddev = (2 / num_input)**0.5

        weights = self.get_deconv_filter(f_shape)
        deconv = tf.nn.conv2d_transpose(bottom, weights, output_shape,
                                        strides=strides, padding='SAME')

        if debug:
            deconv = tf.Print(deconv, [tf.shape(deconv)],
                              message='Shape of %s' % name,
                              summarize=4, first_n=1)

    _activation_summary(deconv)
    return deconv

It looks like the author just uses tf.nn.conv2d_transpose to do the upsampling. Is my understanding correct?

@wenouyang Yes, the TensorFlow FCN implementation of https://people.eecs.berkeley.edu/~jonlong/long_shelhamer_fcn.pdf uses only tf.nn.conv2d_transpose() to perform the upsampling, but there are also other models, mainly for semantic segmentation, that use max unpooling as well, for example http://arxiv.org/abs/1505.04366.

Sorry for the delay, taking a look at your code now.

I must not understand your code. How are you doing an effectively 3D unpooling operation (batch, height, width) with a 1D loop that does only one integer division? One integer division is only powerful enough to express a 2D loop.

@girving I followed the MaxPoolBackward code in the maxpooling_op_gpu.cu.cc. I thought n-dimensions of the tensor is taken care of by the following in maxpooling_op.cc in the LaunchMaxUnpooling function I defined (like LaunchMaxPoolingGradWithArgmax):

const int input_size = params.tensor_in_batch * params.tensor_in_rows *
                           params.tensor_in_cols * params.depth;
const int output_size = params.tensor_in_batch * params.out_height *
                            params.out_width * params.depth;
const int top_offset = params.out_height * params.out_width * params.depth;
const int bottom_offset = params.tensor_in_rows * params.tensor_in_cols * params.depth;

@syed-ahmed Ah, got it: the indices are already flattened, so it only needs to be 2D. Unfortunately I don't know why your code is failing; I would try to replicate the behavior with the existing routine and then add print statements until you know what differs.
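Why one integer division is enough: if the argmax indices are flattened within each image, a flat loop only has to recover which image an element belongs to. Below is a NumPy emulation of such a CUDA_1D_KERNEL_LOOP (function and variable names are ours, not TensorFlow's), assuming per-image indices that do not include a batch offset. It also adds an explicit range check on the mask; an unchecked out-of-range index is one plausible, unconfirmed source of the "Warp Out-of-range Address" error reported earlier.

```python
import numpy as np

def max_unpool_flat(values, mask, out_per_image):
    """Emulates a 1-D kernel loop over all pooled values.

    values:        pooled activations, shape [batch, ...]
    mask:          per-image flattened argmax indices, same shape as values
    out_per_image: number of elements in one unpooled image (H_out * W_out * C)
    """
    batch = values.shape[0]
    in_per_image = values.size // batch
    flat_vals = values.ravel()
    flat_mask = mask.ravel()
    # An index outside [0, out_per_image) would write into another image's slice
    # or past the end of the buffer, so reject it explicitly.
    if flat_mask.min() < 0 or flat_mask.max() >= out_per_image:
        raise IndexError("mask index out of range")
    out = np.zeros(batch * out_per_image, dtype=values.dtype)
    for index in range(flat_vals.size):       # one "thread" per pooled element
        image_id = index // in_per_image      # the single integer division
        out[image_id * out_per_image + flat_mask[index]] += flat_vals[index]
    return out

values = np.array([5.0, 7.0]).reshape(2, 1, 1, 1)
mask = np.array([3, 0]).reshape(2, 1, 1, 1)
print(max_unpool_flat(values, mask, 4).reshape(2, 2, 2, 1)[..., 0])
```

The `+=` mirrors the CudaAtomicAdd in the kernel; with non-overlapping windows each output cell receives at most one value, so a plain store would also work.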

@girving Thank you for your reply.

@ziky90, thank you for your response. Somewhat related to my current question, I am confused about the kernel size specification for an upsampling layer implemented with tf.nn.conv2d_transpose(): http://stats.stackexchange.com/questions/226047/kernel-size-and-stride-value-for-fully-convolutional-network-for-semantic-segmen

I noticed that you were involved in the discussion about FCN on Stack Overflow. If you don't mind, could you share some thoughts on my question? Thank you very much.

+1

I am also trying to implement the DeconvNet described in Learning Deconvolution Network for Semantic Segmentation, and I'm very interested in a native method like tf.max_unpool_with_argmax too, but for now I want to share my Python TF implementation (example):

def unravel_argmax(argmax, shape):
    output_list = []
    output_list.append(argmax // (shape[2] * shape[3]))
    output_list.append(argmax % (shape[2] * shape[3]) // shape[3])
    return tf.pack(output_list)

def unpool_layer2x2(x, argmax):
    x_shape = tf.shape(x)
    output = tf.zeros([x_shape[1] * 2, x_shape[2] * 2, x_shape[3]])

    height = tf.shape(output)[0]
    width = tf.shape(output)[1]
    channels = tf.shape(output)[2]
    # build the indices for a SparseTensor addition like http://stackoverflow.com/a/34686952/3524844
    t1 = tf.to_int64(tf.range(channels))
    t1 = tf.tile(t1, [(width // 2) * (height // 2)])
    t1 = tf.reshape(t1, [-1, channels])
    t1 = tf.transpose(t1, perm=[1, 0])
    t1 = tf.reshape(t1, [channels, height // 2, width // 2, 1])

    t2 = tf.squeeze(argmax)
    t2 = tf.pack((t2[0], t2[1]), axis=0)
    t2 = tf.transpose(t2, perm=[3, 1, 2, 0])

    t = tf.concat(3, [t2, t1])
    indices = tf.reshape(t, [(height // 2) * (width // 2) * channels, 3])
    # Get the values for max_unpooling (used in addition with argmax location)
    x1 = tf.squeeze(x)
    x1 = tf.reshape(x1, [-1, channels])
    x1 = tf.transpose(x1, perm=[1, 0])
    values = tf.reshape(x1, [-1])
    # perform addition
    delta = tf.SparseTensor(indices, values, tf.to_int64(tf.shape(output)))
    return tf.expand_dims(tf.sparse_tensor_to_dense(tf.sparse_reorder(delta)), 0)

This is an unpooling based on the unraveled argmax of tf.nn.max_pool_with_argmax, for everybody searching for a similar method. Replacing all loops with tensor transformations was a little tricky, and there may be a better (more readable) way; I first tried nested tf.while_loop calls, but that was very slow. My implementation assumes batch_size == 1, but it could easily be rewritten for other use cases.
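unravel_argmax recovers (y, x) from a per-image flat index of the form (y * width + x) * channels + c using integer division. A quick NumPy check against np.unravel_index, under that index-layout assumption (unravel_argmax_np is a hypothetical mirror of the TF helper):

```python
import numpy as np

def unravel_argmax_np(argmax, shape):
    """Mirror of unravel_argmax; shape is the unpooled [batch, height, width, channels]."""
    _, _, width, channels = shape
    y = argmax // (width * channels)
    x = argmax % (width * channels) // channels
    return np.stack([y, x])

shape = (1, 4, 6, 3)          # batch, height, width, channels of the unpooled output
flat = np.arange(4 * 6 * 3)   # every possible per-image index
yx = unravel_argmax_np(flat, shape)
y_ref, x_ref, c_ref = np.unravel_index(flat, shape[1:])
print(np.array_equal(yx[0], y_ref), np.array_equal(yx[1], x_ref))  # True True
```

The channel is not returned because max pooling never moves a value across channels, so c is simply the channel of the pooled value, which is what the t1 range tensor supplies.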

@fabianbormann Great solution. I am implementing the deconv net and am also stuck on this. Since I am new to TensorFlow, could you give me some hints on converting your code to something that works with any batch size?
Thanks,

@hermitman You would need to expand the indices so that they can address a new 4D output (output = tf.zeros([x_shape[0], x_shape[1] * 2, x_shape[2] * 2, x_shape[3]])). indices is currently a tensor with coordinates [h, w, c], and values is a list of values matching these coordinates.

You need to change the transformations, so that indices also respects [b, h, w, c] and add all corresponding batch values to the values list. I opened an issue #3 in my project and I will fix it soon. If you implement the deconv net too, it would be great if you fork my project (or I could give you write access) so that we could share some knowledge during the implementation! (The same applies for everyone else)
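Extending the indices to [b, h, w, c] can be sketched in NumPy (unpool_batch is a hypothetical name): batch and channel coordinates come from the position of each pooled value, height and width come from the argmax, and a scatter fills the dense output. This assumes per-image argmax indices and non-overlapping pooling windows, so no two values target the same cell.

```python
import numpy as np

def unpool_batch(values, argmax, factor=2):
    """values, argmax: [b, h, w, c]; argmax holds per-image flat indices
    (y * out_w + x) * c + ch into the unpooled [factor*h, factor*w, c] image."""
    b, h, w, c = values.shape
    out_h, out_w = h * factor, w * factor
    # Batch and channel coordinates come from the position of each pooled value.
    bb, _, _, cc = np.indices(values.shape)
    # Height and width coordinates come from the argmax.
    yy = argmax // (out_w * c)
    xx = argmax % (out_w * c) // c
    out = np.zeros((b, out_h, out_w, c), dtype=values.dtype)
    out[bb, yy, xx, cc] = values  # each [b, y, x, c] coordinate receives one value
    return out

values = np.array([1.0, 2.0, 3.0, 4.0]).reshape(2, 1, 1, 2)
argmax = np.array([6, 1, 2, 5]).reshape(2, 1, 1, 2)
print(unpool_batch(values, argmax).shape)  # (2, 2, 2, 2)
```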

@fabianbormann Thanks for the detailed explanation. For me, I am using the deconv net as a part of the reconstruction network in my project. I will try to implement your solution first, and if I figure it out, I will clean up this part and share it.

@fabianbormann Hi, I am reading the code and referring to the previous discussion in this thread. My question is: how does the current implementation backprop the gradient through the unpooling layer? Is it taken care of by the tf.[ops]?

@fabianbormann @girving Can your unpooling operation backpropagate gradients? I managed to get one version working on the forward pass, but TensorFlow could not backpropagate the gradients. My code currently looks like this:

def unpool_layer2x2(inputs, argmax, name):

    with tf.variable_scope(name) as scope:

        x_shape = tf.shape(inputs)
        batches = x_shape[0]
        height = x_shape[1]
        width = x_shape[2]
        channels = x_shape[3]

        height_ori = height * 2
        width_ori = width*2

        argmax_offset = tf.range(batches)
        argmax_offset = tf.reshape(argmax_offset, [-1, 1, 1, 1])
        with tf.device('/cpu:0'):
          argmax_offset = tf.tile(argmax_offset, [1, height, width, channels]) * height_ori * width_ori * channels
        argmax = argmax + tf.to_int64(argmax_offset)

        list_x = tf.reshape(inputs, [batches*height*width*channels, 1])
        list_argmax = tf.reshape(argmax, [batches*height*width*channels, 1])
        list_indices_batches = list_argmax//tf.to_int64(height_ori*width_ori*channels)
        with tf.device('/cpu:0'):
            list_indices_height = list_argmax%tf.to_int64(height_ori*width_ori*channels) // tf.to_int64(width_ori*channels)
            list_indices_width = list_argmax%tf.to_int64(width_ori*channels) // tf.to_int64(channels)
            list_indices_channels = list_argmax%tf.to_int64(channels)
            list_indices = tf.concat(1, [list_indices_batches, list_indices_height, list_indices_width, list_indices_channels])
        output = tf.SparseTensor(list_indices, tf.squeeze(list_x), tf.to_int64([batches, height_ori, width_ori, channels]))
        with tf.device('/cpu:0'):
          return tf.sparse_tensor_to_dense(tf.sparse_reorder(output))

I am not familiar with how TF determines whether an op is differentiable, so I do not know what in my code is affecting the backprop. Could you point me to some related reading?

I don't know if you're feeding in a tf.argmax to your argmax argument, but I'm pretty sure tf.argmax is non-differentiable.

@LeavesBreathe Hi, I am not trying to backprop gradients to the argmax. I am using the argmax to create a unpooling path. The gradients will be directed according to such paths. For example,

If my bottom for this layer is 2x4x4x1 (batch x height x width x channels), then the desired output is 2x8x8x1, where each 2x2 neighborhood contains only one active pixel. The exact location of that pixel is determined by argmax, which comes from the max_pool_with_argmax() op.

During backprop, the incoming (top) gradient map is 2x8x8x1, and each gradient should be sent directly to the corresponding location in the 2x4x4x1 bottom. There is no computation in the process, merely routing the gradients to the correct location. I think it is the reverse of max pooling.

Could you help me find out a way to implement the aforementioned op?
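One way to see the backward pass described above: max unpooling copies each input value to exactly one output location, so its gradient with respect to the input is a gather of the upstream gradient at those locations. A NumPy sketch, assuming per-image flat argmax indices in NHWC layout; max_unpool and max_unpool_grad are hypothetical names, and the adjoint check confirms the two are consistent.

```python
import numpy as np

def max_unpool(values, argmax, out_hw):
    """Forward: scatter each value to its per-image flat index."""
    b, h, w, c = values.shape
    out = np.zeros((b, out_hw[0] * out_hw[1] * c), dtype=values.dtype)
    rows = np.arange(b).reshape(b, 1, 1, 1)  # broadcast batch index over [h, w, c]
    out[rows, argmax] = values
    return out.reshape(b, out_hw[0], out_hw[1], c)

def max_unpool_grad(grad_out, argmax):
    """Backward: read the upstream gradient back at the same locations (a gather)."""
    b = grad_out.shape[0]
    flat = grad_out.reshape(b, -1)
    rows = np.arange(b).reshape(b, 1, 1, 1)
    return flat[rows, argmax]

rng = np.random.default_rng(0)
b, h, w, c = 2, 2, 2, 3
_, ii, jj, cc = np.indices((b, h, w, c))
dy, dx = rng.integers(0, 2, size=(2, b, h, w, c))  # which cell of each 2x2 window held the max
argmax = ((2 * ii + dy) * (2 * w) + (2 * jj + dx)) * c + cc
values = rng.standard_normal((b, h, w, c))
grad_out = rng.standard_normal((b, 2 * h, 2 * w, c))

# Adjoint check: <unpool(v), g> == <v, unpool_grad(g)>
lhs = np.sum(max_unpool(values, argmax, (2 * h, 2 * w)) * grad_out)
rhs = np.sum(values * max_unpool_grad(grad_out, argmax))
print(np.isclose(lhs, rhs))  # True
```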

@hermitman This is my first post, so apologies if I get it wrong, but I think that this post addresses the issues with the missing gradients https://github.com/tensorflow/tensorflow/issues/1793#issuecomment-234070576

@cjspoerer Hi, thanks for your response. The gradient for max_pool_with_argmax can be retrieved as described in the ticket you mentioned. However, since we are implementing the unpooling operation, which does not have an existing TensorFlow op, we are struggling to get gradients out of this new op. The operation we want is the same as described earlier in this thread:

The unpooling that I had in mind is described here: http://www.matthewzeiler.com/pubs/iccv2011/iccv2011.pdf
and the corresponding implementation in Caffe can be found here: https://github.com/HyeonwooNoh/caffe/blob/master/src/caffe/layers/unpooling_layer.cpp
A more formal description is also available in the Torch documentation:
https://github.com/torch/nn/blob/master/doc/convolution.md#spatialmaxunpooling

Hi, I implemented the batch version (i.e. batch_size >= 1) of @fabianbormann 's unpool layer and it's been working well for me:


  def unravel_argmax(argmax, shape):
    output_list = [argmax // (shape[2]*shape[3]),
                   argmax % (shape[2]*shape[3]) // shape[3]]
    return tf.pack(output_list)

  def unpool_layer2x2_batch(bottom, argmax):
    bottom_shape = tf.shape(bottom)
    top_shape = [bottom_shape[0], bottom_shape[1]*2, bottom_shape[2]*2, bottom_shape[3]]

    batch_size = top_shape[0]
    height = top_shape[1]
    width = top_shape[2]
    channels = top_shape[3]

    argmax_shape = tf.to_int64([batch_size, height, width, channels])
    argmax = unravel_argmax(argmax, argmax_shape)

    t1 = tf.to_int64(tf.range(channels))
    t1 = tf.tile(t1, [batch_size*(width//2)*(height//2)])
    t1 = tf.reshape(t1, [-1, channels])
    t1 = tf.transpose(t1, perm=[1, 0])
    t1 = tf.reshape(t1, [channels, batch_size, height//2, width//2, 1])
    t1 = tf.transpose(t1, perm=[1, 0, 2, 3, 4])

    t2 = tf.to_int64(tf.range(batch_size))
    t2 = tf.tile(t2, [channels*(width//2)*(height//2)])
    t2 = tf.reshape(t2, [-1, batch_size])
    t2 = tf.transpose(t2, perm=[1, 0])
    t2 = tf.reshape(t2, [batch_size, channels, height//2, width//2, 1])

    t3 = tf.transpose(argmax, perm=[1, 4, 2, 3, 0])

    t = tf.concat(4, [t2, t3, t1])
    indices = tf.reshape(t, [(height//2)*(width//2)*channels*batch_size, 4])

    x1 = tf.transpose(bottom, perm=[0, 3, 1, 2])
    values = tf.reshape(x1, [-1])

    delta = tf.SparseTensor(indices, values, tf.to_int64(top_shape))
    return tf.sparse_tensor_to_dense(tf.sparse_reorder(delta))

@EmmaBYPeng Hi, thanks for sharing the awesome solution. Did you try to backprop through this layer, such as train the deconvolution net with this unpooling layer? It seems that this implementation still cannot receive gradients during training.

The forward works fine, in the sense that given the downsampled image and argmax, it can generate the upsampled image with black pixel filler. However, during training, if we place the unpooling layer between conv2d_transpose ops, I found that the gradients cannot propagate through the unpooling layer.

@hermitman Hi, I trained my deconv net with this layer and I believe the gradients are back propagated correctly (@fabianbormann 's original version also worked for me).

@EmmaBYPeng That is awesome! Could you share your deconv network structure? I have been stuck on this problem for a while. If it works for you, then the problem must be somewhere in my deconv network structure. I basically have something like this:

conv2d_transpose(...)
unpool2x2
conv2d_transpose(...)
unpool2x2
...

If I call opt.compute_gradients(), the gradients after the first unpooling layer are all None. If the unpooling is fine, I wonder what caused my problems.

May I take a look at your deconv structure, so I can figure out the difference in implementations?

@EmmaBYPeng It is really great that you have shared the code with us. I just realized that its performance (runtime) is slow. Do you have any improvements in the performance yet?
Thanks.

With regard to performance, as far as I can tell the main bottleneck is the sparse tensor reorder and sparse-to-dense operations, which both seem to run on the CPU. Here is the resulting CUPTI GPU trace for a single training step using the max_unpool method. The gray and purple are the reorder and sparse-to-dense operations respectively.
chrome___tracing.pdf

Why could tf.gather not be used for this problem? Does it not propagate gradients?

Hi @hermitman: Were you able to solve the problem of gradients not backpropagating? I used @EmmaBYPeng's code and faced the same issue. The weights don't update during backpropagation after the unpooling layer.

@hermitman @sirajulsalekin
For whatever reason, I also could not backprop through @fabianbormann's or @EmmaBYPeng's unpool code (although they were able to). If anyone comes across the same problem, my solution was to replace the sparse_tensor_to_dense op as mentioned in https://github.com/tensorflow/tensorflow/issues/6391, since sparse_tensor_to_dense doesn't have a gradient (in 0.11).

Naturally this doesn't fix the speed problems; as far as I can tell, only a dedicated op as mentioned by @girving and @syed-ahmed will solve that.

Namely:

# Original
return tf.sparse_tensor_to_dense(tf.sparse_reorder(delta))

# New
# https://github.com/tensorflow/tensorflow/issues/6391
return tf.sparse_add(tf.zeros(tf.to_int32(delta.shape)), tf.sparse_reorder(delta))

Much faster, using tf.scatter_nd:

def unpool_layer2x2_batch(bottom, argmax):
    bottom_shape = tf.shape(bottom)
    top_shape = [bottom_shape[0], bottom_shape[1] * 2, bottom_shape[2] * 2, bottom_shape[3]]

    batch_size = top_shape[0]
    height = top_shape[1]
    width = top_shape[2]
    channels = top_shape[3]

    argmax_shape = tf.to_int64([batch_size, height, width, channels])
    argmax = unravel_argmax(argmax, argmax_shape)

    t1 = tf.to_int64(tf.range(channels))
    t1 = tf.tile(t1, [batch_size * (width // 2) * (height // 2)])
    t1 = tf.reshape(t1, [-1, channels])
    t1 = tf.transpose(t1, perm=[1, 0])
    t1 = tf.reshape(t1, [channels, batch_size, height // 2, width // 2, 1])
    t1 = tf.transpose(t1, perm=[1, 0, 2, 3, 4])

    t2 = tf.to_int64(tf.range(batch_size))
    t2 = tf.tile(t2, [channels * (width // 2) * (height // 2)])
    t2 = tf.reshape(t2, [-1, batch_size])
    t2 = tf.transpose(t2, perm=[1, 0])
    t2 = tf.reshape(t2, [batch_size, channels, height // 2, width // 2, 1])

    t3 = tf.transpose(argmax, perm=[1, 4, 2, 3, 0])

    t = tf.concat(4, [t2, t3, t1])
    indices = tf.reshape(t, [(height // 2) * (width // 2) * channels * batch_size, 4])

    x1 = tf.transpose(bottom, perm=[0, 3, 1, 2])
    values = tf.reshape(x1, [-1])
    return tf.scatter_nd(indices, values, tf.to_int64(top_shape))

Thanks @Pepslee. I was going to try tf.scatter_nd as well, but unfortunately I'm stuck on 0.11 at the moment, which doesn't have it.

I tried this function on the master branch of TensorFlow.

Thanks a lot @danbarnes333. I tried your code and it worked! I will run @Pepslee's code too to see the difference.

even faster

def unravel_argmax(argmax, shape):
    argmax_shape = argmax.get_shape()
    new_1dim_shape = tf.shape(tf.constant(0, shape=[tf.Dimension(4), argmax_shape[0]*argmax_shape[1]*argmax_shape[2]*argmax_shape[3]]))
    batch_shape = tf.constant(0, dtype=tf.int64, shape=[argmax_shape[0], 1, 1, 1]).get_shape()
    b = tf.multiply(tf.ones_like(argmax), tf.reshape(tf.range(shape[0]), batch_shape))
    y = argmax // (shape[2] * shape[3])
    x = argmax % (shape[2] * shape[3]) // shape[3]
    c = tf.ones_like(argmax) * tf.range(shape[3])
    pack = tf.stack([b, y, x, c])
    pack = tf.reshape(pack, new_1dim_shape)
    pack = tf.transpose(pack)
    return pack


def unpool(updates, mask, ksize=[1, 2, 2, 1]):
    input_shape = updates.get_shape()
    new_dim_y = input_shape[1] * ksize[1]
    new_dim_x = input_shape[2] * ksize[2]
    output_shape = tf.to_int64((tf.constant(0, dtype=tf.int64, shape=[input_shape[0], new_dim_y, new_dim_x, input_shape[3]]).get_shape()))
    indices = unravel_argmax(mask, output_shape)
    new_1dim_shape = tf.shape(tf.constant(0, shape=[input_shape[0] * input_shape[1] * input_shape[2] * input_shape[3]]))
    values = tf.reshape(updates, new_1dim_shape)
    ret = tf.scatter_nd(indices, values, output_shape)
    return ret

Hi,
@Pepslee, can you explain what 'mask' is in your unpool function? Also, can you give a simple example of how I can use that function in my own program? I am trying to use it in a deconvolution network.
Thanks,
Ali

Hi @amortazi, 'mask' is the argmax output of the tf.nn.max_pool_with_argmax(input=image, ksize=ksize, strides=[1, 2, 2, 1], padding='SAME') operation.
mask is the tensor of indices of the max values of input_tensor,
where input_tensor is the input tensor of the max-pool operation.
This operation (unpool) is the inverse of the max pooling.
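Here is a NumPy reference sketch of 2x2, stride-2 max pooling that also returns argmax indices, to make the meaning of `mask` concrete. It assumes 'VALID' pooling with even height and width. It uses the per-image flattening (y * W + x) * C + c that the unpool code above decodes. `max_pool_2x2_with_argmax` is a hypothetical helper, not a TensorFlow API, and tie-breaking between equal values may differ from TensorFlow's kernel.

```python
import numpy as np

# Hypothetical helper (NumPy sketch, not a TensorFlow API)
def max_pool_2x2_with_argmax(x):
    """2x2 / stride-2 'VALID' max pooling on an NHWC array.

    Returns (pooled, argmax); argmax holds, for every pooled value, the flat
    position (y * W + x) * C + c of the maximum inside its own image
    (the batch index is not included).
    """
    n, h, w, c = x.shape
    assert h % 2 == 0 and w % 2 == 0, "sketch assumes even H and W"
    # Split each spatial axis into (pooled index, offset inside the window)
    win = x.reshape(n, h // 2, 2, w // 2, 2, c).transpose(0, 1, 3, 5, 2, 4)
    win = win.reshape(n, h // 2, w // 2, c, 4)       # last axis: dy * 2 + dx
    k = win.argmax(axis=-1)                          # first maximum per window
    pooled = np.take_along_axis(win, k[..., None], axis=-1)[..., 0]
    ys = np.arange(h // 2)[None, :, None, None] * 2 + k // 2
    xs = np.arange(w // 2)[None, None, :, None] * 2 + k % 2
    argmax = (ys * w + xs) * c + np.arange(c)
    return pooled, argmax

x = np.arange(16, dtype=np.float32).reshape(1, 4, 4, 1)
pooled, mask = max_pool_2x2_with_argmax(x)
# pooled holds 5, 7, 13, 15; mask holds the flat indices 5, 7, 13, 15
```

Unpooling then writes each pooled value back to the position its `mask` entry names and leaves zeros everywhere else.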

I rewrote the code as a single function:

def unpool(updates, mask, ksize=[1, 2, 2, 1]):
    input_shape = updates.get_shape().as_list()
    #  calculation new shape
    output_shape = (input_shape[0], input_shape[1] * ksize[1], input_shape[2] * ksize[2], input_shape[3])
    # calculation indices for batch, height, width and feature maps
    one_like_mask = tf.ones_like(mask)
    batch_range = tf.reshape(tf.range(output_shape[0], dtype=tf.int64), shape=[input_shape[0], 1, 1, 1])
    b = one_like_mask * batch_range
    y = mask // (output_shape[2] * output_shape[3])
    x = mask % (output_shape[2] * output_shape[3]) // output_shape[3]
    feature_range = tf.range(output_shape[3], dtype=tf.int64)
    f = one_like_mask * feature_range
    # transpose indices & reshape update values to one dimension
    updates_size = tf.size(updates)
    indices = tf.transpose(tf.reshape(tf.stack([b, y, x, f]), [4, updates_size]))
    values = tf.reshape(updates, [updates_size])
    ret = tf.scatter_nd(indices, values, output_shape)
    return ret

@zheng-xq Opinions on whether we should accept an unpooling layer that just calls scatter_nd? A native op would be faster but probably not tremendously faster, since fundamentally it is just doing a scatter.

Hi,
Thanks for the response, @Pepslee. I ran it and it works well. The only problem is that it is slow. I checked, and it seems some operations (like // and %) cannot run on the GPU. So, in the middle of my network, they are placed on the CPU, which is why it is slow.
I am wondering if anyone has any suggestions for solving this problem.
Also, I have asked this question on Stack Overflow: (http://stackoverflow.com/questions/41797875/running-tf-mod-and-tf-floor-div-in-tensorflow-in-gpu)
Thanks

@girving Which native op would be faster?

@Pepslee A hypothetical native op that does the whole thing at once. Do you have some intuition for how that would compare speed-wise?

@amortazi tf.scatter_nd is also CPU-only. I think the code above can be rewritten without // and %, but the scatter is probably the main bottleneck.
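One way to avoid // and % entirely is to keep the argmax index flat. You then scatter into an (N, H * W * C) buffer addressed by (batch, flat index) and reshape afterwards. A later version in this thread does the same with tf.scatter_nd. Below is a NumPy sketch of that indexing; the helper name `unpool_flat` is hypothetical. Note that NumPy fancy assignment keeps one value per duplicate index, whereas tf.scatter_nd is documented to sum duplicates.

```python
import numpy as np

# Hypothetical helper (NumPy sketch, not a TensorFlow API)
def unpool_flat(pooled, argmax, output_shape):
    """Scatter pooled values back to their argmax positions without // or %.

    pooled, argmax: arrays of shape (N, h, w, C); argmax holds per-image flat
    indices (y * W + x) * C + c into an image of shape output_shape[1:].
    output_shape: (N, H, W, C) of the tensor that was max-pooled.
    """
    n = pooled.shape[0]
    per_image = int(np.prod(output_shape[1:]))
    out = np.zeros((n, per_image), dtype=pooled.dtype)
    # Row = batch index, column = the flat argmax index used as-is,
    # so y, x and c never have to be recovered explicitly.
    rows = np.broadcast_to(np.arange(n).reshape(n, 1, 1, 1), argmax.shape)
    out[rows.ravel(), argmax.ravel()] = pooled.ravel()
    return out.reshape(output_shape)

pooled = np.array([[[[5.0, 7.0]]]])   # shape (1, 1, 1, 2)
argmax = np.array([[[[2, 7]]]])       # (y=0, x=1, c=0) and (y=1, x=1, c=1)
out = unpool_flat(pooled, argmax, (1, 2, 2, 2))
```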

@ivankreso Yes, tf.stack and tf.scatter_nd are both bottlenecks. Currently, both of them run only on the CPU. I am trying to register GPU kernels for them so we can remove those bottlenecks.
Here are the issues about stack and scatter:
https://github.com/tensorflow/tensorflow/issues/7026
https://github.com/tensorflow/tensorflow/issues/7027

I would highly appreciate any help. Thanks!

I can rewrite this code without tf.stack, // and %, but I can't find a GPU analog of tf.scatter_nd.

@Pepslee Many thanks for your code. If you use None as the batch_size in the placeholder, batch_range = tf.reshape(tf.range(output_shape[0], dtype=tf.int64), shape=[input_shape[0], 1, 1, 1]) fails with an error in reshape. Is there any way around this?

@Pepslee @mshunshin I'm also encountering the same error in reshape when the batch_size is None. Did you manage to solve it?

I am not sure if this is the right place.

According to the docs, max_pool_with_argmax computes indices using the following formula:
[b, y, x, c] -> ((b * height + y) * width + x) * channels + c

while in fact the batch index is ignored and the following formula is used:
[b, y, x, c] -> (y * width + x) * channels + c

So something is wrong there.

The code by @Pepslee seems to take this into account.
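The two formulas differ only by the batch term, since ((b * H + y) * W + x) * C + c = b * H * W * C + (y * W + x) * C + c. Below is a small Python sketch that encodes and decodes both variants; the function names are illustrative, not a TensorFlow API. The decode step uses x = (i // C) % W, which equals the i % (W * C) // C used in the unpool code above.

```python
def encode(b, y, x, c, H, W, C, include_batch=False):
    """Flat argmax index of position (b, y, x, c) in an (N, H, W, C) tensor."""
    i = (y * W + x) * C + c                    # per-image index (observed behaviour)
    return i + b * H * W * C if include_batch else i   # documented formula adds this term

def decode(i, H, W, C):
    """Invert the per-image index, returning (y, x, c)."""
    return i // (W * C), (i // C) % W, i % C
```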

I tried adapting the unpooling function to support a partially defined input shape (without the batch size). Here it is:

def unpool(updates, mask, ksize=2, name="unpool"):
    if isinstance(ksize, int):
        ksize = [1, ksize, ksize, 1]
    input_shape = updates.get_shape().as_list()
    #  calculation new shape
    output_shape = [input_shape[1] * ksize[1], input_shape[2] * ksize[2], input_shape[3]]
    # calculation indices for batch, height, width and feature maps
    one_like_mask = tf.ones_like(mask)
    bsize = tf.to_int64(tf.shape(updates)[0])
    batch_range = tf.reshape(tf.range(bsize, dtype=tf.int64),
                             shape=[-1, 1, 1, 1])
    b = one_like_mask * batch_range
    y = mask // (output_shape[1] * output_shape[2])
    x = mask % (output_shape[1] * output_shape[2]) // output_shape[2]
    feature_range = tf.range(output_shape[2], dtype=tf.int64)
    f = one_like_mask * feature_range
    # transpose indices & reshape update values to one dimension
    updates_size = tf.size(updates)
    indices = tf.transpose(tf.reshape(tf.stack([b, y, x, f]), [4, updates_size]))
    values = tf.reshape(updates, [updates_size])
    ret = tf.scatter_nd(indices, values, tf.concat(
        [[bsize], tf.to_int64(output_shape)], axis=0))
    return ret

For the batch range, I simply took advantage of the -1 dimension. Then I fetched the batch size and built the final output shape dynamically (as a Tensor). Admittedly, I am not sure I used the fastest or most elegant operations, but it appears to be working on my side.

I think I found a bug with tf.nn.max_pool_with_argmax and the unpool workaround as presented here.

tf.nn.max_pool_with_argmax indices are calculated as (y * w + x) * channels + c. Here "w" is the width of the input tensor, not the width of the input tensor plus padding, even when padding is applied (padding='SAME' with an odd tensor width).

The unpool method recovers the coordinates by dividing that index by, and taking it modulo, input_shape[2] * ksize[2]. With padding, this width is 1 pixel larger than the width tf.nn.max_pool_with_argmax used for its argmax output. So if padding is applied, every row of the unpool() output is slightly offset, and the image ends up slightly tilted.

I'm currently implementing SegNet, which has several unpool operations one after the other, each making the tilting worse if there was any padding for it, which is really noticeable when looking at the final output.
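The row shift can be reproduced with plain index arithmetic. The sketch below assumes a single channel and an input width of 5 pooled with 2x2 windows and padding='SAME'. The pooled width is therefore 3, and the naive unpool assumes a width of 6. Each pixel is encoded with the unpadded width, as described above; `decode_yx` is a hypothetical helper.

```python
def decode_yx(i, width, channels=1):
    """Recover (y, x) from a per-image index (y * width + x) * channels + c."""
    return i // (width * channels), (i // channels) % width

true_width = 5                       # odd input width, pooled with padding='SAME'
pooled_width = -(-true_width // 2)   # ceil(5 / 2) = 3
assumed_width = 2 * pooled_width     # 6, i.e. input_shape[2] * ksize[2]

for y in range(4):
    i = y * true_width               # how max_pool_with_argmax encodes pixel (y, 0)
    print((y, 0), "->", decode_yx(i, assumed_width))
# (0, 0) -> (0, 0)
# (1, 0) -> (0, 5)
# (2, 0) -> (1, 4)
# (3, 0) -> (2, 3)
```

Each successive row lands one more column to the left and wraps into the previous row, which is the tilt described above. Decoding with the true width of 5 recovers (y, 0) exactly.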

My workaround was to change the proposed unpool operation by simply adding an input-argument for the output shape as follows:

def unpool(updates, mask, ksize=[1, 2, 2, 1], output_shape=None, name=''):
    with tf.variable_scope(name):
        mask = tf.cast(mask, tf.int32)
        input_shape = tf.shape(updates, out_type=tf.int32)
        #  calculation new shape
        if output_shape is None:
            output_shape = (input_shape[0], input_shape[1] * ksize[1], input_shape[2] * ksize[2], input_shape[3])

        # calculation indices for batch, height, width and feature maps
        one_like_mask = tf.ones_like(mask, dtype=tf.int32)
        batch_shape = tf.concat([[input_shape[0]], [1], [1], [1]], 0)
        batch_range = tf.reshape(tf.range(output_shape[0], dtype=tf.int32), shape=batch_shape)
        b = one_like_mask * batch_range
        y = mask // (output_shape[2] * output_shape[3])
        x = (mask // output_shape[3]) % output_shape[2] #mask % (output_shape[2] * output_shape[3]) // output_shape[3]
        feature_range = tf.range(output_shape[3], dtype=tf.int32)
        f = one_like_mask * feature_range
        # transpose indices & reshape update values to one dimension
        updates_size = tf.size(updates)
        indices = tf.transpose(tf.reshape(tf.stack([b, y, x, f]), [4, updates_size]))
        values = tf.reshape(updates, [updates_size])
        ret = tf.scatter_nd(indices, values, output_shape)
        return ret

Then, when calling the op, I supply the shape of the corresponding convolution in the encoder part of SegNet as output_shape. That way the code uses the correct (well, incorrect...) width when decoding the tf.nn.max_pool_with_argmax indices.

Arguably, this is a bug in tf.nn.max_pool_with_argmax, since it should calculate the argmax indices taking potential padding into account.

@Panaetius Have you submitted an issue or a pull request for tf.nn.max_pool_with_argmax?

Your solution is a little bit dirty but seems to work perfectly.
Are there any plans to merge your proposal or should this be further discussed?

@PavlosMelissinos No I haven't submitted an issue or pull request, I wanted to see feedback on this here first, since I wasn't 100% sure if it's actually a bug.

Yes, the code is a little dirty, it was just a quick fix for a hobby project. But it's been working great for me so far. I think fixing the max_pool_with_argmax issue should be done before adding any unpool op, and the code would have to be sanitized as well.

I'll probably write a small self-contained example to show the issue with max_pool_with_argmax and post a bug report later today.

Great! Agree with you on all points. Looking forward to it.


def unpool(pool, ind, ksize=[1, 2, 2, 1], scope='unpool'):
    """
       Unpooling layer after max_pool_with_argmax.
       Args:
           pool:      max pooled output tensor
           ind:       argmax indices
           ksize:     ksize is the same as for the pool
       Return:
           unpool:    unpooling tensor
    """
    with tf.variable_scope(scope):
        input_shape = pool.get_shape().as_list()
        output_shape = (input_shape[0], input_shape[1] * ksize[1], input_shape[2] * ksize[2], input_shape[3])
        pool_ = tf.reshape(pool, [input_shape[0] * input_shape[1] * input_shape[2] * input_shape[3]])
        batch_range = tf.reshape(tf.range(output_shape[0], dtype=ind.dtype), shape=[input_shape[0], 1, 1, 1])
        b = tf.ones_like(ind) * batch_range
        b = tf.reshape(b, [input_shape[0] * input_shape[1] * input_shape[2] * input_shape[3], 1])
        ind_ = tf.reshape(ind, [input_shape[0] * input_shape[1] * input_shape[2] * input_shape[3], 1])
        ind_ = tf.concat([b, ind_], 1)  # TF >= 1.0 argument order: (values, axis)
        ref = tf.Variable(tf.zeros([output_shape[0], output_shape[1] * output_shape[2] * output_shape[3]]))
        ret = tf.scatter_nd_update(ref, ind_, pool_)
        ret = tf.reshape(ret, [output_shape[0], output_shape[1], output_shape[2], output_shape[3]])
        return ret

I've adapted Pepslee's version to use tf.scatter_nd instead of tf.scatter_nd_update, to avoid creating a Variable. The Variable caused problems with checkpoint files because its shape fixes the batch size, so a checkpoint could not be restored when running with a different batch size.

def unpool(pool, ind, ksize=[1, 2, 2, 1], scope='unpool'):
    """
       Unpooling layer after max_pool_with_argmax.
       Args:
           pool:   max pooled output tensor
           ind:      argmax indices
           ksize:     ksize is the same as for the pool
       Return:
           unpool:    unpooling tensor
    """
    with tf.variable_scope(scope):
        input_shape = pool.get_shape().as_list()
        output_shape = (input_shape[0], input_shape[1] * ksize[1], input_shape[2] * ksize[2], input_shape[3])

        flat_input_size = np.prod(input_shape)
        flat_output_shape = [output_shape[0], output_shape[1] * output_shape[2] * output_shape[3]]

        pool_ = tf.reshape(pool, [flat_input_size])
        batch_range = tf.reshape(tf.range(output_shape[0], dtype=ind.dtype), shape=[input_shape[0], 1, 1, 1])
        b = tf.ones_like(ind) * batch_range
        b = tf.reshape(b, [flat_input_size, 1])
        ind_ = tf.reshape(ind, [flat_input_size, 1])
        ind_ = tf.concat([b, ind_], 1)

        ret = tf.scatter_nd(ind_, pool_, shape=flat_output_shape)
        ret = tf.reshape(ret, output_shape)
        return ret

I've adapted chahld's version to handle unknown input tensor shape.

def unpool(pool, ind, ksize=[1, 2, 2, 1], scope='unpool'):
    """
       Unpooling layer after max_pool_with_argmax.
       Args:
           pool:   max pooled output tensor
           ind:      argmax indices
           ksize:     ksize is the same as for the pool
       Return:
           unpool:    unpooling tensor
    """
    with tf.variable_scope(scope):
        input_shape =  tf.shape(pool)
        output_shape = [input_shape[0], input_shape[1] * ksize[1], input_shape[2] * ksize[2], input_shape[3]]

        flat_input_size = tf.cumprod(input_shape)[-1]
        flat_output_shape = tf.stack([output_shape[0], output_shape[1] * output_shape[2] * output_shape[3]])

        pool_ = tf.reshape(pool, tf.stack([flat_input_size]))
        batch_range = tf.reshape(tf.range(tf.cast(output_shape[0], tf.int64), dtype=ind.dtype), 
                                          shape=tf.stack([input_shape[0], 1, 1, 1]))
        b = tf.ones_like(ind) * batch_range
        b = tf.reshape(b, tf.stack([flat_input_size, 1]))
        ind_ = tf.reshape(ind, tf.stack([flat_input_size, 1]))
        ind_ = tf.concat([b, ind_], 1)

        ret = tf.scatter_nd(ind_, pool_, shape=tf.cast(flat_output_shape, tf.int64))
        ret = tf.reshape(ret, tf.stack(output_shape))
        return ret

Do we have an alternative for MaxPoolWithArgmax on CPUs?

I am also looking to implement swwae. It looks like a lot of good work has been done to address the unpooling issues. Does anyone have a small example showing it all put together?

@isaacgerg, here is an example: https://github.com/yselivonchyk/Tensorflow_WhatWhereAutoencoder

It might use a slightly older version of the code, but it illustrates the pipeline. Note that it might differ from the official TF API released with the latest version.

@yselivonchyk Quick question. Figure 1 b of the paper shows 2 "where"s going to the decoder. It looks like you only have one. Is this correct?

@isaacgerg correct. I reproduced the experiment there which utilizes only a single layer.

If you want to, you can take a look at https://github.com/yselivonchyk/TensorFlow_DCIGN/blob/master/model_interpreter.py which can build a model, including SWWAE using syntax like in original paper '(16)5c-(32)3c-Xp'

@yselivonchyk Thanks. By inverse graphics, are you referring to the phrase Hinton uses to describe what he believes the brain does?

@isaacgerg, I never thought of that. The original intention was to reproduce https://arxiv.org/abs/1503.03167, hence the name. But it all adds up, since the authors worked together with Hinton.
In the paper, they feed in a sequence of images showing a single transformation and try to isolate information about that transformation in the encoding space.

model_interpreter.py part, though, is supposed to be independent of that concept.

@yselivonchyk Yes, I believe this is similar to hinton's transforming autoencoders.

@isaacgerg I think we can use tf.gradients to do this on CPUs. First, we take the gradients of the max-pooled results with respect to the feature maps, which tells us the locations of the maximum values. Second, using those locations, we can do the unpooling, a.k.a. reconstruct something. This is a little tricky. In our case, we only have one maximum value for each feature map, so the "where it is" information is straightforward and we can use it to reconstruct a partial image.
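Below is a NumPy sketch of the location-finding idea without autodiff. Instead of tf.gradients, it compares the input with the nearest-neighbour-upsampled pooled map to find where each maximum came from. This equality mask is a swapped-in technique, not the gradient approach itself. It assumes H and W are divisible by the window size k. On ties it keeps every tied position, while an argmax-based unpool keeps only one. The helper name `unpool_by_comparison` is hypothetical.

```python
import numpy as np

# Hypothetical helper (NumPy sketch of an equality-mask unpool)
def unpool_by_comparison(x, pooled, k=2):
    """Place each pooled maximum back where it occurred in x (NHWC, k x k windows)."""
    # Nearest-neighbour upsampling: every pooled value fills its k x k window
    up = np.repeat(np.repeat(pooled, k, axis=1), k, axis=2)
    # Keep only the entries where the input equals its window maximum
    return np.where(x == up, up, 0)

x = np.array([[1.0, 4.0], [3.0, 2.0]]).reshape(1, 2, 2, 1)
pooled = x.reshape(1, 1, 2, 1, 2, 1).max(axis=(2, 4))   # one 2x2 window -> 4.0
out = unpool_by_comparison(x, pooled)
```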

@MilesZhao This makes sense. I am trying to translate this to tf code.

@isaacgerg I once emailed the author of SWWAE. They implemented it in Torch. But I believe there is an example in Keras.

@MilesZhao The keras example will only run in theano.

I was able to get the Keras example to run in TensorFlow 1.2.1 with minimal changes.

@isaacgerg can you share your tensorflow example?

@manglav https://github.com/isaacgerg/keras_odds_and_ends

Please message me if you find errors.

Hi @ThomasWollmann, I have a problem. Please see tensorflow/tensorflow#8102: scatter_nd has a problem with duplicate indices, so I think it does not work for the Zeiler unpooling layer. But I want to support an unknown input tensor shape. Is there a way to solve this problem?
Thanks!

@teramototoya I recognized this issue as well in my experiments. However, I don't have a solution yet.

@ThomasWollmann Hey, if I'm not wrong, your code only works for a stride size of 1, right?

A small improvement to @ThomasWollmann's code: it sets the known static shape on the output tensor (I also removed the tf.stack calls that were not needed).
This is useful when using tf.contrib.layers.conv2d.

def unpool(pool, ind, ksize=[1, 2, 2, 1], scope='unpool'):
    """
       Unpooling layer after max_pool_with_argmax.
       Args:
           pool:   max pooled output tensor
           ind:      argmax indices
           ksize:     ksize is the same as for the pool
       Return:
           unpool:    unpooling tensor
    """
    with tf.variable_scope(scope):
        input_shape = tf.shape(pool)
        output_shape = [input_shape[0], input_shape[1] * ksize[1], input_shape[2] * ksize[2], input_shape[3]]

        flat_input_size = tf.reduce_prod(input_shape)
        flat_output_shape = [output_shape[0], output_shape[1] * output_shape[2] * output_shape[3]]

        pool_ = tf.reshape(pool, [flat_input_size])
        batch_range = tf.reshape(tf.range(tf.cast(output_shape[0], tf.int64), dtype=ind.dtype), 
                                          shape=[input_shape[0], 1, 1, 1])
        b = tf.ones_like(ind) * batch_range
        b1 = tf.reshape(b, [flat_input_size, 1])
        ind_ = tf.reshape(ind, [flat_input_size, 1])
        ind_ = tf.concat([b1, ind_], 1)

        ret = tf.scatter_nd(ind_, pool_, shape=tf.cast(flat_output_shape, tf.int64))
        ret = tf.reshape(ret, output_shape)

        set_input_shape = pool.get_shape()
        set_output_shape = [set_input_shape[0], set_input_shape[1] * ksize[1], set_input_shape[2] * ksize[2], set_input_shape[3]]
        ret.set_shape(set_output_shape)
        return ret

Hello,

So the bottom line is: some of the operations are not implemented in GPU which causes Unpool to be slow in Tensorflow. Is that correct?

I've run some benchmarks using TensorFlow's default benchmark tool. For the previous version (from May 13), the top offender is FloorDiv. I've switched to the version above (@rayanelleuch's), and now the problem is the ConcatV2 op.

New implementation: tf.one_hot has a GPU implementation, but I checked only the forward computation, and I'm not sure a gradient is implemented for this operation.

def unpool(pool, ind, ksize=(1, 2, 2, 1), scope='unpool'):
    """
       Unpooling layer after max_pool_with_argmax.
       Args:
           pool:   max pooled output tensor
           ind:      argmax indices (produced by tf.nn.max_pool_with_argmax)
           ksize:     ksize is the same as for the pool
       Return:
           unpooled:    unpooling tensor
    """
    with tf.variable_scope(scope):
        pooled_shape = pool.get_shape().as_list()

        flatten_ind = tf.reshape(ind, (pooled_shape[0], pooled_shape[1] * pooled_shape[2] * pooled_shape[3]))
        # sparse indices to a dense ones_like matrix
        one_hot_ind = tf.one_hot(flatten_ind,  pooled_shape[1] * ksize[1] * pooled_shape[2] * ksize[2] * pooled_shape[3], on_value=1., off_value=0., axis=-1)
        one_hot_ind = tf.reduce_sum(one_hot_ind, axis=1)
        one_like_mask = tf.reshape(one_hot_ind, (pooled_shape[0], pooled_shape[1] * ksize[1], pooled_shape[2] * ksize[2], pooled_shape[3]))
        # resize input array to the output size by nearest neighbor
        img = tf.image.resize_nearest_neighbor(pool, [pooled_shape[1] * ksize[1], pooled_shape[2] * ksize[2]])
        unpooled = tf.multiply(img, tf.cast(one_like_mask, img.dtype))
        return unpooled

Hello @Pepslee, thanks for sharing the code.

I was doing some benchmarks with previous code and it was working fine, but now I'm facing out of memory issues in tf.one_hot function. It is trying to allocate a tensor of shape [153600,614400]. The pooled_shape is [None, 60, 40, 64], so 60x40x64 = 153600 and 60x40x64x2x2 = 614400 (2x2 comes from the ksize). Correct me if I'm wrong, but a tensor of this size couldn't fit in memory.

The weird part for me is that a version based on @rayanelleuch's code doesn't give me any issues. If we compute the size of this Tensor it would be something like (153600x614400x4)/1024/1024/1024, resulting in around 351.56GB. Another important detail is that I'm using Keras (not sure whether it changes something regarding this part).
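The estimate can be checked directly. The sketch assumes float32 (4-byte) entries, matching on_value=1. in the one_hot call, and uses the shapes from the log below:

```python
pooled_elems = 60 * 40 * 64              # 153600 indices to one-hot encode
depth = 60 * 2 * 40 * 2 * 64             # 614400 = size of the unpooled feature map
bytes_needed = pooled_elems * depth * 4  # float32 one-hot matrix
print(bytes_needed / 1024**3)            # 351.5625 (GiB)
```

The one-hot matrix has (pooled elements) x (unpooled elements) entries, so its size grows quadratically with the feature-map size.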

Is there anything I'm missing here? Please correct if I'm wrong.

Here is part of the log:

2017-10-27 18:28:47.630711: W tensorflow/core/framework/op_kernel.cc:1192] Resource exhausted: OOM when allocating tensor with shape[153600,614400]
2017-10-27 18:28:47.631246: E tensorflow/tools/benchmark/benchmark_model.cc:256] Error during inference: Resource exhausted: OOM when allocating tensor with shape[153600,614400]
         [[Node: max_unpool2d_new_1/max_unpool2d_new_1/one_hot = OneHot[T=DT_FLOAT, TI=DT_INT32, axis=-1, _device="/job:localhost/replica:0/task:0/gpu:0"](max_unpool2d_new_1/max_unpool2d_new_1/Reshape/_9, max_unpool2d_new_1/max_unpool2d_new_1/mul_3, max_unpool2d_new_1/max_unpool2d_new_1/one_hot/on_value, max_unpool2d_new_1/max_unpool2d_new_1/one_hot/off_value)]]
         [[Node: logits/BiasAdd/_19 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_3407_logits/BiasAdd", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
2017-10-27 18:28:47.631445: I tensorflow/tools/benchmark/benchmark_model.cc:291] Failed on run 0
2017-10-27 18:28:47.631572: E tensorflow/tools/benchmark/benchmark_model.cc:474] Timing failed with Resource exhausted: OOM when allocating tensor with shape[153600,614400]

Hi, yes: the one_hot function builds a one-hot vector for each index. So I'm trying to build a matrix that has ones at the required indices, and then multiply it by the interpolated (nearest-neighbor) pooled data. For example, with a 2x4 input and 1x2 pooling windows:

[[2, 5, 1, 4],
 [6, 1, 3, 7]]  -> maxpool -> [[5, 4],
                               [6, 7]]   ->   ind = [1, 3, 4, 7] -> one_hot ->

-> [0, 1, 0, 0, 0, 0, 0, 0]
   [0, 0, 0, 1, 0, 0, 0, 0]
   [0, 0, 0, 0, 1, 0, 0, 0]
   [0, 0, 0, 0, 0, 0, 0, 1]   ->  merge (sum along the vertical axis)  ->  [0, 1, 0, 1, 1, 0, 0, 1]

then reshape to  [[0, 1, 0, 1],
                  [1, 0, 0, 1]]

and then multiply by the interpolated (nearest-neighbor) data from the maxpool layer

                [[5, 5, 4, 4],
                 [6, 6, 7, 7]]   ->   [[0, 5, 0, 4],
                                       [6, 0, 0, 7]]

I didn't find a better function to get the ones_like index matrix.
There is a tf.sparse_to_dense function in TensorFlow, but it is implemented only for CPU.

Maybe someone will try to optimize this method of getting ones_like index matrix.
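Here is one possible optimization, sketched in NumPy under the assumption that each argmax index appears once. It writes ones directly at the argmax positions instead of summing one-hot rows, so it needs memory proportional to the output size only. The example reproduces the one above (pooled [[5, 4], [6, 7]], ind = [1, 3, 4, 7]). `ones_at_indices` is a hypothetical helper. In TensorFlow the analogous step would presumably be a scatter of ones (e.g. tf.scatter_nd), whose GPU availability is discussed in this thread.

```python
import numpy as np

# Hypothetical helper (NumPy sketch, not a TensorFlow API)
def ones_at_indices(ind, size):
    """0/1 mask of length `size` with ones at `ind`; O(size) memory, no one-hot matrix."""
    mask = np.zeros(size, dtype=np.float32)
    mask[np.asarray(ind).ravel()] = 1.0
    return mask

pooled = np.array([[5.0, 4.0], [6.0, 7.0]])
ind = [1, 3, 4, 7]
mask = ones_at_indices(ind, 8).reshape(2, 4)
upsampled = np.repeat(pooled, 2, axis=1)   # nearest neighbour along the pooled axis
unpooled = upsampled * mask
```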

Hi @teramototoya,
but tf.scatter_nd has no GPU implementation.

@Pepslee Thanks for your comment. Oh, sorry, I did not know that. So I will delete my comment.

@tombstone Is there any plan by the G-RMI team?

@jch1 I heard something about a TF Segmentation API at ICCV. Have you any details to share?

@Pepslee my understanding of your code is that tf.reduce_sum is not implemented for GPU so this unpooling will still be CPU-bound. Is that correct?

Also, as of TensorFlow 1.4 tf.scatter_nd appears to have GPU support.

If I do not use tf.Session to run it, the unpool runs normally. However, when I use tf.Session, I get "ValueError: None values not supported". Is it because unpool.m does not support backpropagation?

I have implemented the unpool code in TensorFlow: https://github.com/sicongliu92/DL-code/blob/master/unpool(1).py. It can be used as a layer definition for training, without running a session to convert the Tensor into a NumPy array.

for batch in range(pool_shape[0]):
    for channel in range(pool_shape[1]):
        for w in range(pool_shape[2]):
            for h in range(pool_shape[3]):
                unpool = set_value(unpool, [batch, channel, w*stride, h*stride], pool[batch][channel][w][h])

For the full code, please see: https://github.com/sicongliu92/DL-code/blob/master/unpool(1).py

Closing this issue. One of the many versions here should probably be a pull request :-)

tensorflow/tensorflow#16885

@alextp Maybe open the issue again? I personally wouldn't consider it resolved until either it was decided that unpooling wasn't going to be included in the API, or that a pull request was merged. I'd prefer it to stay open so it isn't overlooked.

FWIW, the version in https://github.com/tensorflow/tensorflow/pull/16885 worked beautifully for me. Thanks @rayanelleuch!

@chrisranderson Hi, I'm new to deep learning and TensorFlow. Could you explain in more detail how to implement unpooling using tensorflow/tensorflow#16885? Thanks a lot!

@XXY0118 Copy and paste these lines https://github.com/rayanelleuch/tensorflow/blob/b46d50583d8f4893f1b1d629d0ac9cb2cff580af/tensorflow/contrib/layers/python/layers/layers.py#L2291-L2327, and you should be good to go. I wish GitHub allowed some kind of DM for occasions like this.

@daeyun, please swap the parameters in the tf.concat call from:
out = tf.concat(i, [out, tf.zeros_like(out)])
to:
out = tf.concat([out, tf.zeros_like(out)], i)

Other than that works fine for unpooling without positions indices. Thanks!

Is there a strided version of unpool function?

We are going to close this issue. Feel free to reopen it if you want to contribute and link the PR to it.

A differentiable and GPU-safe avg_unpool2d implementation is as follows:

def avg_unpool2d(x, factor):
  '''
  Performs "average un-pooling", i.e. nearest neighbor upsampling,
  without the faulty `tf.image.resize_nearest_neighbor` op.
  ''' 
  x = tf.transpose(x, [1, 2, 3, 0])
  x = tf.expand_dims(x, 0)
  x = tf.tile(x, [factor**2, 1, 1, 1, 1])
  x = tf.batch_to_space_nd(x, [factor, factor], [[0, 0], [0, 0]])
  x = tf.transpose(x[0], [3, 0, 1, 2])
  return x
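For reference, the op above computes nearest-neighbour upsampling: every input pixel becomes a factor x factor block. Below is a NumPy sketch of two equivalent formulations; both helper names are hypothetical. The first repeats along each spatial axis. The second broadcasts into an (N, H, factor, W, factor, C) view and reshapes, which gives the same result as the tile + batch_to_space_nd combination.

```python
import numpy as np

# Hypothetical helpers (NumPy sketches, not TensorFlow APIs)
def avg_unpool2d_np(x, factor):
    """Nearest-neighbour upsampling of an NHWC array by an integer factor."""
    return np.repeat(np.repeat(x, factor, axis=1), factor, axis=2)

def avg_unpool2d_reshape_np(x, factor):
    """Same result via broadcasting each pixel into a factor x factor block."""
    n, h, w, c = x.shape
    t = np.broadcast_to(x[:, :, None, :, None, :], (n, h, factor, w, factor, c))
    return t.reshape(n, h * factor, w * factor, c)

x = np.arange(8, dtype=float).reshape(2, 2, 2, 1)
a = avg_unpool2d_np(x, 3)
```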

I believe that a max_unpool/avg_unpool function would be quite useful. The argument that we should "just use the gradient op" ignores the fact that this makes our code hacky and opaque. Also, there's no official documentation for this approach.

TensorFlow doesn't ask people to implement deconvolution, even though technically it can be expressed as a convolution. Why? It's convenient and it lets researchers focus on more important things. The same goes for unpooling.

@greydanus @jkyl I'd love to approve a PR adding this max_unpool implementation to tf and a unit test.

I'm working on a PR + unit test. More to come.

Reopening so @greydanus's PR can close it.

I revisited the implementations in this thread and found that @rayanelleuch's solution from Oct 24, 2017 works best for me. It works with batches (i.e. when the first dimension of the input tensor is None), produces a known output shape, and produces no type errors.

I also added tf.keras layers for MaxPoolingWithArgmax and Unpooling (previously mentioned versions did not work for tf.keras but worked with just keras, somehow) here https://github.com/yselivonchyk/Tensorflow_WhatWhereAutoencoder/blob/master/pooling.py

Hello everybody!

As @Panaetius highlighted, the unpooling layers presented here have a drawback: they don't account for padding. This is because tf.nn.max_pool_with_argmax does not return a tensor whose size includes the padding (if you use padding='SAME', for example). I have changed the function so that it unpools the tensor to the size of the prev_tensor that was passed to tf.nn.max_pool_with_argmax:

def max_unpool(pool, ind, prev_tensor, scope='unpool_2d'):
    """
    Implement the unpooling operation, as explained here:
    https://stackoverflow.com/questions/36548736/tensorflow-unpooling

    Args:
        pool (tensor): Input tensor of shape (N, H, W, C)
        ind (tensor): Input tensor of shape (N, H, W, C) containing the maximum
            flatten indices (see https://www.tensorflow.org/api_docs/python/tf.nn.max_pool_with_argmax)
        prev_tensor (tensor): the tensor that was passed to max_pool_with_argmax; its height and width give the output size
        scope (str): scope in which to register the operations
    Return:
        ret (tensor): tensor same shape as prev_tensor that corresponds to the "invert" of the
            max pooling operation
    """
    with tf.variable_scope(scope):
        # input_shape = [N, H, W, C]
        input_shape = tf.shape(pool)
        o_shape = tf.shape(prev_tensor)

        output_shape = [input_shape[0], o_shape[1], o_shape[2], input_shape[3]]

        # N * H * W * C
        flat_input_size = tf.reduce_prod(input_shape)

        # flat_output_shape = [N, H_prev * W_prev * C]
        flat_output_shape = [output_shape[0], output_shape[1] * output_shape[2] * output_shape[3]]

        updates = tf.reshape(pool, [flat_input_size])

        # create the tensor [ [[[0]]], [[[1]]], ..., [[[N-1]]] ]
        batch_range = tf.reshape(
            tf.range(tf.cast(output_shape[0], tf.int64), dtype=ind.dtype),
            shape=[input_shape[0], 1, 1, 1])

        # b has shape (N, H, W, C): the first batch element is a 3D array full of 0s,
        # the second a 3D array full of 1s, and so on
        b = tf.ones_like(ind) * batch_range
        b = tf.reshape(b, [flat_input_size, 1])

        # indices = [ [0, ind_1], [0, ind_2], ..., [N-1, ind_{N*H*W*C-1}], [N-1, ind_{N*H*W*C}] ]
        indices = tf.reshape(ind, [flat_input_size, 1])
        indices = tf.concat([b, indices], axis=-1)

        ret = tf.scatter_nd(indices, updates, shape=tf.cast(flat_output_shape, tf.int64))
        ret = tf.reshape(ret, output_shape)

        set_input_shape = pool.get_shape()
        prev_tensor_shape = prev_tensor.get_shape()

        set_output_shape = [set_input_shape[0], prev_tensor_shape[1], prev_tensor_shape[2], set_input_shape[3]]
        ret.set_shape(set_output_shape)

        return ret

You can use it as follows:

maxpool_layer, maxpool_idx5 = tf.nn.max_pool_with_argmax(
    your_input,
    [1, 2, 2, 1], [1, 2, 2, 1],
    padding='SAME',
    name="max_pooling_5")

# padding='same' keeps the spatial size equal to maxpool_layer,
# which max_unpool needs so that values and argmax indices line up
conv_layer = tf.layers.conv2d(
    maxpool_layer,
    filters=4096,
    kernel_size=7,
    padding='same',
    name='conv')

# filters must equal the channel count of your_input (assumed to be 512 here)
deconv_layer = tf.layers.conv2d(
    conv_layer,
    filters=512,
    kernel_size=1,
    kernel_initializer=tf.contrib.layers.xavier_initializer(),
    name="deconv")
unpooling_layer5 = max_unpool(deconv_layer, maxpool_idx5, your_input, scope="Unpooling_5")

This implementation works well with padding and doesn't need set_shape() while reading the tf_records. This means that at prediction time you can pass a single image at a time (batch_size=1), even images of totally different sizes, and it won't break.
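The approach can be sanity-checked in NumPy on an odd-sized input. The sketch assumes 2x2, stride-2 pooling with padding='SAME' (padding added at the bottom/right). It encodes indices with the unpadded width, as reported earlier in this thread for tf.nn.max_pool_with_argmax. Both helper names are hypothetical. Scattering into a buffer shaped like the pre-pool tensor puts every maximum back at its original position.

```python
import numpy as np

# Hypothetical helpers (NumPy sketches, not TensorFlow APIs)
def max_pool_same_2x2_with_argmax(x):
    """2x2 / stride-2 'SAME' max pooling on NHWC input; indices use the unpadded width."""
    n, h, w, c = x.shape
    ph, pw = -(-h // 2), -(-w // 2)                  # ceil(h / 2), ceil(w / 2)
    padded = np.full((n, 2 * ph, 2 * pw, c), -np.inf)
    padded[:, :h, :w, :] = x                         # pad at the bottom/right
    win = padded.reshape(n, ph, 2, pw, 2, c).transpose(0, 1, 3, 5, 2, 4)
    win = win.reshape(n, ph, pw, c, 4)               # last axis: dy * 2 + dx
    k = win.argmax(axis=-1)
    pooled = np.take_along_axis(win, k[..., None], axis=-1)[..., 0]
    ys = np.arange(ph)[None, :, None, None] * 2 + k // 2
    xs = np.arange(pw)[None, None, :, None] * 2 + k % 2
    argmax = (ys * w + xs) * c + np.arange(c)        # unpadded width w, not 2 * pw
    return pooled, argmax

def max_unpool_like(pooled, argmax, prev_shape):
    """Scatter pooled values into zeros shaped like the pre-pool tensor."""
    n = prev_shape[0]
    out = np.zeros((n, int(np.prod(prev_shape[1:]))), dtype=pooled.dtype)
    rows = np.broadcast_to(np.arange(n).reshape(n, 1, 1, 1), argmax.shape)
    out[rows.ravel(), argmax.ravel()] = pooled.ravel()
    return out.reshape(prev_shape)

rng = np.random.default_rng(1)
x = rng.standard_normal((2, 5, 7, 3))                # odd height and width
pooled, argmax = max_pool_same_2x2_with_argmax(x)
out = max_unpool_like(pooled, argmax, x.shape)
```

Decoding the same indices with the naive width 2 * 4 = 8 instead of 7 would misplace rows, which is the tilt reported earlier in the thread.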

This sounds like a great feature! Adding support for unpooling is outside of the scope of TensorFlow Core, but would be a fantastic addition to TensorFlow Addons.

Transferring this issue now; @seanpmorgan for visibility.

Thanks for transferring. This seems like a nice fit in addons, though it will need to be converted to fit the Keras Layer API and have appropriate test cases.
