Pooling and convolution both aggregate inputs over rectangular regions defined by kernel size, stride, and padding parameters. (In fact, average pooling is convolution with a fixed filter, and max pooling is a special case of "tropical convolution".)
However, Caffe currently uses different sets of regions (and consequently, produces different output sizes) for pooling and convolution. Convolution regions are never allowed outside of the padded input, while pooling is performed when a _strided_ pooling region extends off the ends of the input.
This inconsistency causes some annoyance when the exact sizes of things need to be computed and can vary (due to #594). (The issue doesn't normally come up when using networks like AlexNet that ensure their pooling regions don't encounter this edge case.)
Should the behavior be made consistent, or is there a good reason for its current state?
(See also #988, but note that the expression presented there is different from both the current behaviors.)
If I understand it correctly, pooling currently uses `top_size = ceil((bottom_size + 2*pad - kernel_size) / stride) + 1`, while convolution uses `top_size = floor((bottom_size + 2*pad - kernel_size) / stride) + 1`.
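For a concrete case where the two formulas diverge, here is a standalone illustration (not Caffe code; the numbers are just an example):

```cpp
#include <cmath>
#include <cstdio>

// With bottom_size = 6, kernel_size = 3, stride = 2, pad = 0, the pooling
// (ceil) rule yields 3 outputs while the convolution (floor) rule yields 2.
int main() {
  const int bottom_size = 6, kernel_size = 3, stride = 2, pad = 0;
  const float span = bottom_size + 2 * pad - kernel_size;  // = 3
  const int pool_top = static_cast<int>(std::ceil(span / stride)) + 1;   // 3
  const int conv_top = static_cast<int>(std::floor(span / stride)) + 1;  // 2
  std::printf("pooling: %d, convolution: %d\n", pool_top, conv_top);
  return 0;
}
```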
I also have this issue with my fully-convolutional model. I think it should be made consistent.
Yeah, I've run into problems with this as well -- there should be a `RectangularTileParameter` or some such that both `ConvolutionParameter` and `PoolingParameter` understand.
> while pooling is performed when a strided pooling region extends off the ends of the input.
Hmm, I don't think I understand; are you saying the behavior of `PoolingLayer::Reshape` (https://github.com/BVLC/caffe/blob/dev/src/caffe/layers/pooling_layer.cpp#L81) is different for the case of `stride_h_ == 1` vs. `stride_h_ > 1` (and/or for `stride_w_`)? If so, that's not obvious to me from the code, but maybe it's more subtle than an `if (stride > 1)` kind of statement.
I vote yes on standardizing the regions for convolution and pooling. This was raised earlier in https://github.com/BVLC/caffe/pull/473#issuecomment-45386156, and although @jeffdonahue suggested at the time that the "cover" mode be kept for compatibility, I'd like to hear if he's still into it.

Above all, I raise my glass to tropical pools.
@jeffdonahue, all I meant was that in the stride 1 case, the behavior of pooling and convolution is the same; the pooling and convolution regions will always cover the entire input without padding.
@shelhamer and I have discussed this and concluded that it's reasonable to change the behavior of pooling to match that of convolution, i.e., pooling regions would no longer extend past the padded input.
If a reason for keeping the "cover" mode still exists, comment here. Otherwise I'll implement the change eventually... others are welcome to implement it soon if desired.
I don't particularly care about having the "cover" mode for future training, but I'd think changing the default pooling behavior would cause backwards-compatibility issues -- e.g., if you have a saved net in which a pooling layer's output feeds an inner product layer, and the pooling layer now has a different output dimension, the inner product layer's weights will expect a different input dimension and thus be incompatible. And even if you go fully convolutional, it might cause undesired effects on the sampling behavior if the dimensions were carefully planned (e.g., to get exactly 1x1 outputs from some pooling layer).
Maybe we should keep the current behavior for the current set of proto parameters specifying the tiling behavior and deprecate them. Then add a new common proto message with the new behavior used by convolution and pooling.
@jeffdonahue, yeah, that's a good point. I doubt many (any?) are relying on edge-pooling in networks that include inner product layers, but it's desirable to have a common message anyway. So your suggestion makes sense as a nice way to avoid some potential backwards compatibility issues, while getting the new behavior _and_ a common message that can simplify code at the same time.
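For instance, both layers could then call a single shared output-size helper backed by that common message (hypothetical sketch, not existing Caffe code):

```cpp
// Hypothetical shared helper using the convolution-style floor formula
// (integer division floors for non-negative operands).
inline int tile_output_size(int bottom_size, int kernel_size,
                            int stride, int pad) {
  return (bottom_size + 2 * pad - kernel_size) / stride + 1;
}
```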
The padding logic is also one of the trickiest parts of #560. Mixing the padding logic with everything else is what caused so much confusion. Padding arguably belongs to the data transformation family rather than being an inherent part of any layer.
The padding of MATLAB conv2 has three modes: full, same, and valid. All of them have use cases. With an independent class to manage padding, it would not be hard to support any of these modes.
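For stride 1, those modes correspond to the following output sizes (an illustrative sketch of the conv2 conventions, not Caffe code):

```cpp
#include <string>

// "full"  -- every position where the kernel overlaps the input at all
// "same"  -- output matches the input size
// "valid" -- only positions where the kernel lies entirely inside the input
int conv2_output_size(int input_size, int kernel_size, const std::string& mode) {
  if (mode == "full") return input_size + kernel_size - 1;
  if (mode == "same") return input_size;
  return input_size - kernel_size + 1;  // "valid"
}
```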
As a side note, the size-compatibility validation between trained models and network definitions is currently mixed with many other things in the layer setup methods. It would be much easier to reason about and maintain if separated out.
@futurely, it might be nice to factor out padding (thus allowing it to be used by itself, even when not followed by a conv or pooling layer), but I don't see a way to do that without a non-negligible cost. The data transformation stuff works because the data is first read into host memory, and only the transformed version goes to the GPU. For an intermediate layer, one can't pad in place, so a naive approach has a memory cost (and I don't see a less naive approach).
```cpp
// Sketch: share padding code via composition rather than inheritance.
template <typename Dtype>
class InPlacePadding {
 public:
  void pad(const PaddingParameter& param, const vector<Blob<Dtype>*>& bottom,
           vector<Blob<Dtype>*>* top);

 protected:
  void pad_mode1(vector<Blob<Dtype>*>* top);
  void pad_mode2(vector<Blob<Dtype>*>* top);
  void pad_mode3(vector<Blob<Dtype>*>* top);
};

template <typename Dtype>
class PoolingLayer {
 public:
  void Method(const vector<Blob<Dtype>*>& bottom, vector<Blob<Dtype>*>* top) {
    // ...
    padding_.pad(this->layer_param_.padding_param(), bottom, top);
    // ...
  }

 protected:
  InPlacePadding<Dtype> padding_;
};
```
By using composition instead of adding a separate layer above the conv or pooling layer, no extra memory has to be allocated.
@futurely I can't tell what you're trying to get at above... what is `padding_.pad` supposed to do? Presumably it puts the padded version of its bottom in top, but then one has to allocate extra memory for the padded version (note that `PoolingLayer` can't use its _own_ top to store the padded bottom; it's too small). But your claim is that no extra memory has to be allocated, so one of us is confused.
There _is_ a way to abstract away the padding, which is to have an indexing routine that indirects direct access to blob memory, performing bounds checking and sometimes returning zero rather than actual blob contents. But I worry that would be overengineering.
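Roughly, I'm imagining something like this (just a sketch of the idea, not anything that exists in Caffe):

```cpp
// A read-only "padded view" of a 2D plane of blob data: accesses outside
// the real extent return zero instead of reading memory, so no padded
// copy of the blob is ever allocated.
template <typename Dtype>
class PaddedView {
 public:
  PaddedView(const Dtype* data, int height, int width, int pad_h, int pad_w)
      : data_(data), height_(height), width_(width),
        pad_h_(pad_h), pad_w_(pad_w) {}

  // h and w are coordinates in the padded plane,
  // in [0, height + 2*pad_h) x [0, width + 2*pad_w).
  Dtype at(int h, int w) const {
    const int h_in = h - pad_h_;
    const int w_in = w - pad_w_;
    if (h_in < 0 || h_in >= height_ || w_in < 0 || w_in >= width_) {
      return Dtype(0);  // virtual zero padding
    }
    return data_[h_in * width_ + w_in];
  }

 private:
  const Dtype* data_;
  int height_, width_;
  int pad_h_, pad_w_;
};
```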
The `InPlacePadding` sketch above tried to share padding code without using inheritance. I thought the Blob
Thanks for the details. The weird output dimensions of pooling are confusing.
Hey guys, it has been a very long time since the issue was opened.
When can we solve this problem?