Hi. I've been trying to reimplement the CycleGAN architecture without looking at any code (Torch, PyTorch, TensorFlow, etc.). My question: I couldn't find any reference for the stride of the discriminator's last layer.
In the file pix2pix/scripts/receptive_field_sizes.m, you have,
% fix the output size to 1 and derive the receptive field in the input
out = ...
    f(f(f(f(f(1, 4, 1), ...   % conv4 -> conv5
                4, 1), ...    % conv3 -> conv4
              4, 2), ...      % conv2 -> conv3
            4, 2), ...        % conv1 -> conv2
          4, 2);              % input -> conv1
fprintf('n=3 discriminator receptive field size: %d\n', out);
It says the last TWO layers have stride = 1.
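The arithmetic in that script can be reproduced in a few lines of Python. This is only a sketch: it assumes `f` is the usual receptive-field recurrence, (output_size - 1) * stride + ksize, which is not shown in the excerpt above, and the names `receptive_field`, `patchgan_receptive_field`, and `N3_LAYERS` are illustrative rather than taken from the repository.

```python
def receptive_field(output_size, ksize, stride):
    # Assumed definition of f: the input extent needed to produce
    # `output_size` outputs from one convolution layer.
    return (output_size - 1) * stride + ksize


def patchgan_receptive_field(layers):
    """layers: (ksize, stride) pairs ordered from input to output."""
    size = 1  # fix the output size to 1, as in the script
    for ksize, stride in reversed(layers):
        size = receptive_field(size, ksize, stride)
    return size


# (ksize, stride) per layer, read off the MATLAB snippet above.
N3_LAYERS = [
    (4, 2),  # input -> conv1
    (4, 2),  # conv1 -> conv2
    (4, 2),  # conv2 -> conv3
    (4, 1),  # conv3 -> conv4
    (4, 1),  # conv4 -> conv5
]

print(patchgan_receptive_field(N3_LAYERS))
```

With the strides listed in the script, this evaluates to 70, i.e. a 70x70 patch in the input.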
Also in the file pytorch-CycleGAN-and-pix2pix/models/networks.py, starting from line 413, you have,
nf_mult_prev = nf_mult
nf_mult = min(2**n_layers, 8)
sequence += [
    nn.Conv2d(ndf * nf_mult_prev, ndf * nf_mult,
              kernel_size=kw, stride=1, padding=padw, bias=use_bias),
    norm_layer(ndf * nf_mult),
    nn.LeakyReLU(0.2, True)
]
sequence += [nn.Conv2d(ndf * nf_mult, 1, kernel_size=kw, stride=1, padding=padw)]
Both the script and the code suggest that the discriminator's last convolutional layer (i.e. 256 -> 512 channels) AND the layer after it, namely the "mapper" convolutional layer, have their strides set to 1.
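To make the layer layout concrete, here is a minimal pure-Python sketch that lists the convolution layers implied by the snippet above and tracks the spatial size through them. Several things are assumptions on my part and are not shown in the excerpt: the first layers use stride 2 and double the channel count (consistent with the MATLAB script), `input_nc=3`, `ndf=64`, `n_layers=3`, a kernel width of 4, padding of 1, and a 256x256 input. The helper names `patchgan_layer_specs` and `conv_output_size` are hypothetical.

```python
def patchgan_layer_specs(input_nc=3, ndf=64, n_layers=3, kw=4):
    """Return (in_channels, out_channels, kernel, stride) per conv layer.

    Assumed defaults; only the last two entries mirror the quoted code.
    """
    specs = [(input_nc, ndf, kw, 2)]  # assumed first layer, stride 2
    nf_mult = 1
    for n in range(1, n_layers):  # assumed stride-2 downsampling layers
        nf_mult_prev, nf_mult = nf_mult, min(2 ** n, 8)
        specs.append((ndf * nf_mult_prev, ndf * nf_mult, kw, 2))
    # The two layers from the quoted code: both use stride 1.
    nf_mult_prev, nf_mult = nf_mult, min(2 ** n_layers, 8)
    specs.append((ndf * nf_mult_prev, ndf * nf_mult, kw, 1))
    specs.append((ndf * nf_mult, 1, kw, 1))  # map to a 1-channel output
    return specs


def conv_output_size(size, kernel, stride, padding=1):
    # Standard convolution output-size formula (padding=1 is assumed).
    return (size + 2 * padding - kernel) // stride + 1


specs = patchgan_layer_specs()
size = 256  # assumed input resolution
for in_ch, out_ch, kernel, stride in specs:
    size = conv_output_size(size, kernel, stride)
    print(f"{in_ch:>3} -> {out_ch:>3}  stride={stride}  output={size}x{size}")
```

Under these assumptions the last two layers are 256 -> 512 and 512 -> 1, both with stride 1, and the spatial size goes 128, 64, 32, 31, 30.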
I couldn't find any reference to this in either the ConditionalGAN or the CycleGAN paper, other than
After the last layer, a convolution is applied to map to a 1 dimensional output, followed by a Sigmoid function.
which only tells me that after the 'c512' layer I have to add a convolutional layer that maps the features to a 1-dimensional output. Since I hadn't looked at any code until now, it has been overwhelming to understand the architecture. Am I missing something in the paper?
Thank you for your time.
Hi @onursertkaya,
Sorry it's been hard to follow. You are right that the last two layers both have stride 1. Reading over the appendix of the pix2pix paper, it looks like we indeed failed to mention this. I'll update it in the next arXiv draft.
As you continue working on your reimplementation, I would suggest looking at our code, rather than trying to reimplement directly from the papers. There are probably going to be more details that are in the code but not mentioned in the paper (although we tried to minimize this). My own perspective is that the "scientific publication" should not be thought of as just the paper, but the paper+code+data. For learning about the basic idea and math, the paper is the place to look. For reimplementing the exact method, I would say the code is the primary place to look.
@phillipi,
It looks like this was never fixed: I could not find the correction in v3.