The paper states that training was done with a batch size of 1.
Is there a reason not to use a slightly larger batch size to more fully occupy the GPUs? For example, are the results better with batch size = 1 than with batch sizes larger than 1?
We haven't compared the quality of results across different batch sizes; it would be great if someone could look into it. We use batchSize=1 mainly because it lets us train models on higher-resolution images.
Pix2pix training convergence curves on the Facades dataset using batch sizes of 1, 16, and 32, respectively.



Thanks! Also note that batchnorm with batchsize=1 is equivalent to instance norm (aka contrast normalization), which has qualitatively different properties from batchnorm. Batchnorm achieves invariance to the mean and variance of features across a batch of images; instance norm achieves invariance to the mean and variance of features within a single image. As a result, instance norm will be (nearly) invariant to image-level operations like changing the exposure or contrast of a photo, whereas batchnorm will not. Batchnorm is only invariant to batch-level operations.
*Caveat: these statements are only strictly true if the momentum parameter is set to zero, which we don't do in practice.
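To make the invariance argument concrete, here is a toy, pure-Python sketch (not code from pix2pix or any framework). It assumes a batch laid out as `batch[image][channel][pixel]`, uses training-mode statistics only (no running averages, no learned scale/shift), and models an "exposure/contrast change" as a hypothetical per-image affine map `x -> gain * x + offset` with `gain > 0`, which is a simplification of real photo adjustments. The helper names (`batch_norm`, `instance_norm`, `adjust_exposure`) are made up for illustration.

```python
import math

# Toy layout (an assumption for illustration): batch[image][channel][pixel],
# with each channel's spatial dimensions flattened into one list.

def _mean_var(values):
    """Population mean and variance of a list of numbers."""
    mean = sum(values) / len(values)
    return mean, sum((v - mean) ** 2 for v in values) / len(values)

def _normalize(values, mean, var, eps):
    return [(v - mean) / math.sqrt(var + eps) for v in values]

def batch_norm(batch, eps=1e-5):
    """Training-mode batchnorm: per-channel stats pooled over every image and
    pixel in the batch. No learned scale/shift and no running averages."""
    num_channels = len(batch[0])
    out = [[None] * num_channels for _ in batch]
    for c in range(num_channels):
        mean, var = _mean_var([v for image in batch for v in image[c]])
        for n, image in enumerate(batch):
            out[n][c] = _normalize(image[c], mean, var, eps)
    return out

def instance_norm(batch, eps=1e-5):
    """Instance norm: per-channel stats computed for each image separately."""
    out = []
    for image in batch:
        channels = []
        for ch in image:
            mean, var = _mean_var(ch)
            channels.append(_normalize(ch, mean, var, eps))
        out.append(channels)
    return out

def adjust_exposure(image, gain, offset):
    """Hypothetical exposure/contrast change modeled as x -> gain * x + offset."""
    return [[gain * v + offset for v in ch] for ch in image]

def _flatten(x):
    if isinstance(x, list):
        for item in x:
            yield from _flatten(item)
    else:
        yield x

def max_abs_diff(a, b):
    return max(abs(p - q) for p, q in zip(_flatten(a), _flatten(b)))

# Two tiny 2-channel, 4-pixel "images".
photo = [[0.1, 0.5, 0.9, 0.3], [0.2, 0.8, 0.4, 0.6]]
other = [[0.7, 0.2, 0.4, 0.9], [0.5, 0.1, 0.3, 0.8]]
brighter = adjust_exposure(photo, gain=2.0, offset=0.3)

# With a single image, batchnorm's pooled stats are just that image's stats.
same_when_n1 = max_abs_diff(batch_norm([photo]), instance_norm([photo]))
# Instance norm output barely moves when one image's exposure changes (eps only).
in_shift = max_abs_diff(instance_norm([brighter]), instance_norm([photo]))
# Batchnorm is (nearly) invariant when the SAME change hits the whole batch...
bn_batch_level = max_abs_diff(
    batch_norm([adjust_exposure(photo, 2.0, 0.3), adjust_exposure(other, 2.0, 0.3)]),
    batch_norm([photo, other]),
)
# ...but changing one image shifts the normalized output of the OTHER image.
bn_leak = max_abs_diff(batch_norm([brighter, other])[1], batch_norm([photo, other])[1])
in_leak = max_abs_diff(instance_norm([brighter, other])[1], instance_norm([photo, other])[1])

print(same_when_n1, in_shift, bn_batch_level, bn_leak, in_leak)
```

The small nonzero values come only from `eps` in the denominator; with `eps = 0` the affine invariance would be exact (for non-constant channels). Real layers also keep running statistics for inference, which is where the momentum caveat above comes in.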