Pytorch-cyclegan-and-pix2pix: Tanh in the Generator last activation

Created on 10 Feb 2019 · 15 Comments · Source: junyanz/pytorch-CycleGAN-and-pix2pix

Thanks for your work.
Why do you use tanh as the last activation of the generator?
Thanks again.

All 15 comments

The goal is to match the range. The range of real images is [-1, 1]. Tanh outputs a value between [-1, 1].
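A minimal sketch of the range-matching point, using NumPy rather than PyTorch. The `logits` values are hypothetical stand-ins for whatever the generator's last layer produces before the activation; they are not taken from the repository.

```python
import numpy as np

# Hypothetical pre-activation values from a generator's last layer.
logits = np.array([-50.0, -3.0, -0.5, 0.0, 0.5, 3.0, 50.0])

# tanh squashes any real number into the interval [-1, 1].
out = np.tanh(logits)
print(out.min(), out.max())

# However large the pre-activations are, the outputs stay in the same
# range as images that were normalized to [-1, 1].
assert np.all(out >= -1.0) and np.all(out <= 1.0)
```

Because the real images fed to the discriminator live in the same interval, generated and real samples are directly comparable.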

Clear!

Suppose I normalize the whole image [HxW] to [-1, 1], then randomly crop it to [H/8 x W/8] and feed the crop to the network. Clearly the values in the [H/8 x W/8] crop may not cover the full range [-1, 1]. In that case, should I avoid tanh in the last layer? How would you handle this? I cannot feed the whole [HxW] image because of memory limits.

[-1, 1] is the range of each pixel's value (the brightness / color of each pixel should be within -1 and 1), so it has nothing to do with the width and height of the image.

@taesungp: No, I think my question was misunderstood. Let I be an image of size HxW. The normalization will be

I=I/max(I)
I=(I-0.5)/0.5

Now the image intensity will be in [-1, 1]. If I randomly crop the image to [H/8 x W/8], do you think the range of the cropped image is still [-1, 1]? No. It will be a different range.

In the first line you should do

I = I/255.0 instead of I = I/max(I), so that it becomes independent of the values of the current cropped I.

Yes. But we crop the image after normalization. I know that we should normalize after cropping, but in my case I want to normalize before cropping.

I think I = I/255.0 is independent of cropping. Cropping and then applying I/255.0 is the same as applying I/255.0 and then cropping.
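The claim that a fixed divisor commutes with cropping, while dividing by max(I) does not, can be checked with a small sketch. The synthetic ramp image and the helper names `normalize_fixed`, `normalize_by_max`, and `crop` are made up for illustration.

```python
import numpy as np

# Synthetic 64x64 "image" with values 0..255 (a repeating ramp).
img = (np.arange(64 * 64).reshape(64, 64) % 256).astype(np.float64)

def normalize_fixed(x):
    """Map [0, 255] to [-1, 1] with a constant divisor."""
    return (x / 255.0 - 0.5) / 0.5

def normalize_by_max(x):
    """Same mapping, but divide by the maximum of x itself (data dependent)."""
    return (x / x.max() - 0.5) / 0.5

def crop(x):
    """Take a fixed 8x8 patch."""
    return x[8:16, 8:16]

# Fixed divisor: normalize-then-crop equals crop-then-normalize.
fixed_commutes = np.allclose(crop(normalize_fixed(img)),
                             normalize_fixed(crop(img)))

# Max-based divisor: the crop has a different maximum than the full image,
# so the two orders give different results.
max_commutes = np.allclose(crop(normalize_by_max(img)),
                           normalize_by_max(crop(img)))

print(fixed_commutes, max_commutes)
```

With a constant divisor every pixel is transformed independently of its neighbours, so the order of cropping and normalizing cannot matter.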

That is correct. But the problem is this: if a WxH image is normalized to [-1, 1] and a region is then cropped from it, the region may not span [-1, 1]; it may be [-0.5, 0.5]. The output of tanh is [-1, 1], so the range of the cropped input and the range of the network output are inconsistent.

  • Even with tanh, if the ground-truth cropped image is in the range [-.5, .5], the generator network will learn to output [-.5, .5]. In other words, tanh does not force every output to have a max value of 1. For example, if the generator outputs zero everywhere, the image will also be zero everywhere, rather than spanning [-1, 1].
  • You have exactly the same situation with uncropped images. Some images are bright, so they will be in the [0, 1] range, not [-1, 1]. Some images are greyish, so they will be within [-0.5, 0.5]. The problem is the same with or without cropping.
  • Tanh merely constrains the minimum and maximum output of the generator to be -1 and 1. The network can probably do just as well with .clamp(-1, 1) instead of Tanh().
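A rough sketch of the three points above, assuming NumPy's `np.clip` as a stand-in for the tensor method `.clamp(-1, 1)`. The `pre_activation` values are hypothetical.

```python
import numpy as np

# Hypothetical values just before the generator's final activation.
pre_activation = np.array([-4.0, -0.3, 0.0, 0.3, 4.0])

tanh_out = np.tanh(pre_activation)
clamp_out = np.clip(pre_activation, -1.0, 1.0)  # NumPy analogue of .clamp(-1, 1)

# Both options only bound the output; neither stretches it to reach -1 or 1.
print(tanh_out)
print(clamp_out)

# An all-zero pre-activation map gives an all-zero image under tanh.
zeros_out = np.tanh(np.zeros((2, 2)))
print(zeros_out)
```

One practical difference, added here as an aside and not stated in the thread: clamp has zero gradient wherever the input lies outside [-1, 1], whereas the gradient of tanh becomes small there but never exactly zero.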

I gave an example for that

import numpy as np

I = [[1,2,3,4,5,6,7,8,9],[11,12,13,14,15,16,17,18,255],
     [1,2,3,4,5,6,7,8,9],[11,12,13,14,15,16,17,18,255],
     [1,2,3,4,5,6,7,8,9],[11,12,13,14,15,16,17,18,255],
     [1,2,3,4,5,6,7,8,9],[11,12,13,14,15,16,17,18,255],
     [1,2,3,4,5,6,7,8,9],[11,12,13,14,15,16,17,18,255]]
I = np.asarray(I)
I = I / 255
I = (I - 0.5) / 0.5
print(I.min(), I.max())  # -0.9921568627450981 1.0
I_crop = I[4:6, 4:6]
print(I_crop.min(), I_crop.max())  # -0.9607843137254902 -0.8745098039215686

Yes...? The cropped image can just be thought of as a smaller uncropped image. You are simply training with smaller images.

Let's say you don't use cropping. What if the input image is

I = np.asarray([[1,2,3,4,5,6,7,8,9],[11,12,13,14,15,16,17,18,19],
                [1,2,3,4,5,6,7,8,9],[11,12,13,14,15,16,17,18,19],
                [1,2,3,4,5,6,7,8,9],[11,12,13,14,15,16,17,18,19],
                [1,2,3,4,5,6,7,8,9],[11,12,13,14,15,16,17,18,19],
                [1,2,3,4,5,6,7,8,9],[11,12,13,14,15,16,17,18,19]]) + 100

so that all values are within [101, 119]? As such, cropping does not introduce any extra problem. If images are within the range [-0.5, 0.5], the generator will learn pre-tanh activations in [arctanh(-0.5), arctanh(0.5)], so that its output after tanh is in [-0.5, 0.5].
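To make the arctanh remark concrete, here is a small sketch. The `targets` array is a hypothetical set of ground-truth pixel values in [-0.5, 0.5]; the sketch only computes which pre-activation values would reproduce them and says nothing about how training gets there.

```python
import numpy as np

# Hypothetical ground-truth pixel values confined to [-0.5, 0.5].
targets = np.array([-0.5, -0.25, 0.0, 0.25, 0.5])

# Pre-tanh values the last layer would need to emit to hit those targets.
pre = np.arctanh(targets)
print(pre)

# Passing them through tanh recovers the targets, so the output range is
# [-0.5, 0.5] even though tanh itself could reach values near -1 and 1.
recovered = np.tanh(pre)
print(recovered)
```

The required pre-activations lie roughly in [-0.549, 0.549], since arctanh(0.5) = 0.5 * ln(3).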

The goal is to match the range. The range of real images is [-1, 1]. Tanh outputs a value between [-1, 1].

Yes, I have heard that the range of real images is [-1, 1] elsewhere as well
However, I have two sequential questions.

  1. When I open an image using PIL, i.e. PIL.Image.open('some_img.jpg'),
    does PIL automatically convert the pixel values from [-1, 1] to [0, 255]? Or did you mean something different when you said the range of real images is [-1, 1]? I am not sure whether the actual pixel values originally range over [-1, 1]; this may be a misunderstanding on my part.

What I do know is that torchvision.transforms.ToTensor divides the values ranging from [0, 255] by 255, thus scaling them to [0, 1]

  2. It seemed a bit odd to me that we usually first shift an image (if PIL does what question 1 describes) to [0, 1] as the original input AND THEN try to output something in [-1, 1] again, then plot by shifting back to [0, 1].
    I thought it might be better to take the original values in [-1, 1] as input, output something in [-1, 1], and then plot by shifting back to [0, 1].

But I have believed that this does not matter much because of the normalization layers. Normalization makes the activations have a mean of 0 and a std of 1, so the range of the original input should not matter, even though it has been shifted to [0, 1]. Is that a wrong statement or an ill-conceived notion? What are your thoughts on that?

  1. The PIL image is [0, 255]. We convert it into [-1, 1] using torchvision.transforms.Normalize, as neural networks work better with zero-mean data. The input to the networks is [-1, 1] after this conversion. See this line for more details.
  2. The range for both the original images and generated images is [-1, 1].
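The conversion described above can be sketched in plain NumPy. This mimics the arithmetic of ToTensor (divide by 255) followed by Normalize with mean 0.5 and std 0.5, but it is not the repository's code: the function names `to_model_range` and `to_display_range` are made up here, and the channel reordering that ToTensor also performs is left out.

```python
import numpy as np

def to_model_range(img_uint8):
    """[0, 255] uint8 -> [-1, 1] float, as ToTensor + Normalize(0.5, 0.5) would."""
    x = img_uint8.astype(np.float32) / 255.0  # [0, 255] -> [0, 1]
    return (x - 0.5) / 0.5                    # [0, 1]   -> [-1, 1]

def to_display_range(x):
    """[-1, 1] float -> [0, 255] uint8, e.g. for saving a generated image."""
    y = (x + 1.0) / 2.0 * 255.0
    return np.clip(np.round(y), 0, 255).astype(np.uint8)

# A tiny hypothetical image: black, dark grey, mid grey, white.
pixels = np.array([[0, 64, 128, 255]], dtype=np.uint8)

normalized = to_model_range(pixels)
restored = to_display_range(normalized)

print(normalized)
print(restored)
```

Real inputs and tanh outputs share the [-1, 1] range, so the same inverse mapping can be used to view either one.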

Ah, so that is what you meant! I was worried this whole time that images originally contained values in [-1, 1] instead of what I have been telling people (i.e. [0, 255]).

Also yes, that would do it. I got so used to using different precomputed means and stds, which don't give [-1, 1] for new data, that I forgot you were using .5 for all.

Does this mean, though, that instead of an input in [0, 1] and an output in [0, 1] through Sigmoid, it is better to normalize the [0, 1] input to [-1, 1] and produce a [-1, 1] output through Tanh, since the latter is normalized?

Yes, your understanding is correct.
