Pytorch-cyclegan-and-pix2pix: Tanh in the Generator last activation

Created on 10 Feb 2019 · 15 Comments · Source: junyanz/pytorch-CycleGAN-and-pix2pix

Thanks for your work.
I am wondering why you are using tanh as the last activation of the generator?
Thanks again.

YuvalFrommer

👍1

All 15 comments

The goal is to match the range. The range of real images is [-1, 1]. Tanh outputs a value between [-1, 1].

junyanz on 10 Feb 2019

❤1 👍1
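
A minimal sketch of the range-matching argument above, using only Python's standard library (the repository itself is written in PyTorch, which is not used here). The pre-activation values are hypothetical; the point is only that tanh squashes any real number into the interval between -1 and 1, which is the range of the normalized training images.

```python
import math

# Hypothetical pre-activation values from the generator's last layer.
pre_activations = [-10.0, -2.0, -0.5, 0.0, 0.5, 2.0, 10.0]

# Apply tanh element by element, as a final Tanh activation would.
outputs = [math.tanh(x) for x in pre_activations]

# Every output lies within [-1, 1], whatever the input magnitude.
in_range = all(-1.0 <= y <= 1.0 for y in outputs)

print(outputs)
print(in_range)
```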

Clear!

If I normalize the whole image [HxW] to [-1,1], then randomly crop it to [H/8xW/8] and feed the crop to the network, it is clear that the values of the [H/8xW/8] crop will not necessarily span the range [-1,1]. In that case, should I avoid tanh in the last layer? Which way do you prefer to handle it? I cannot feed the whole [HxW] image due to memory constraints.

John1231983 on 22 Feb 2019

[-1, 1] is the range of the value of each pixel (the brightness / color of each pixel should be between -1 and 1), so it has nothing to do with the width and height of the image.

taesungp on 23 Feb 2019

@taesungp: No, I think my question was misunderstood. Let I be an image of size HxW. The normalization will be

I=I/max(I)
I=(I-0.5)/0.5

Now the image intensity will be in [-1,1]. If I randomly crop the image to [H/8 x W/8], do you think the cropped image's range is still [-1,1]? No. It will be in a different range.

John1231983 on 23 Feb 2019

In the first line you should do

I = I/255.0 instead of I = I/max(I), so that the normalization becomes independent of the values in the current cropped I.

taesungp on 23 Feb 2019

Yes. But after normalization, we will crop the image. I know that we should normalize after cropping, but in my case I want to normalize before cropping.

John1231983 on 23 Feb 2019

I think I = I/255.0 is independent of cropping. Cropping and then computing I/255.0 is the same as computing I/255.0 and then cropping.

taesungp on 23 Feb 2019

👍1
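
A small NumPy sketch of the point above, under the assumption of a hypothetical 16x16 test image and a hypothetical 4x4 crop window. The helper names `normalize_fixed` and `normalize_by_max` are made up for illustration. Dividing by the constant 255 commutes with cropping, while dividing by the image's own maximum does not, because the maximum of a crop generally differs from the maximum of the full image.

```python
import numpy as np

def normalize_fixed(img):
    """Map [0, 255] pixel values to [-1, 1] using a fixed scale."""
    return (img / 255.0 - 0.5) / 0.5

def normalize_by_max(img):
    """Scale by the image's own maximum, so the result depends on content."""
    return (img / img.max() - 0.5) / 0.5

# Hypothetical test image with values 0..255; the global maximum (255)
# sits in the bottom-right corner, outside the crop window below.
img = np.arange(256, dtype=np.float64).reshape(16, 16)
crop = (slice(4, 8), slice(4, 8))

# Normalize-then-crop versus crop-then-normalize.
fixed_commutes = np.allclose(normalize_fixed(img)[crop], normalize_fixed(img[crop]))
max_commutes = np.allclose(normalize_by_max(img)[crop], normalize_by_max(img[crop]))

print(fixed_commutes)  # True: the fixed scale ignores image content
print(max_commutes)    # False: the crop has a different maximum
```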

That is correct. But the problem here is that if an image of size WxH is normalized to [-1,1] and a region is then cropped from it, the region may not span [-1,1]; it may span only [-0.5, 0.5]. The output range of tanh is [-1,1], so the range of the cropped input and the range of the network output become inconsistent.

John1231983 on 23 Feb 2019

Even with tanh, if the ground-truth cropped image is in the range [-0.5, 0.5], the generator network will learn to output values in [-0.5, 0.5]. In other words, tanh does not force every output to reach a maximum value of 1. For example, if the layer before tanh outputs zero everywhere, the image will also be zero everywhere, rather than spanning [-1, 1].
You actually have exactly the same situation with uncropped images. Some images are bright, so they will be in the [0, 1] range, not [-1, 1]. Some images are greyish, so they will be within [-0.5, 0.5]. You have the same issue with or without cropping.
Tanh merely constrains the minimum and maximum output of the generator to -1 and 1. The network can probably do just as well with .clamp(-1, 1) instead of Tanh().

taesungp on 23 Feb 2019
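
A standard-library sketch of the two claims above: tanh bounds the output without forcing it to reach the bounds, and a hard clamp to [-1, 1] enforces the same bounds in a different way. The `clamp` helper is a hypothetical stand-in for PyTorch's `.clamp(-1, 1)`, and the input values are made up for illustration.

```python
import math

def clamp(x, lo=-1.0, hi=1.0):
    """Hard-limit x to the interval [lo, hi]."""
    return max(lo, min(hi, x))

# A zero pre-activation stays zero: tanh does not push outputs to +/-1.
zero_out = math.tanh(0.0)

# Hypothetical pre-activation values, small and large.
pre = [-3.0, -0.5, 0.0, 0.5, 3.0]
tanh_out = [math.tanh(x) for x in pre]
clamp_out = [clamp(x) for x in pre]

print(zero_out)
print(tanh_out)   # smooth squashing, strictly inside (-1, 1)
print(clamp_out)  # identity inside [-1, 1], saturated outside
```

Both functions keep the output within the valid pixel range. The thread does not compare their training behavior, so this sketch only checks the bounds.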

I gave an example for that

import numpy as np

I = [[1,2,3,4,5,6,7,8,9],[11,12,13,14,15,16,17,18,255],
     [1,2,3,4,5,6,7,8,9],[11,12,13,14,15,16,17,18,255],
     [1,2,3,4,5,6,7,8,9],[11,12,13,14,15,16,17,18,255],
     [1,2,3,4,5,6,7,8,9],[11,12,13,14,15,16,17,18,255],
     [1,2,3,4,5,6,7,8,9],[11,12,13,14,15,16,17,18,255]]
I = np.asarray(I)
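I = I / 255.0        # scale [0, 255] to [0, 1]
I = (I - 0.5) / 0.5  # shift and scale to [-1, 1]
print(I.min(), I.max())  # -0.9921568627450981 1.0
I_crop = I[4:6, 4:6]     # 2x2 crop that misses the bright (255) pixels
print(I_crop.min(), I_crop.max())  # -0.9607843137254902 -0.8745098039215686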

John1231983 on 23 Feb 2019

Yes...? The cropped image can simply be thought of as a smaller uncropped image. You are just training with smaller images.

Let's say you don't use cropping. What if the input image is

I = np.asarray([[1,2,3,4,5,6,7,8,9],[11,12,13,14,15,16,17,18,19],
                [1,2,3,4,5,6,7,8,9],[11,12,13,14,15,16,17,18,19],
                [1,2,3,4,5,6,7,8,9],[11,12,13,14,15,16,17,18,19],
                [1,2,3,4,5,6,7,8,9],[11,12,13,14,15,16,17,18,19],
                [1,2,3,4,5,6,7,8,9],[11,12,13,14,15,16,17,18,19]]) + 100

so that all values are within [101, 119]? As such, cropping does not introduce any extra problem. If images are within the range [-0.5, 0.5], the layer before tanh will learn to output values in [arctanh(-0.5), arctanh(0.5)], so that the output after tanh lies in [-0.5, 0.5].

taesungp on 23 Feb 2019

❤1
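
A short standard-library sketch of the arctanh argument above, assuming a target output range of [-0.5, 0.5] as in the comment. It computes the pre-activation values that the layer before tanh would need to produce, roughly ±0.549, and confirms that tanh maps them back to ±0.5.

```python
import math

# Target output range taken from the comment above.
target_lo, target_hi = -0.5, 0.5

# Pre-activation values that tanh maps onto the target range.
pre_lo = math.atanh(target_lo)
pre_hi = math.atanh(target_hi)

# Round trip: applying tanh recovers the targets.
back_lo = math.tanh(pre_lo)
back_hi = math.tanh(pre_hi)

print(pre_lo, pre_hi)
print(back_lo, back_hi)
```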

> The goal is to match the range. The range of real images is [-1, 1]. Tanh outputs a value between [-1, 1].

Yes, I have heard elsewhere as well that the range of real images is [-1, 1].
However, I have two related questions.

1. When I open an image using PIL, e.g. PIL.Image.open('some_img.jpg'), does PIL automatically convert the pixel values from [-1, 1] to [0, 255]? Or did you mean something different when you said the range of real images is [-1, 1]? I am not sure whether the stored pixel values actually range over [-1, 1] originally, which may be my misunderstanding.

What I do know is that torchvision.transforms.ToTensor divides the values in [0, 255] by 255, thus scaling them to [0, 1].

2. It seemed a bit odd to me that we usually first shift an image to [0, 1] as the input (if PIL does what question 1 suggests) AND THEN try to output something in [-1, 1] again, then plot by shifting back to [0, 1].
I thought it might be better to just take the original values in [-1, 1] as input, output something in [-1, 1], and then plot by shifting back to [0, 1].

But it has been my belief that this does not matter much because of the normalization layers. Normalization gives the activations a mean of 0 and a std of 1, so it should not matter what range the original input is in, even though it has been shifted to [0, 1]. Is that a wrong statement or an ill-conceived notion? What are your thoughts on that?

MLSlayer on 13 Dec 2019

The PIL image is [0, 255]. We convert it into [-1, 1] using torchvision.transforms.Normalize, as neural networks work better with zero-mean data. The input to the networks is [-1, 1] after this conversion. See this line for more details.
The range for both the original images and generated images is [-1, 1].

junyanz on 14 Dec 2019
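
A NumPy sketch of the arithmetic described above, not the repository's actual preprocessing code. It assumes 8-bit pixel values and reproduces by hand what `torchvision.transforms.ToTensor` (divide by 255) followed by `torchvision.transforms.Normalize` with mean 0.5 and std 0.5 computes, then inverts the mapping as one would before displaying a generated image. The sample pixel values are hypothetical.

```python
import numpy as np

# Hypothetical 8-bit pixel values, as stored in a PIL image.
pixels = np.array([0, 64, 128, 255], dtype=np.uint8)

# Step 1: scale [0, 255] to [0, 1] (what ToTensor does for 8-bit input).
unit = pixels.astype(np.float64) / 255.0

# Step 2: (x - mean) / std with mean = std = 0.5 maps [0, 1] to [-1, 1].
normalized = (unit - 0.5) / 0.5

# Inverse mapping for display: [-1, 1] back to [0, 255].
restored = np.round((normalized * 0.5 + 0.5) * 255.0).astype(np.uint8)

print(normalized.min(), normalized.max())
print(restored)
```

With this convention, both the network input and the Tanh output share the [-1, 1] range, and the same inverse mapping turns either one back into a viewable image.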

> The PIL image is [0, 255]. We convert it into [-1, 1] using torchvision.transforms.Normalize, as neural networks work better with zero-mean data. The input to the networks is [-1, 1] after this conversion. See this line for more details.

> The range for both the original images and generated images is [-1, 1].

Ah, so that is what you meant! I was worried this whole time that images originally contained values in [-1, 1] instead of what I have been telling people (i.e. [0, 255]).

Also yes, that would do it. I got so used to using different precomputed means and stds, which don't give [-1, 1] for new data, that I forgot you were using 0.5 for all.

Does this mean, though, that instead of an input in [0, 1] and an output in [0, 1] through Sigmoid, it's better to normalize the [0, 1] input to [-1, 1] and produce a [-1, 1] output through Tanh, since the latter is normalized?

MLSlayer on 15 Dec 2019

Yes, your understanding is correct.

junyanz on 15 Dec 2019

❤2
