Thanks for your work.
I am wondering why you are using tanh as the last activation of the generator.
Thanks again.
The goal is to match the range. The range of real images is [-1, 1]. Tanh outputs a value between [-1, 1].
Clear!
If I normalize the whole image [HxW] to [-1, 1], then randomly crop it to [H/8 x W/8] and feed the crop to the network, the range of the [H/8 x W/8] crop will clearly not be [-1, 1]. Should I then not use tanh in the last layer? How would you prefer to handle this? I cannot feed the whole [HxW] image because of memory limits.
[-1, 1] is the range of each pixel's value (the brightness / color of each pixel should be between -1 and 1), so it has nothing to do with the width and height of the image.
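To make the point about per-pixel range concrete, here is a minimal NumPy sketch. The image sizes and the helper name `to_unit_range` are made up for illustration and are not taken from the repository.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 8-bit grayscale images of two different sizes.
big = rng.integers(0, 256, size=(256, 256))
small = rng.integers(0, 256, size=(32, 32))

def to_unit_range(img):
    """Map 8-bit pixel values from [0, 255] to [-1, 1]."""
    return (img / 255.0 - 0.5) / 0.5

# The value range is a per-pixel property: it holds for any height and width.
ranges_ok = all(
    to_unit_range(img).min() >= -1.0 and to_unit_range(img).max() <= 1.0
    for img in (big, small)
)

# tanh squashes any real pre-activation into [-1, 1], matching that range.
pre_activation = np.array([-100.0, -1.0, 0.0, 1.0, 100.0])
out = np.tanh(pre_activation)
print(ranges_ok, out.min(), out.max())
```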
@taesungp: No, my question was misunderstood. Let I be an image of size HxW. The normalization will be
I=I/max(I)
I=(I-0.5)/0.5
Now the image intensity will be in [-1, 1]. If I randomly crop the image to [H/8 x W/8], do you think the range of the cropped image is still [-1, 1]? No. It will be a different range.
In the first line you should do
I = I/255.0 instead of I = I/max(I), so that it becomes independent of the values of the current cropped I.
Yes. But we crop the image after normalization. I know that we should normalize after cropping, but in my case I want to normalize before cropping.
I think I = I/255.0 is independent of cropping. Cropping and then doing I/255.0 is the same as doing I/255.0 and then cropping.
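A small sketch of that claim, assuming a synthetic 64x64 image and an arbitrary crop window (both hypothetical): with the fixed constant 255 the two orders agree exactly, while dividing by the per-array maximum makes the result depend on what the crop happens to contain.

```python
import numpy as np

# Synthetic 8-bit image whose global maximum (255) lies outside the crop below.
img = (np.arange(64 * 64).reshape(64, 64) % 256).astype(np.float64)
crop = (slice(8, 16), slice(8, 16))  # hypothetical crop window

def normalize_fixed(x):
    """Scale by the fixed constant 255, then shift to [-1, 1]."""
    return (x / 255.0 - 0.5) / 0.5

def normalize_by_max(x):
    """Scale by the maximum of the given array, then shift to [-1, 1]."""
    return (x / x.max() - 0.5) / 0.5

# Fixed constant: normalize-then-crop equals crop-then-normalize.
fixed_commutes = np.array_equal(normalize_fixed(img)[crop],
                                normalize_fixed(img[crop]))

# Data-dependent maximum: the order changes the result.
max_commutes = np.allclose(normalize_by_max(img)[crop],
                           normalize_by_max(img[crop]))
print(fixed_commutes, max_commutes)
```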
That is correct. But the problem is that if an image of size WxH is normalized to [-1, 1] and a region is then cropped from it, the region may not span [-1, 1]; it may span, say, [-0.5, 0.5]. The output of tanh is in [-1, 1], so the range of the cropped input and the range of the network output are inconsistent.
Should I use .clamp(-1, 1) instead of Tanh()? Here is an example of what I mean:
import numpy as np
I = [[1,2,3,4,5,6,7,8,9],[11,12,13,14,15,16,17,18,255],
[1,2,3,4,5,6,7,8,9],[11,12,13,14,15,16,17,18,255],
[1,2,3,4,5,6,7,8,9],[11,12,13,14,15,16,17,18,255],
[1,2,3,4,5,6,7,8,9],[11,12,13,14,15,16,17,18,255],
[1,2,3,4,5,6,7,8,9],[11,12,13,14,15,16,17,18,255]]
I = np.asarray(I)
I = I / 255
I = (I - 0.5) / 0.5
print(I.min(), I.max())  # -0.9921568627450981 1.0
I_crop = I[4:6, 4:6]
print(I_crop.min(), I_crop.max())  # -0.9607843137254902 -0.8745098039215686
Yes...? The cropped image can simply be thought of as a smaller uncropped image. You are just training with smaller images.
Let's say you don't use cropping. What if the input image is
I = np.asarray([[1,2,3,4,5,6,7,8,9],[11,12,13,14,15,16,17,18,19],
[1,2,3,4,5,6,7,8,9],[11,12,13,14,15,16,17,18,19],
[1,2,3,4,5,6,7,8,9],[11,12,13,14,15,16,17,18,19],
[1,2,3,4,5,6,7,8,9],[11,12,13,14,15,16,17,18,19],
[1,2,3,4,5,6,7,8,9],[11,12,13,14,15,16,17,18,19]]) + 100
so that all values are within [101, 119]? As such, cropping does not introduce any extra problem. If the images are within the range [-0.5, 0.5], the generator will learn to produce pre-activations in [arctanh(-0.5), arctanh(0.5)], so that its tanh output stays within [-0.5, 0.5].
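As a quick numeric check of the tanh argument, the sketch below computes the pre-activations that tanh maps onto a narrower target range. The target values are illustrative only.

```python
import numpy as np

# Illustrative target pixel values inside the narrower range [-0.5, 0.5].
target = np.array([-0.5, 0.0, 0.5])

# Pre-activations the last layer must produce so that tanh hits the targets.
pre = np.arctanh(target)  # arctanh(0.5) = 0.5 * ln(3), roughly 0.549

# Applying tanh recovers the targets, so tanh does not force outputs
# to reach the extremes -1 and 1.
recovered = np.tanh(pre)
print(pre, recovered)
```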
The goal is to match the range. The range of real images is [-1, 1]. Tanh outputs a value between [-1, 1].
Yes, I have heard elsewhere as well that the range of real images is [-1, 1].
However, I have two follow-up questions.
What I do know is that torchvision.transforms.ToTensor divides values in [0, 255] by 255, thus scaling them to [0, 1].
But it's been my belief that this doesn't actually matter much because of the normalization layers. Normalization gives the activations a mean of 0 and a std of 1, so it shouldn't matter what range the original input is in, even if it has been shifted to [0, 1]. Is that a wrong statement or a misconceived notion? What are your thoughts on that?
- The PIL image is [0, 255]. We convert it into [-1, 1] using torchvision.transforms.Normalize, as neural networks work better with zero-mean data. The input to the networks is [-1, 1] after this conversion. See this line for more details.
- The range for both the original images and generated images is [-1, 1].
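The arithmetic behind that conversion can be sketched in plain NumPy. This only mirrors what ToTensor (divide by 255) followed by Normalize with mean 0.5 and std 0.5 computes per value; it is not the repository's actual pipeline, and the sample pixel values are made up.

```python
import numpy as np

# A few illustrative 8-bit pixel values.
pixels = np.array([0, 64, 128, 255], dtype=np.uint8)

# ToTensor-like step: [0, 255] -> [0, 1].
scaled = pixels.astype(np.float32) / 255.0

# Normalize-like step with mean = std = 0.5: (x - mean) / std -> [-1, 1].
mean, std = 0.5, 0.5
normalized = (scaled - mean) / std

# Inverse mapping, e.g. for saving a generated image as 8-bit again.
restored = np.round((normalized * std + mean) * 255.0).astype(np.uint8)
print(normalized, restored)
```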
Ah, so that is what you meant! I was worried this whole time that images originally contained values in [-1, 1] instead of what I have been telling people (i.e. [0, 255]).
Also, yes, that would do it. I got so used to using different precomputed means and stds, which do not give [-1, 1] for new data, that I forgot you were using 0.5 for all of them.
Does this mean, though, that instead of an input in [0, 1] and an output in [0, 1] through Sigmoid, it's better to normalize the [0, 1] input to [-1, 1] and produce a [-1, 1] output through Tanh, since the latter is normalized?
Yes, your understanding is correct.
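For reference, the two output conventions are related by the identity tanh(x) = 2·sigmoid(2x) − 1, so they differ only by a rescaling; the tanh convention keeps both data and outputs centered at zero. A minimal check, with a locally defined `sigmoid` helper that is an assumption of this sketch:

```python
import numpy as np

def sigmoid(x):
    """Logistic function, output in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-5.0, 5.0, 11)

tanh_out = np.tanh(x)        # range (-1, 1), tanh(0) = 0
sig_out = sigmoid(2.0 * x)   # range (0, 1), sigmoid(0) = 0.5

# tanh is a shifted and rescaled sigmoid: tanh(x) = 2 * sigmoid(2x) - 1.
identity_holds = np.allclose(tanh_out, 2.0 * sig_out - 1.0)
print(identity_holds)
```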