pytorch-CycleGAN-and-pix2pix: If I change the batch size, should I change the learning rate?

Created on 15 Oct 2019 · 8 comments · Source: junyanz/pytorch-CycleGAN-and-pix2pix

Hi :)

I have three questions:

1) I rescaled my images to [-1, 1] because the generator uses tanh. Is this necessary?

2) My images are 140x140. Should I change the number of ResNet blocks? And what is the intuition behind this?

3) I changed my batch_size to 8 to speed up training and kept instance normalization. When making this change, should I also change the learning rate? Are there other things I should consider changing? Should I use batch normalization instead?

Thanks in advance!


kpagels
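On question 1: mapping 8-bit pixel values into the tanh output range [-1, 1] is a simple affine map, and its inverse takes generator outputs back to [0, 255]. A minimal sketch, assuming input pixels in [0, 255]; the function names are hypothetical and not part of the repository. As far as I know, the repository's data loader already applies `transforms.ToTensor()` followed by `transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))`, which produces the same range, so a manual rescale on top of it would be applied twice.

```python
def to_tanh_range(pixel):
    """Map a pixel value in [0, 255] to [-1, 1], the output range of tanh."""
    return pixel / 127.5 - 1.0


def from_tanh_range(value):
    """Map a generator output in [-1, 1] back to the [0, 255] pixel range."""
    return (value + 1.0) * 127.5


# The same arithmetic works element-wise on NumPy arrays or torch tensors.
# ToTensor() (divide by 255) followed by Normalize(mean=0.5, std=0.5) is equivalent:
# (pixel / 255 - 0.5) / 0.5 == pixel / 127.5 - 1
for p in (0, 127.5, 255):
    print(p, "->", to_tanh_range(p))
```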


Yes.
You can reduce the number of ResNet blocks. For example, you can use --netG resnet_6blocks.
InstanceNorm should be fine. You can slightly increase the learning rate: some folks suggest scaling it by sqrt(K) or K when the batch size is K. Using the same learning rate is also fine.

junyanz on 15 Oct 2019
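The rule of thumb above can be written down directly: with K the ratio between the new batch size and the one the learning rate was tuned for, multiply the base learning rate by sqrt(K) (square-root scaling), by K (linear scaling), or leave it unchanged. A minimal sketch; `scaled_lr` and its `rule` argument are hypothetical helpers, not repository options, and the base value of 0.0002 at batch size 1 is the one used later in this thread.

```python
import math


def scaled_lr(base_lr, base_batch_size, batch_size, rule="sqrt"):
    """Scale a learning rate tuned at base_batch_size to a new batch size.

    With k = batch_size / base_batch_size:
      rule="sqrt":   base_lr * sqrt(k)   (square-root scaling)
      rule="linear": base_lr * k         (linear scaling)
      rule="none":   base_lr unchanged
    """
    k = batch_size / base_batch_size
    if rule == "sqrt":
        return base_lr * math.sqrt(k)
    if rule == "linear":
        return base_lr * k
    if rule == "none":
        return base_lr
    raise ValueError(f"unknown rule: {rule!r}")


# Base setting: lr = 0.0002 tuned at batch size 1; new batch size K = 16.
for rule in ("none", "sqrt", "linear"):
    print(rule, scaled_lr(0.0002, 1, 16, rule))
```

With these numbers the three rules give 0.0002, 0.0008 and 0.0032. The reply does not say which one is best, so treat the result as a starting point to tune rather than a fixed answer.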


Thanks for your reply! It's really great that you are helping us out :)

I normalized my data to [-1, 1] and set the number of ResNet blocks to 6. I also tried changing the real label to 0.9. But my discriminator's loss still goes close to 0, e.g. 0.00011. Is this normal, or did my training collapse?

Do you have any other implementation tips for keeping the discriminator loss from going towards zero, or a link to a resource? Should I train the generator more than the discriminator?

kpagels on 16 Oct 2019
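Changing the real label to 0.9 is one-sided label smoothing: the discriminator's target for real images becomes 0.9 instead of 1.0, while the fake target stays 0.0. A minimal pure-Python sketch of a least-squares (MSE) discriminator loss with a configurable real target, assuming raw discriminator scores; `lsgan_d_loss` is a hypothetical name. It illustrates that with an MSE loss the minimum is still 0 (reached when D outputs exactly 0.9 on real and 0.0 on fake images), so smoothing alone does not stop the loss from approaching zero.

```python
def lsgan_d_loss(real_scores, fake_scores, real_label=1.0, fake_label=0.0):
    """Least-squares discriminator loss with optional one-sided label smoothing.

    Averages (D(real) - real_label)^2 over the real scores and
    (D(fake) - fake_label)^2 over the fake scores, then halves their sum
    (a common convention for combining the two terms).
    """
    loss_real = sum((s - real_label) ** 2 for s in real_scores) / len(real_scores)
    loss_fake = sum((s - fake_label) ** 2 for s in fake_scores) / len(fake_scores)
    return 0.5 * (loss_real + loss_fake)


fake = [0.0, 0.0, 0.0]
# A discriminator that hits the smoothed targets exactly still reaches zero loss.
print(lsgan_d_loss([0.9, 0.9, 0.9], fake, real_label=0.9))
# Being overconfident on real images (scores of 1.0) is now penalized slightly.
print(lsgan_d_loss([1.0, 1.0, 1.0], fake, real_label=0.9))
```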

Your training collapsed. You can remove 1-2 layers from D, or increase the learning rate of G.

junyanz on 16 Oct 2019
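Removing layers from D weakens it in two ways: it has fewer parameters, and each output score sees a smaller patch of the image. Assuming the layout of the repository's default PatchGAN discriminator (as I understand `NLayerDiscriminator`: `n_layers` convolutions with kernel 4 and stride 2, followed by two convolutions with kernel 4 and stride 1), the receptive field of one output unit can be computed as below. The helper names are hypothetical; check the layer list against your copy of the code before relying on the numbers.

```python
def receptive_field(layers):
    """Receptive field of one output unit for a stack of (kernel, stride) convs.

    Walks the layers from the output back to the input using
    r_in = (r_out - 1) * stride + kernel. Padding does not change it.
    """
    r = 1
    for kernel, stride in reversed(layers):
        r = (r - 1) * stride + kernel
    return r


def patchgan_layers(n_layers):
    """Assumed PatchGAN layout: n_layers stride-2 convs, then two stride-1 convs."""
    return [(4, 2)] * n_layers + [(4, 1), (4, 1)]


for n in (1, 2, 3, 4):
    print(n, receptive_field(patchgan_layers(n)))
```

Under this assumption, n_layers=3 gives the 70x70 patch, and removing one or two layers shrinks it to 34x34 or 16x16, so on 140x140 images each score judges a smaller local region. As far as I know, the depth is selected in the repository with `--netD n_layers --n_layers_D N` (the default `basic` discriminator corresponds to 3).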


Thanks, I'll try that out! Just one last question :)

Is it correct that there is no activation function in the last layer of the discriminator? If so, why not? I thought it should have a sigmoid.

kpagels on 17 Oct 2019

It is implemented in the GANLoss class. See this line.

junyanz on 18 Oct 2019
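In other words, the sigmoid is folded into the loss rather than missing. As far as I know, `GANLoss` uses `nn.MSELoss` in `lsgan` mode (raw scores are regressed to the targets, so no sigmoid is needed) and `nn.BCEWithLogitsLoss` in `vanilla` mode, which applies the sigmoid internally in a numerically stable way. A minimal pure-Python sketch showing that the stable logits form equals an explicit sigmoid followed by binary cross-entropy; the function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bce(p, y):
    """Binary cross-entropy on a probability p in (0, 1) with target y."""
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def bce_with_logits(x, y):
    """The same loss computed from the raw score x, without an explicit sigmoid.

    Uses the stable form max(x, 0) - x*y + log(1 + exp(-|x|)),
    whose exp() argument is never positive, so it cannot overflow.
    """
    return max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))


# Both formulations agree for moderate scores.
for x in (-3.0, 0.0, 2.5):
    for y in (0.0, 1.0):
        assert math.isclose(bce(sigmoid(x), y), bce_with_logits(x, y))

# For a large raw score, sigmoid(50.0) rounds to exactly 1.0 and log(1 - p)
# would fail, while the logits form still returns a finite loss.
print(bce_with_logits(50.0, 0.0))
```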


@kpagels: You have a similar issue to one I had before. Some suggestions that may help:

You can reduce the number of layers in D from 5 to 4 or 3. This may keep the D training loss from going to zero.
Use a larger learning rate when you use a bigger batch size. I have not tested this, but it may work.
Do not use batch norm. With batch norm, the style of the image does not change.

John1231983 on 18 Oct 2019
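The difference between the two normalizations comes down to which values the statistics are computed over. Batch norm computes one mean and variance per channel across every image in the batch, while instance norm computes them per image and per channel, which removes each image's own brightness and contrast statistics; this is one common explanation for why instance norm suits style and image translation. A minimal pure-Python sketch for a single channel, treating each image as a flat list of values; the learnable scale and shift are omitted and the function names are hypothetical.

```python
def mean_var(values):
    """Mean and (biased) variance of a list of numbers."""
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / len(values)


def normalize(values, mean, var, eps=1e-5):
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


def batch_norm(batch):
    """One channel: statistics pooled over all images in the batch."""
    pooled = [v for image in batch for v in image]
    m, var = mean_var(pooled)
    return [normalize(image, m, var) for image in batch]


def instance_norm(batch):
    """One channel: statistics computed separately for each image."""
    return [normalize(image, *mean_var(image)) for image in batch]


# Two "images" (one channel, 4 pixels each): one dark and flat, one bright and contrasty.
batch = [[0.1, 0.2, 0.1, 0.2], [0.5, 0.9, 0.5, 0.9]]
print(batch_norm(batch))     # the two images stay clearly different
print(instance_norm(batch))  # both become roughly [-1, 1, -1, 1]
```

Note also that an image's instance-norm output does not depend on the other images in the batch, whereas its batch-norm output does.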

@junyanz

Instancenorm should be fine. You can slightly increase the learning rate. Some folks suggest sqrt(K) or K when the batch size is K. Using the same learning rate is also fine.

If my learning rate for a batch size of 1 is 0.0002, should a batch size of 16 then use a learning rate of 0.0002 * sqrt(16) = 0.0008? Am I right?

John1231983 on 18 Oct 2019

I am not sure what is best. Some folks suggest scaling the learning rate by sqrt(K) or K when the batch size is K. Using the same learning rate is also fine.

junyanz on 20 Oct 2019
