pytorch-CycleGAN-and-pix2pix: When changing the batch size, should I change the learning rate?

Created on 15 Oct 2019 · 8 comments · Source: junyanz/pytorch-CycleGAN-and-pix2pix

Hi :)

I have three questions:

1) I rescaled my images to [-1, 1] because the generator uses tanh. Is this needed?

2) My images are 140x140. Should I change the number of ResNet blocks? And what is the intuition behind this?

3) I changed batch_size to 8 to speed up training and kept InstanceNorm. When making this change, should I also change the learning rate? Are there other things I should consider changing? Should I use BatchNorm instead?

Thanks in advance!

Most helpful comment

  1. Yes, rescaling to [-1, 1] is needed.
  2. You can reduce the number of ResNet blocks. For example, you can use --netG resnet_6blocks.
  3. InstanceNorm should be fine. You can slightly increase the learning rate; some folks suggest scaling it by sqrt(K) or K when the batch size is K. Using the same learning rate is also fine.
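On point 1: a generator that ends in tanh can only output values in [-1, 1], so real images should use the same range for the discriminator to compare like with like. A minimal plain-Python sketch, assuming 8-bit pixel values in [0, 255]; `to_tanh_range` and `from_tanh_range` are hypothetical helpers, not functions from the repo. The `(x - 0.5) / 0.5` step is the same mapping torchvision's `transforms.Normalize` applies with mean 0.5 and std 0.5.

```python
import math

def to_tanh_range(pixel: int) -> float:
    """Hypothetical helper: map an 8-bit pixel value in [0, 255] to [-1, 1]."""
    x = pixel / 255.0          # scale to [0, 1]
    return (x - 0.5) / 0.5     # same as 2*x - 1: shift and stretch to [-1, 1]

def from_tanh_range(value: float) -> int:
    """Hypothetical helper: map a generator output in [-1, 1] back to [0, 255]."""
    x = (value + 1.0) / 2.0                      # back to [0, 1]
    return round(min(max(x, 0.0), 1.0) * 255)    # clamp, then rescale to [0, 255]

# The generator's tanh output is bounded by [-1, 1],
# so real images are rescaled into the same range.
print(to_tanh_range(0), to_tanh_range(255))   # -1.0 1.0
print(from_tanh_range(math.tanh(10.0)))       # 255
```

Mapping back with `from_tanh_range` is what you would do before saving generated images to disk.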


Thanks for your reply! It's really great that you are helping us out :)

I normalized my data to [-1, 1] and set the number of ResNet blocks to 6. I also tried changing the real label to 0.9, but my discriminator's loss still goes close to 0, e.g., 0.00011. Is this normal, or did my training collapse?

Do you have any other implementation tips to keep the discriminator loss from going to zero, or a link to some resources? Should I train the generator more than the discriminator?

Your training collapsed. You can remove 1-2 layers from D, or increase the learning rate of G.
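One way to see what removing layers from D changes: each stride-2 layer roughly doubles the image patch that a single discriminator output looks at. A rough sketch, assuming a PatchGAN-style discriminator built from 4x4 convolutions, with `n_layers` stride-2 convolutions followed by two stride-1 convolutions (my reading of the repo's `NLayerDiscriminator`; treat the exact layout as an assumption). `patchgan_receptive_field` is a hypothetical helper.

```python
def patchgan_receptive_field(n_layers: int, kernel: int = 4) -> int:
    """Receptive field (in input pixels) of one output unit of a PatchGAN-style D.

    Assumed layout: n_layers convs with stride 2, then two convs with stride 1,
    all with the same kernel size. Padding does not change the receptive field.
    """
    strides = [2] * n_layers + [1, 1]
    rf = 1
    # Walk backwards from the output unit: a conv with stride s and kernel k
    # maps a field of r units to (r - 1) * s + k units of its input.
    for stride in reversed(strides):
        rf = (rf - 1) * stride + kernel
    return rf

for n in range(1, 6):
    print(n, patchgan_receptive_field(n))
```

Under these assumptions, n_layers=3 gives the 70x70 patch, while 4 and 5 give 142 and 286 pixels. For 140x140 images, a deeper D therefore judges essentially the whole image at once, which is one possible reason a smaller D trains more stably here.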

Thanks, I'll try that out! Just one last question :)

Is it correct that you don't have any activation function in the last layer of the discriminator? If so, why not? I thought it should have a sigmoid.

It is implemented in the GANLoss class. See this line.
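The idea behind putting the sigmoid in the loss: D returns raw scores (logits), and the loss applies the sigmoid itself. As far as I can tell, the repo's vanilla GAN mode uses PyTorch's `nn.BCEWithLogitsLoss`, which fuses sigmoid and binary cross-entropy in a numerically stable way, while the LSGAN mode applies an MSE loss to the raw output, where no sigmoid is wanted at all. A plain-Python sketch of the stable fused formula; the function names are hypothetical, not the repo's API.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def bce_naive(logit: float, target: float) -> float:
    """Sigmoid first, then binary cross-entropy: unstable for large |logit|."""
    p = sigmoid(logit)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def bce_with_logits(logit: float, target: float) -> float:
    """Fused sigmoid + BCE, rewritten so exp() never overflows and log() never sees 0."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

# For moderate logits both versions agree...
print(bce_naive(2.0, 1.0), bce_with_logits(2.0, 1.0))
# ...but for a very confident D the naive version breaks down:
# sigmoid(40.0) rounds to exactly 1.0, so log(1 - p) would be log(0.0).
print(bce_with_logits(40.0, 0.0))  # ~40.0
```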

@kpagels: You have a similar issue to one I had before. Some suggestions that may work:

  1. You can reduce the number of layers in D from 5 to 4 or 3. This may keep D's training loss from going to zero.
  2. Use a larger learning rate when you use a bigger batch size. I have not tested this, but it may work.
  3. Do not use batch norm. Batch norm will keep the style of the image from changing.
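Why InstanceNorm is insensitive to the batch size while BatchNorm is not: instance norm computes statistics per sample (and per channel), so one image's output does not depend on the other images in the batch, whereas batch norm pools statistics across the whole batch. A toy plain-Python sketch for a single channel, with each image flattened to a list of pixel values; learnable scale/shift parameters and running statistics are deliberately left out, and the function names are hypothetical.

```python
import math
from typing import List

def _normalize(values: List[float], mean: float, var: float, eps: float) -> List[float]:
    return [(v - mean) / math.sqrt(var + eps) for v in values]

def instance_norm(batch: List[List[float]], eps: float = 1e-5) -> List[List[float]]:
    """Normalize each image with its own mean and variance."""
    out = []
    for img in batch:
        mean = sum(img) / len(img)
        var = sum((v - mean) ** 2 for v in img) / len(img)
        out.append(_normalize(img, mean, var, eps))
    return out

def batch_norm(batch: List[List[float]], eps: float = 1e-5) -> List[List[float]]:
    """Normalize every image with statistics pooled over the whole batch."""
    pooled = [v for img in batch for v in img]
    mean = sum(pooled) / len(pooled)
    var = sum((v - mean) ** 2 for v in pooled) / len(pooled)
    return [_normalize(img, mean, var, eps) for img in batch]

a = [0.1, 0.5, 0.9, 0.3]
b = [5.0, 6.0, 7.0, 8.0]   # a very different image in the same batch

# Instance norm: a's output is identical whether it is alone or batched with b.
print(instance_norm([a])[0] == instance_norm([a, b])[0])   # True
# Batch norm: a's output changes with the batch it lands in.
print(batch_norm([a])[0] == batch_norm([a, b])[0])         # False
```

With a batch of one image, the two coincide, which is why switching to batch_size 8 matters for BatchNorm but not for InstanceNorm.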

@junyanz, you wrote:

"Instancenorm should be fine. You can slightly increase the learning rate. Some folks suggest sqrt(K) or K when the batch size is K. Using the same learning rate is also fine."

If my learning rate for batch size 1 is 0.0002, then for batch size 16 I should use 0.0002 * sqrt(16) = 0.0008. Am I right?

I am not sure what is best. Some folks suggest sqrt(K) or K when the batch size is K. Using the same learning rate is also fine.
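The heuristics mentioned in this thread, worked through for the numbers in the question: a base learning rate of 0.0002 at batch size 1, scaled to batch size K = 16. These are rules of thumb, not something the repo does automatically; `scaled_lr` is a hypothetical helper.

```python
import math

def scaled_lr(base_lr: float, batch_size: int, rule: str = "sqrt") -> float:
    """Scale a learning rate tuned for batch size 1 to a batch size of K.

    rule="sqrt":   lr = base_lr * sqrt(K)
    rule="linear": lr = base_lr * K
    rule="none":   keep base_lr unchanged
    """
    if rule == "sqrt":
        return base_lr * math.sqrt(batch_size)
    if rule == "linear":
        return base_lr * batch_size
    if rule == "none":
        return base_lr
    raise ValueError(f"unknown rule: {rule!r}")

for rule in ("none", "sqrt", "linear"):
    print(rule, scaled_lr(0.0002, 16, rule))
```

The sqrt rule gives 0.0008, matching the calculation in the question, and the linear rule gives 0.0032. The chosen value would then be passed to training through the `--lr` option.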
