pytorch-CycleGAN-and-pix2pix: In the function backward_D_basic, why call detach() on fake?

Created on 17 Sep 2018  ·  2 Comments  ·  Source: junyanz/pytorch-CycleGAN-and-pix2pix

def backward_D_basic(self, netD, real, fake):
    # Real
    pred_real = netD(real)
    loss_D_real = self.criterionGAN(pred_real, True)
    # Fake
    pred_fake = netD(fake.detach())
    loss_D_fake = self.criterionGAN(pred_fake, False)
    # Combined loss
    loss_D = (loss_D_real + loss_D_fake) * 0.5
    # backward
    loss_D.backward()
    return loss_D

Q1. What is detach() in PyTorch?
I know this is a very basic question, but I cannot understand the explanation in the documentation.
(https://pytorch.org/docs/stable/tensors.html)
It seems detach() prevents something related to gradients and backprop...

Q2. Why call detach() on fake only?
I thought D cares about real and fake equally.
Since I don't know what detach() does, I can't work this out either.

I'm very new to this field, so a kind explanation would be greatly appreciated.
Thank you.

Most helpful comment

Let me explain. The role of detach() is to cut off gradient backpropagation. Whether we are training the discriminator or the generator, the update involves log D(G(z)). For the discriminator, cutting the gradient at G does not change D's gradient update (the inner function G(z) is treated as a constant, which does not affect the gradient of the outer function with respect to D's own parameters). Conversely, if D were cut off, the gradient could not reach G, so there would be no way to update G. That is why we do not detach D when training the generator. For the generator, the gradients of D are computed, but D's weights are not updated (only optimizer_g.step is called), so the discriminator does not change while the generator is trained. You may then ask: why add detach() when training the discriminator at all? Isn't it an unnecessary step?
It is not. Cutting the gradient skips backpropagation through G, which speeds up training, so we use it wherever we can. When training the generator, the gradient of log D(G(z)) has to pass through D, so D cannot be cut off and we do not write detach() there.
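
The two training steps described above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the tiny nn.Linear networks, the SGD optimizers, the BCEWithLogitsLoss criterion, and the names optimizer_G / optimizer_D are all assumptions chosen to keep the example short.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the generator and discriminator.
netG = nn.Linear(4, 8)
netD = nn.Linear(8, 1)
optimizer_G = torch.optim.SGD(netG.parameters(), lr=0.1)
optimizer_D = torch.optim.SGD(netD.parameters(), lr=0.1)
criterion = nn.BCEWithLogitsLoss()

z = torch.randn(16, 4)      # noise input
real = torch.randn(16, 8)   # "real" data: a plain tensor with no graph behind it
fake = netG(z)              # generator output, attached to G's graph

# Discriminator step: fake is detached, so backward() stops at D's input.
optimizer_D.zero_grad()
pred_real = netD(real)
pred_fake = netD(fake.detach())
loss_D = 0.5 * (criterion(pred_real, torch.ones_like(pred_real))
                + criterion(pred_fake, torch.zeros_like(pred_fake)))
loss_D.backward()
# G was never visited by the backward pass, so its parameters have no gradients.
g_grads_after_D_step = [p.grad for p in netG.parameters()]
optimizer_D.step()

# Generator step: no detach, because the gradient must pass through D to reach G.
d_weight_before = netD.weight.detach().clone()
g_weight_before = netG.weight.detach().clone()
optimizer_G.zero_grad()
pred_fake = netD(fake)
loss_G = criterion(pred_fake, torch.ones_like(pred_fake))
loss_G.backward()           # gradients are computed for D as well as G
optimizer_G.step()          # but only G's weights are updated
```

After the generator step, netD.weight still equals d_weight_before even though netD.weight.grad was populated, which matches the point that D's gradients are computed but never applied.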

All 2 comments

detach() stops the gradient. fake.detach() will make sure that G does not get gradients. Real does not need to be detached as real is a constant, not a variable.
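
A small sketch of what this means in practice, using single-number "networks" so the gradients can be checked by hand. The names g_weight and d_weight and the squared loss are made up for illustration and do not come from the repository.

```python
import torch

# Hypothetical one-parameter "generator" and "discriminator".
g_weight = torch.tensor(2.0, requires_grad=True)
d_weight = torch.tensor(3.0, requires_grad=True)

z = torch.tensor(1.5)       # input noise, a plain constant
real = torch.tensor(1.0)    # real data: requires_grad is False, nothing to detach
fake = g_weight * z         # depends on g_weight, so it carries a graph

# With detach(): the graph is cut at fake, so only d_weight receives a gradient.
loss_detached = (d_weight * fake.detach()) ** 2
loss_detached.backward()
g_grad_with_detach = g_weight.grad              # stays None
d_grad_with_detach = d_weight.grad.clone()

# Without detach(): the same loss also sends a gradient back to g_weight.
d_weight.grad = None
loss_attached = (d_weight * fake) ** 2
loss_attached.backward()
g_grad_without_detach = g_weight.grad.clone()
d_grad_without_detach = d_weight.grad.clone()
```

The gradient for d_weight is identical in both cases; detach() only changes whether g_weight gets one.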

