def backward_D_basic(self, netD, real, fake):
# Real
pred_real = netD(real)
loss_D_real = self.criterionGAN(pred_real, True)
# Fake
pred_fake = netD(fake.detach())
loss_D_fake = self.criterionGAN(pred_fake, False)
# Combined loss
loss_D = (loss_D_real + loss_D_fake) * 0.5
# backward
loss_D.backward()
return loss_D
Q1. What is detach() in pytorch?
I know this is very basic question, but I cannot understand the explanation in document.
(https://pytorch.org/docs/stable/tensors.html)
It seems detach() prevent something related with graident & backprop...
Q2. Why do detach() for fake only?
I thought D care about real and fake equally.
Actually, because I don't know what is role of detach(), so I also cannot know it.
I'm very new in this field, so kind explanation will be greatly appreciated.
Thank you.
detach() stops the gradient. fake.detach() will make sure that G does not get gradients. Real does not need to be detached as real is a constant, not a variable.
let me tell you. The role of detach is to freeze the gradient drop. Whether it is for discriminating the network or generating the network, we update all about logD(G(z)). For the discriminant network, freezing G does not affect the overall gradient update (that is The inner function is considered to be a constant, which does not affect the outer function to find the gradient), but conversely, if D is frozen, there is no way to complete the gradient update. Therefore, we did not use the gradient of freezing D when training the generator. So, for the generator, we did calculate the gradient of D, but we didn't update the weight of D (only optimizer_g.step was written), so the discriminator will not be changed when the generator is trained. You may ask, that's why, when you train the discriminator, you need to add detach. Isn't this an extra move?
Because we freeze the gradient, we can speed up the training, so we can use it where it can be used. It is not an extra task. Then when we train the generator, because of logD(G(z)), there is no way to freeze the gradient of D, so we will not write detach here.
Most helpful comment
let me tell you. The role of detach is to freeze the gradient drop. Whether it is for discriminating the network or generating the network, we update all about logD(G(z)). For the discriminant network, freezing G does not affect the overall gradient update (that is The inner function is considered to be a constant, which does not affect the outer function to find the gradient), but conversely, if D is frozen, there is no way to complete the gradient update. Therefore, we did not use the gradient of freezing D when training the generator. So, for the generator, we did calculate the gradient of D, but we didn't update the weight of D (only optimizer_g.step was written), so the discriminator will not be changed when the generator is trained. You may ask, that's why, when you train the discriminator, you need to add detach. Isn't this an extra move?
Because we freeze the gradient, we can speed up the training, so we can use it where it can be used. It is not an extra task. Then when we train the generator, because of logD(G(z)), there is no way to freeze the gradient of D, so we will not write detach here.