Hello everyone,
I am trying to train a WGAN-GP model with apex at the O1 and O2 opt levels, but during the gradient penalty phase I get gradient overflow errors that repeat until a division-by-zero exception is raised. This happens with both O1 and O2. I searched for WGAN-GP code that uses apex and found this:
https://github.com/hukkelas/progan-pytorch/blob/master/src/models/loss.py
which uses a separate scaled loss and scaling factor for the GP. However, this gives the same error.
My code for the GP calculation is as follows:
import torch
from torch.autograd import grad as torch_grad  # assumed alias for torch.autograd.grad

def _gradient_penalty(self, real_data, generated_data, gp_weight):
    batch_size = real_data.size(0)
    # Calculate interpolation; alpha is created with the data's device and dtype
    # (Tensor.to() is not in-place, so a bare alpha.to(...) has no effect)
    alpha = torch.rand(batch_size, 1, 1, device=real_data.device, dtype=real_data.dtype)
    alpha = alpha.expand_as(real_data)
    interpolated = alpha * real_data.detach() + (1 - alpha) * generated_data.detach()
    interpolated.requires_grad_(True)
    # Calculate probability of interpolated examples
    prob_interpolated = self.discriminator(interpolated)
    # Calculate gradients of probabilities with respect to examples
    gradients = torch_grad(outputs=prob_interpolated, inputs=interpolated,
                           grad_outputs=torch.ones_like(prob_interpolated),
                           create_graph=True, retain_graph=True)[0]
    gradients = gradients.view(gradients.size(0), -1)
    gradient_norm = gradients.norm(2, dim=1)
    gradient_penalty = ((gradient_norm - 1) ** 2).mean()
    return gp_weight * gradient_penalty
gradient_penalty = self._gradient_penalty(x, generated_data, gp_weight)
d_gen = self.discriminator(generated_data)
d_real = self.discriminator(x)
d_loss = d_gen.mean() - d_real.mean() + gradient_penalty
with amp.scale_loss(d_loss, self.disc_opt, loss_id=1) as scaled_loss:
    scaled_loss.backward()
Could you help me with this, please?
cc @mcarilli
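The progression described above, from repeated gradient overflow messages to a division-by-zero exception, is what one would expect from a dynamic loss scaler that halves its scale on every overflow and has no lower bound. Below is a minimal pure-Python sketch of that mechanism. It is not apex's actual implementation: the class name `ToyLossScaler`, the initial scale of 2**16, and the halving rule are assumptions chosen for illustration.

```python
class ToyLossScaler:
    """Simplified dynamic loss scaler (hypothetical; not apex's real class)."""

    def __init__(self, init_scale=2.0 ** 16):
        self.scale = init_scale

    def unscale(self, scaled_grad):
        # Dividing by the scale raises ZeroDivisionError once the scale is 0.0.
        return scaled_grad / self.scale

    def update(self, overflow):
        # Assumed rule: halve on overflow, otherwise leave the scale unchanged.
        if overflow:
            self.scale /= 2.0


def steps_until_zero_division(scaler, max_steps=5000):
    """Simulate a model whose gradients overflow on every single step."""
    for step in range(1, max_steps + 1):
        scaler.update(overflow=True)
        try:
            scaler.unscale(1.0)
        except ZeroDivisionError:
            return step
    return None


steps = steps_until_zero_division(ToyLossScaler())
print(f"scale reached zero after {steps} consecutive overflows")
```

If this is what is happening, the division by zero is only a symptom. The thing to track down is why the gradients are non-finite on every step, no matter how small the loss scale becomes.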
I have encountered the same problem.
Have you managed to solve your problem?
Unfortunately, I did not manage to solve the problem.
Unfortunately, I ran into much the same problem T_T. Any suggestions?
Hello,
I had an issue with the same symptoms in one of my projects recently. Although I didn't find the root cause, I found a hacky way around it.
The overflow occurred for me when I used 4 optimizers for 4 networks initialized as follows:
def init_amp(self):
    # mixed precision training
    models = [self.netG, self.netDec, self.netEnc, self.netD]
    optims = [self.optimizer_G, self.optimizer_Dec, self.optimizer_Enc, self.optimizer_D]
    models, optims = amp.initialize(models, optims, opt_level="O1", num_losses=4)
    self.netG, self.netDec, self.netEnc, self.netD = models
    self.optimizer_G, self.optimizer_Dec, self.optimizer_Enc, self.optimizer_D = optims
And the backward passes were done as follows (Generator side was done identically):
if self.opt.amp == 1:
    with amp.scale_loss(self.loss_D, self.optimizer_D, loss_id=0) as scaled_loss:
        scaled_loss.backward(retain_graph=True)
    with amp.scale_loss(self.loss_D, self.optimizer_Enc, loss_id=1) as scaled_loss:
        scaled_loss.backward()
else:
    self.loss_D.backward()
self.optimizer_D.step()
self.optimizer_Enc.step()
In my code, some of the networks share the same loss. The problem went away when I put the networks that share a loss under the same optimizer, so that I now have 4 networks and 2 optimizers, initialized as follows:
def init_amp(self):
    # mixed precision training
    models = [nn.Sequential(self.netG, self.netDec), nn.Sequential(self.netEnc, self.netD)]
    optims = [self.optimizer_G, self.optimizer_D]
    models, optims = amp.initialize(models, optims, opt_level="O1", num_losses=2)
    self.netG, self.netDec = list(models[0].children())[0], list(models[0].children())[1]
    self.netEnc, self.netD = list(models[1].children())[0], list(models[1].children())[1]
    self.optimizer_G, self.optimizer_D = optims
After doing this and also changing the backward passes correspondingly, the overflow no longer occurred. Changing back to 4 optimizers immediately reproduces the issue for me.
I hope this helps someone with a similar issue. Sorry for the ugly code; I couldn't figure out a cleaner way, and I didn't find any documentation for this kind of arrangement of multiple networks under the same optimizer.
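On the question of a cleaner arrangement: in plain PyTorch, a single optimizer can own the parameters of several networks without wrapping them in `nn.Sequential`, by chaining their parameter iterators. The sketch below shows only that part. The small `nn.Linear` networks are hypothetical stand-ins for `netEnc` and `netD`, the choice of Adam and the learning rate are arbitrary, and whether this arrangement also avoids the overflow under apex has not been verified here.

```python
import itertools

import torch
import torch.nn as nn

# Hypothetical stand-ins for two networks that are trained with the same loss.
net_enc = nn.Linear(8, 4)
net_d = nn.Linear(4, 1)

# One optimizer over the parameters of both networks.
optimizer_d = torch.optim.Adam(
    itertools.chain(net_enc.parameters(), net_d.parameters()), lr=1e-4
)

x = torch.randn(16, 8)
loss = net_d(net_enc(x)).mean()

optimizer_d.zero_grad()
loss.backward()     # a single backward pass populates gradients for both networks
optimizer_d.step()  # a single step updates both networks
```

With this layout there is one loss, one backward pass, and one optimizer per group of networks, so `retain_graph=True` and a second scaled backward over the same loss are not needed.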