My loss function consists of two sub-losses, and I want to compute gradients through each loss's backward() within a single forward pass. The key code is below:
optimizer.zero_grad()
# gradients of the first sub-loss only
A_loss.backward(retain_graph=True)
optimizer.zero_grad()
# gradients of the second sub-loss only
B_loss.backward(retain_graph=True)
optimizer.zero_grad()
# gradients of the combined loss, used for the actual parameter update
total_loss = (A_loss + B_loss)
total_loss.backward(retain_graph=False)
optimizer.step()
This code works well on stand-alone PyTorch; however, it fails under Horovod:
File "/FCN_hvd/trainer.py", line 252, in compute_loss
B_loss.backward(retain_graph=True)
File "/usr/local/lib/python2.7/dist-packages/torch/tensor.py", line 93, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/usr/local/lib/python2.7/dist-packages/torch/autograd/__init__.py", line 90, in backward
allow_unreachable=True) # allow_unreachable flag
File "/usr/local/lib/python2.7/dist-packages/horovod/torch/__init__.py", line 92, in hook
assert p not in self._handles
This error is the same as https://github.com/horovod/horovod/issues/796, but neither of the workarounds from that issue works for me.
@nan0755, can you try adding backward_passes_per_step=2 to the hvd.DistributedOptimizer() parameters?
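For reference, a minimal sketch of where that parameter goes, assuming the usual Horovod setup (the model and optimizer names here are placeholders, not taken from the original trainer.py):

import torch
import horovod.torch as hvd

hvd.init()
torch.cuda.set_device(hvd.local_rank())

# Placeholder model/optimizer; substitute the ones used in trainer.py.
model = build_model().cuda()  # build_model() is a hypothetical helper
optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())

# backward_passes_per_step tells Horovod how many backward() calls to expect
# between optimizer steps, deferring the allreduce so that the extra backward
# passes do not re-trigger the hook for a parameter within one step.
optimizer = hvd.DistributedOptimizer(
    optimizer,
    named_parameters=model.named_parameters(),
    backward_passes_per_step=2,
)

hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(optimizer, root_rank=0)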
@alsrgv Yes, it works on version 0.6.1.
Thanks!
@zeyu-hello What is the advantage of calling backward on the individual losses before adding them together and calling backward on the sum?
Is it to improve memory or computational efficiency?
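If the goal is only to inspect the per-loss gradients, here is a minimal alternative sketch. It assumes model, A_loss, B_loss, and optimizer from the snippet above are in scope, and whether it fits the original use case is an assumption on my part: torch.autograd.grad returns gradients without writing into .grad, so the repeated zero_grad()/backward() pattern is not needed just for inspection.

# Hypothetical illustration, not the original poster's code.
params = [p for p in model.parameters() if p.requires_grad]

# Per-loss gradients, kept out of .grad entirely.
grads_A = torch.autograd.grad(A_loss, params, retain_graph=True, allow_unused=True)
grads_B = torch.autograd.grad(B_loss, params, retain_graph=True, allow_unused=True)
grad_norm_A = torch.norm(torch.stack([g.norm() for g in grads_A if g is not None]))
grad_norm_B = torch.norm(torch.stack([g.norm() for g in grads_B if g is not None]))

# The parameter update itself still uses a single backward on the sum.
optimizer.zero_grad()
(A_loss + B_loss).backward()
optimizer.step()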