Horovod: Multiple loss.backward() before optimizer.step() in PyTorch

Created on 25 Mar 2019 · 3 comments · Source: horovod/horovod

My loss function has two sub-losses, and I want to compute gradients for each of them via a separate loss.backward() call within a single forward pass. The key code is below:

optimizer.zero_grad()
A_loss.backward(retain_graph=True)   # gradients of sub-loss A; keep the graph for later backward calls

optimizer.zero_grad()
B_loss.backward(retain_graph=True)   # gradients of sub-loss B

optimizer.zero_grad()
total_loss = A_loss + B_loss
total_loss.backward(retain_graph=False)  # gradients of the combined loss; graph can now be freed
optimizer.step()

This code works well in stand-alone PyTorch, but it fails under Horovod:

  File "/FCN_hvd/trainer.py", line 252, in compute_loss
    B_loss.backward(retain_graph=True)
  File "/usr/local/lib/python2.7/dist-packages/torch/tensor.py", line 93, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/usr/local/lib/python2.7/dist-packages/torch/autograd/__init__.py", line 90, in backward
    allow_unreachable=True)  # allow_unreachable flag
  File "/usr/local/lib/python2.7/dist-packages/horovod/torch/__init__.py", line 92, in hook
    assert p not in self._handles

This error is the same as the one reported in https://github.com/horovod/horovod/issues/796, but neither of the workarounds suggested there resolves it for me.

question

All 3 comments

@nan0755, can you try adding backward_passes_per_step=2 to the hvd.DistributedOptimizer() parameters?
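For context, a minimal sketch of what that wrapping might look like; the model, learning rate, and the rest of the setup are placeholders and not taken from the original post:

import torch
import horovod.torch as hvd

hvd.init()
model = torch.nn.Linear(10, 1)                            # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # placeholder base optimizer

# Wrap with Horovod's DistributedOptimizer and declare how many backward()
# calls happen before each optimizer.step(), as suggested above.
optimizer = hvd.DistributedOptimizer(
    optimizer,
    named_parameters=model.named_parameters(),
    backward_passes_per_step=2)

With backward_passes_per_step set, Horovod accumulates gradients locally and only triggers the allreduce after the expected number of backward() calls per step, which is what permits multiple backward passes before optimizer.step(); adjust the value to match how many backward() calls your training loop actually makes.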

@alsrgv Yes, it works on version 0.6.1.
Thanks!

@zeyu-hello What is the advantage of calling backward on the individual losses before adding them together and calling backward on the sum of the losses?

Is it to improve memory or computational efficiency?
