Horovod: Multiple loss.backward() before optimizer.step() in PyTorch

Created on 25 Mar 2019 · 3 comments · Source: horovod/horovod

My loss function has two sub-losses, and I want to compute gradients for each of them via a separate loss.backward() call within a single forward pass. The key code is below:

optimizer.zero_grad()
A_loss.backward(retain_graph=True)   # gradients of sub-loss A; keep the graph for later backward calls

optimizer.zero_grad()
B_loss.backward(retain_graph=True)   # gradients of sub-loss B

optimizer.zero_grad()
total_loss = A_loss + B_loss
total_loss.backward(retain_graph=False)  # gradients of the combined loss; graph can now be freed
optimizer.step()

This code works well in stand-alone PyTorch, but it fails under Horovod:

  File "/FCN_hvd/trainer.py", line 252, in compute_loss
    B_loss.backward(retain_graph=True)
  File "/usr/local/lib/python2.7/dist-packages/torch/tensor.py", line 93, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/usr/local/lib/python2.7/dist-packages/torch/autograd/__init__.py", line 90, in backward
    allow_unreachable=True)  # allow_unreachable flag
  File "/usr/local/lib/python2.7/dist-packages/horovod/torch/__init__.py", line 92, in hook
    assert p not in self._handles

This error is the same as the one reported in https://github.com/horovod/horovod/issues/796, but neither of the workarounds suggested there resolves it for me.

question

All 3 comments

@nan0755, can you try adding backward_passes_per_step=2 to the hvd.DistributedOptimizer() parameters?
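For context, a minimal sketch of what that wrapping might look like; the model, learning rate, and the rest of the setup are placeholders and not taken from the original post:

import torch
import horovod.torch as hvd

hvd.init()
model = torch.nn.Linear(10, 1)                            # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # placeholder base optimizer

# Wrap with Horovod's DistributedOptimizer and declare how many backward()
# calls happen before each optimizer.step(), as suggested above.
optimizer = hvd.DistributedOptimizer(
    optimizer,
    named_parameters=model.named_parameters(),
    backward_passes_per_step=2)

With backward_passes_per_step set, Horovod accumulates gradients locally and only triggers the allreduce after the expected number of backward() calls per step, which is what permits multiple backward passes before optimizer.step(); adjust the value to match how many backward() calls your training loop actually makes.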

@alsrgv Yes, it works on version 0.6.1.
Thanks!

@zeyu-hello What is the advantage of calling backward on the individual losses before adding them together and calling backward on the sum of the losses?

Is it to improve memory or computational efficiency?
