Horovod: Error about backward_passes_per_step

Created on 29 Nov 2019 · 3 comments · Source: horovod/horovod

Environment:

  1. Framework: PyTorch
  2. Framework version: 1.1
  3. Horovod version: 0.18.2
  4. MPI version: 3.0.0
  5. CUDA version: 9.0
  6. NCCL version: 2.5.6
  7. Python version: 3.5.6
  8. OS and version: CentOS 7.2
  9. GCC version: 5.4.0

One network contains two groups of parameters [p1, p2], each with its own optimizer, but I run into this problem:
AssertionError: Gradients were computed more than backward_passes_per_step times before call to step(). Increase backward_passes_per_step to accumulate gradients locally

If I set backward_passes_per_step=2, everything is fine.

Code such as:

    optimizer1 = torch.optim.SGD(model.p1)
    optimizer2 = torch.optim.SGD(model.p2)
    optimizer1 = hvd.DistributedOptimizer(optimizer1, named_parameters=model.name_p1)
    optimizer2 = hvd.DistributedOptimizer(optimizer2, named_parameters=model.name_p1)

    optimizer1.zero_grad()
    loss = criterion(logits, target)
    loss.backward()
    optimizer1.step()

    optimizer2.zero_grad()
    loss = criterion(logits, target)
    loss.backward()    # <-- !!! error raised here !!!
    optimizer2.step()
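For reference, a minimal sketch of where that backward_passes_per_step=2 workaround is applied when wrapping the optimizer (the model and learning rate here are just placeholders for illustration):

    import torch
    import horovod.torch as hvd

    hvd.init()

    model = torch.nn.Linear(10, 2)  # placeholder model for illustration
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    # backward_passes_per_step tells the wrapper how many backward passes to
    # expect before each step(); 2 matches the workaround described above.
    optimizer = hvd.DistributedOptimizer(
        optimizer,
        named_parameters=model.named_parameters(),
        backward_passes_per_step=2,
    )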

Labels: question, wontfix

All 3 comments

Hey @YoungDav, do you have a reproducible example I can take a look at? It would seem that there is some overlap between the parameters managed by optimizer1 and those of optimizer2. In the example you gave, you initialized both optimizer1 and optimizer2 with model.name_p1, was that a typo in writing this post, or is that also in your training script? If so, I suspect that's what's causing the error.
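For illustration, a minimal sketch of the kind of disjoint split being suggested here, with a hypothetical two-part model and name prefixes (p1/p2) standing in for the real parameter groups:

    import torch
    import horovod.torch as hvd

    hvd.init()

    # Hypothetical model with two parameter groups, "p1" and "p2".
    class TwoPartModel(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.p1 = torch.nn.Linear(10, 10)
            self.p2 = torch.nn.Linear(10, 2)

        def forward(self, x):
            return self.p2(self.p1(x))

    model = TwoPartModel()

    # Split named_parameters() into two disjoint groups, one per optimizer.
    named_p1 = [(n, p) for n, p in model.named_parameters() if n.startswith("p1")]
    named_p2 = [(n, p) for n, p in model.named_parameters() if n.startswith("p2")]

    optimizer1 = torch.optim.SGD([p for _, p in named_p1], lr=0.01)
    optimizer2 = torch.optim.SGD([p for _, p in named_p2], lr=0.01)

    # Each wrapper hooks only its own group, so no parameter is registered twice.
    optimizer1 = hvd.DistributedOptimizer(optimizer1, named_parameters=named_p1)
    optimizer2 = hvd.DistributedOptimizer(optimizer2, named_parameters=named_p2)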

Hey @tgaddair, I've run into almost the same issue. I split the parameters of a model into two parts, then use optimizer_1 and optimizer_2 to control their updates. But on the second call to loss.backward() there is an error:

RuntimeError: Gradients were computed more than backward_passes_per_step times before call to step(). Increase backward_passes_per_step to accumulate gradients locally.

I think that's caused by the gradients of the other group of parameters not having been cleared by zero_grad() (there is some overlap).
So how can I use Horovod in this situation?

My code now looks like this:

    optimizer_1 = hvd.DistributedOptimizer(optimizer_1, named_parameters=model.named_parameters_1)
    optimizer_2 = hvd.DistributedOptimizer(optimizer_2, named_parameters=model.named_parameters_2)

    for i in range(...):
        optimizer_1.zero_grad()
        logits = model(input)
        loss = criterion(logits, target)
        loss.backward()    # <-- on the second iteration, loss.backward() raises the error
        optimizer_1.step()
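One way to avoid the error, assuming the two named-parameter groups passed to the two wrappers are truly disjoint, is to run a single backward pass per iteration and zero/step both optimizers around it. A minimal sketch continuing the snippet above (num_steps, input, and target are placeholders):

    for i in range(num_steps):
        # Clear gradients on both parameter groups before the backward pass.
        optimizer_1.zero_grad()
        optimizer_2.zero_grad()

        logits = model(input)
        loss = criterion(logits, target)

        # A single backward pass populates each group's gradients exactly once,
        # so neither wrapper sees more passes than backward_passes_per_step allows.
        loss.backward()

        optimizer_1.step()
        optimizer_2.step()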

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
