Environment:
https://github.com/horovod/horovod/blob/master/horovod/torch/__init__.py#L174
If you look at this documentation, when using gradient clipping there is duplicated synchronization: the gradients are synchronized once when `synchronize()` is called and again when `step()` is called.
After following this documentation for gradient clipping, my training is slower.
Any solutions?
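For context, the pattern from the linked documentation looks roughly like the sketch below (the model, data, and clipping norm are placeholders, not the exact code from my training script):

```python
import torch
import torch.nn.functional as F
import horovod.torch as hvd

hvd.init()

# Placeholder model/data; any model wrapped with DistributedOptimizer hits the same path.
model = torch.nn.Linear(10, 2)
data, target = torch.randn(32, 10), torch.randint(0, 2, (32,))

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
optimizer = hvd.DistributedOptimizer(optimizer, named_parameters=model.named_parameters())

loss = F.nll_loss(F.log_softmax(model(data), dim=1), target)
loss.backward()

optimizer.synchronize()                                           # first gradient allreduce
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # clip synchronized gradients
optimizer.step()                                                  # currently synchronizes a second time
```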
@ildoonet, thanks for raising this! This regression has been introduced by #597.
While we're thinking about a proper solution, can you set `optimizer._requires_update = set()` after `DistributedOptimizer` wraps the original optimizer, like this?
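A minimal sketch of that workaround (the model and optimizer construction are placeholders):

```python
import torch
import horovod.torch as hvd

hvd.init()

model = torch.nn.Linear(10, 2)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
optimizer = hvd.DistributedOptimizer(optimizer, named_parameters=model.named_parameters())

# Temporary workaround: clear this internal set so that step() does not
# trigger a second synchronization after synchronize() has already run.
optimizer._requires_update = set()
```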
@alsrgv Thanks for the tip. I will try it and wait for the proper solution.
@ildoonet, the fix has been merged into master. You can reinstall Horovod from master (or wait a bit for 0.16.3) and use `.step(synchronize=False)`, as the new documentation prescribes.
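With that fix, gradient clipping looks roughly like this (the model, data, and clipping norm below are placeholders):

```python
import torch
import torch.nn.functional as F
import horovod.torch as hvd

hvd.init()

model = torch.nn.Linear(10, 2)  # placeholder model
data, target = torch.randn(32, 10), torch.randint(0, 2, (32,))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
optimizer = hvd.DistributedOptimizer(optimizer, named_parameters=model.named_parameters())

optimizer.zero_grad()
loss = F.nll_loss(F.log_softmax(model(data), dim=1), target)
loss.backward()

optimizer.synchronize()                                           # single gradient allreduce
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # clip synchronized gradients
optimizer.step(synchronize=False)                                 # apply update without synchronizing again
```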