Pytorch-lightning: Gradient accumulation fails with fp16 precision

Created on 29 Oct 2020 · 3 Comments · Source: PyTorchLightning/pytorch-lightning

🐛 Bug

Setting accumulate_grad_batches > 1 and precision = 16 causes the following error:

RuntimeError: unscale_() has already been called on this optimizer since the last update().

Please reproduce using the BoringModel and post here

https://colab.research.google.com/drive/1_7pxqPlpc79k0VYlRdtRXE0JQbhSBWHy?usp=sharing
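A self-contained script along the same lines as the linked notebook (the model and data below are illustrative stand-ins, not the exact notebook contents) also reproduces it:

```python
import torch
from torch.utils.data import DataLoader, Dataset
import pytorch_lightning as pl


class RandomDataset(Dataset):
    """Random tensors standing in for real data."""

    def __init__(self, size=64, length=256):
        self.data = torch.randn(length, size)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]


class BoringModel(pl.LightningModule):
    """Minimal LightningModule: one linear layer, sum of outputs as the loss."""

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(64, 2)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        return self(batch).sum()

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


if __name__ == "__main__":
    trainer = pl.Trainer(
        gpus=1,
        precision=16,               # native AMP
        accumulate_grad_batches=2,  # any value > 1 triggers the error on 1.0.4
        max_epochs=1,
    )
    trainer.fit(BoringModel(), DataLoader(RandomDataset(), batch_size=8))
```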

Environment

  • CUDA:
    • GPU: Tesla T4
    • available: True
    • version: 10.1
  • Packages:
    • numpy: 1.18.5
    • pyTorch_debug: False
    • pyTorch_version: 1.6.0+cu101
    • pytorch-lightning: 1.0.4
    • tqdm: 4.41.1
  • System:
    • OS: Linux
    • architecture: 64bit
    • processor: x86_64
    • python: 3.6.9
    • version: #1 SMP Thu Jul 23 08:00:38 PDT 2020

Labels: bug / fix, help wanted

All 3 comments

Thanks for taking this @ydcjeff!

I'm only getting this error on 1.0.4 (after I upgraded a few hours ago). I downgraded to 1.0.3 and the error is not there.
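For context (not part of the thread's own diagnosis): the message is raised by torch.cuda.amp.GradScaler, which allows unscale_() to be called at most once per optimizer between update() calls. The sketch below shows the pattern the scaler expects when accumulating gradients in plain PyTorch, with a toy model and random data as assumptions; it is not Lightning's internal implementation.

```python
import torch

# Toy setup (illustrative only); requires a CUDA device, matching the report.
model = torch.nn.Linear(64, 2).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()
accumulate = 2

for i in range(8):
    batch = torch.randn(8, 64, device="cuda")
    with torch.cuda.amp.autocast():
        loss = model(batch).sum() / accumulate
    scaler.scale(loss).backward()

    if (i + 1) % accumulate == 0:
        # unscale_() may run at most once per optimizer between update() calls,
        # so it belongs here, just before step(), not on every accumulation batch.
        scaler.unscale_(optimizer)
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()
```

Calling scaler.unscale_(optimizer) on every micro-batch instead of once per optimizer update is exactly what produces the RuntimeError quoted above.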
