Pytorch-lightning: Precision 16 not working

Created on 15 May 2020  ·  11 Comments  ·  Source: PyTorchLightning/pytorch-lightning

🐛 Bug

Setting the precision flag to 16 does not work. Furthermore, I don't even need to set gpus: if my model is already on CUDA and precision is 32, it automatically runs on the GPU. With precision=16 it first says that AMP and CPU don't work together, so I set gpus=[0], and then it gives the following error:

I0515 10:03:17.552285 12848 distrib_data_parallel.py:248] GPU available: True, used: True
I0515 10:03:17.552285 12848 distrib_data_parallel.py:296] CUDA_VISIBLE_DEVICES: [0]
I0515 10:03:17.552285 12848 auto_mix_precision.py:52] Using 16bit precision.
Traceback (most recent call last):
  File "C:/Users/rodri/PycharmProjects/TTMelGAN/src/model_dir/train.py", line 146, in <module>
    app.run(train)
  File "C:\Users\rodri\Miniconda3\envs\TTMel\lib\site-packages\absl\app.py", line 299, in run
    _run_main(main, args)
  File "C:\Users\rodri\Miniconda3\envs\TTMel\lib\site-packages\absl\app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "C:/Users/rodri/PycharmProjects/TTMelGAN/src/model_dir/train.py", line 56, in train
    trainer.fit(model)
  File "C:\Users\rodri\Miniconda3\envs\TTMel\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 765, in fit
    self.single_gpu_train(model)
  File "C:\Users\rodri\Miniconda3\envs\TTMel\lib\site-packages\pytorch_lightning\trainer\distrib_parts.py", line 489, in single_gpu_train
    model, optimizers = model.configure_apex(amp, model, self.optimizers, self.amp_level)
NameError: name 'amp' is not defined
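
For context, this kind of NameError is what you typically get when an optional dependency is imported inside a try/except and the fallback is only discovered later, at the point of use. The snippet below is a minimal illustration of that failure mode under that assumption, not Lightning's actual source:

try:
    from apex import amp  # optional dependency, only bound if Apex is installed
    APEX_AVAILABLE = True
except ImportError:
    APEX_AVAILABLE = False


def single_gpu_train(model, optimizers, amp_level="O1"):
    # If the availability flag is never checked before use, a missing Apex
    # install only surfaces here as: NameError: name 'amp' is not defined
    model, optimizers = model.configure_apex(amp, model, optimizers, amp_level)
    return model, optimizers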

To Reproduce

In the Colab Lightning demo, in the first MNIST example, set pl.Trainer(gpus=1, precision=16) and you get the same error.
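
For convenience, here is a minimal standalone script in the same spirit; the LitMNIST module is a stand-in for the Colab demo's model, not its exact code:

import torch
from torch import nn
from torch.nn import functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
import pytorch_lightning as pl


class LitMNIST(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(28 * 28, 10)

    def forward(self, x):
        return self.layer(x.view(x.size(0), -1))

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = F.cross_entropy(self(x), y)
        return {"loss": loss}

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

    def train_dataloader(self):
        dataset = datasets.MNIST("./data", train=True, download=True,
                                 transform=transforms.ToTensor())
        return DataLoader(dataset, batch_size=64)


# On 0.7.5 without Apex installed, precision=16 fails with
# NameError: name 'amp' is not defined
trainer = pl.Trainer(gpus=1, precision=16, max_epochs=1)
trainer.fit(LitMNIST())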

Expected behavior

Training runs correctly with precision=16.

Since it can be reproduced in Colab, I guess the specs of my laptop (or of the AWS EC2 instance where I get the same error) don't matter.

bug / fix help wanted

All 11 comments

Hi! Thanks for your contribution, great first issue!

Could you please specify which Lightning version you are using?

The latest one, 0.7.5.

Cool, could you please verify with 0.7.6.rc4?

I got the same thing

It seems like you do not have Apex installed; please run install_AMP.sh.

Indeed, I had to install Apex. As usual it did not work that way (I got pip errors), but I managed to install it with "conda install -c conda-forge nvidia-apex", in case someone else has the same issue.

I thought that PL was now using the AMP integrated into torch, so I wouldn't have to install Apex separately, but after rereading the release notes, that will only be available with torch 1.6, right?
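
For anyone who wants to confirm the install actually worked before re-running training, a quick import check (just an illustration) is enough:

# Quick sanity check that the Apex AMP module is importable
try:
    from apex import amp  # noqa: F401
    print("Apex AMP is available")
except ImportError as err:
    print(f"Apex is not importable: {err}")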

Apex is used up to PyTorch 1.5.

1.6+ uses native AMP.
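
Roughly, the difference at the plain-PyTorch level looks like the sketch below (Lightning wires this up for you when you pass precision=16; the toy model and data here are just placeholders):

import torch
from torch import nn

# Toy setup just to show the two mixed-precision paths
model = nn.Linear(8, 1).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
inputs = torch.randn(4, 8, device="cuda")
targets = torch.randn(4, 1, device="cuda")

# PyTorch <= 1.5: mixed precision needs NVIDIA Apex, installed separately:
#   from apex import amp
#   model, optimizer = amp.initialize(model, optimizer, opt_level="O1")
#   with amp.scale_loss(loss, optimizer) as scaled_loss:
#       scaled_loss.backward()

# PyTorch 1.6+: native AMP ships with torch, no extra install required
scaler = torch.cuda.amp.GradScaler()
with torch.cuda.amp.autocast():
    loss = nn.functional.mse_loss(model(inputs), targets)
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()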

@williamFalcon shall we raise a warning when the user tries to use AMP but has no supported backend installed, meaning PyTorch <= 1.5 and Apex missing?
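
A minimal sketch of what such a check could look like (the helper name and message are illustrative, not Lightning's actual API):

import warnings

import torch


def warn_if_amp_unavailable(precision: int) -> None:
    # Illustrative helper: warn when 16-bit precision is requested but
    # neither native AMP (torch >= 1.6) nor NVIDIA Apex is installed.
    if precision != 16:
        return
    native_amp = hasattr(torch.cuda, "amp") and hasattr(torch.cuda.amp, "autocast")
    try:
        from apex import amp  # noqa: F401
        apex_available = True
    except ImportError:
        apex_available = False
    if not (native_amp or apex_available):
        warnings.warn(
            "precision=16 was requested, but neither native AMP (PyTorch >= 1.6) "
            "nor NVIDIA Apex is available; 16-bit training cannot be enabled.",
            RuntimeWarning,
        )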

So how do I fix the problem? Installing Apex from the .sh didn't solve the problem.

tbh much easier to upgrade to pt 1.6

