Pytorch-lightning: Precision 16 not working

Created on 15 May 2020  ·  11 Comments  ·  Source: PyTorchLightning/pytorch-lightning

🐛 Bug

Setting the precision flag to 16 does not work. Furthermore, I don't even need to set gpus: if my model is already on CUDA and precision is 32, it automatically runs on the GPU. With precision=16 it first says that AMP and CPU don't work together, so I set gpus=[0], and then it gives the following error:

I0515 10:03:17.552285 12848 distrib_data_parallel.py:248] GPU available: True, used: True
I0515 10:03:17.552285 12848 distrib_data_parallel.py:296] CUDA_VISIBLE_DEVICES: [0]
I0515 10:03:17.552285 12848 auto_mix_precision.py:52] Using 16bit precision.
Traceback (most recent call last):
  File "C:/Users/rodri/PycharmProjects/TTMelGAN/src/model_dir/train.py", line 146, in <module>
    app.run(train)
  File "C:\Users\rodri\Miniconda3\envs\TTMel\lib\site-packages\absl\app.py", line 299, in run
    _run_main(main, args)
  File "C:\Users\rodri\Miniconda3\envs\TTMel\lib\site-packages\absl\app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "C:/Users/rodri/PycharmProjects/TTMelGAN/src/model_dir/train.py", line 56, in train
    trainer.fit(model)
  File "C:\Users\rodri\Miniconda3\envs\TTMel\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 765, in fit
    self.single_gpu_train(model)
  File "C:\Users\rodri\Miniconda3\envs\TTMel\lib\site-packages\pytorch_lightning\trainer\distrib_parts.py", line 489, in single_gpu_train
    model, optimizers = model.configure_apex(amp, model, self.optimizers, self.amp_level)
NameError: name 'amp' is not defined
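
For context, this kind of NameError is what you typically get when an optional dependency is imported inside a try/except and the fallback is only discovered later, at the point of use. The snippet below is a minimal illustration of that failure mode under that assumption, not Lightning's actual source:

try:
    from apex import amp  # optional dependency, only bound if Apex is installed
    APEX_AVAILABLE = True
except ImportError:
    APEX_AVAILABLE = False


def single_gpu_train(model, optimizers, amp_level="O1"):
    # If the availability flag is never checked before use, a missing Apex
    # install only surfaces here as: NameError: name 'amp' is not defined
    model, optimizers = model.configure_apex(amp, model, optimizers, amp_level)
    return model, optimizers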

To Reproduce

In the Colab Lightning demo, in the first MNIST example, set pl.Trainer(gpus=1, precision=16) and you get the same error.
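
For convenience, here is a minimal standalone script in the same spirit; the LitMNIST module is a stand-in for the Colab demo's model, not its exact code:

import torch
from torch import nn
from torch.nn import functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
import pytorch_lightning as pl


class LitMNIST(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(28 * 28, 10)

    def forward(self, x):
        return self.layer(x.view(x.size(0), -1))

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = F.cross_entropy(self(x), y)
        return {"loss": loss}

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

    def train_dataloader(self):
        dataset = datasets.MNIST("./data", train=True, download=True,
                                 transform=transforms.ToTensor())
        return DataLoader(dataset, batch_size=64)


# On 0.7.5 without Apex installed, precision=16 fails with
# NameError: name 'amp' is not defined
trainer = pl.Trainer(gpus=1, precision=16, max_epochs=1)
trainer.fit(LitMNIST())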

Expected behavior

Training runs correctly with precision=16.

Since it can be reproduced in Colab, I guess the specs of my laptop (or of the AWS EC2 instance where I get the same error) don't matter.

bug / fix help wanted

All 11 comments

Hi! Thanks for your contribution, great first issue!

Could you please specify which Lightning version you are using?

The latest one, 0.7.5.

Cool, could you please verify with 0.7.6.rc4?

I got the same thing

It seems like you do not have Apex installed; please run install_AMP.sh.

Indeed, I had to install Apex. As usual it did not work that way (I got pip errors), but I managed to install it with "conda install -c conda-forge nvidia-apex", in case someone else has the same issue.

I thought that PL was now using the AMP integrated into torch, so I wouldn't have to install Apex separately, but after rereading the release notes, that will only be available with torch 1.6, right?
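
For anyone who wants to confirm the install actually worked before re-running training, a quick import check (just an illustration) is enough:

# Quick sanity check that the Apex AMP module is importable
try:
    from apex import amp  # noqa: F401
    print("Apex AMP is available")
except ImportError as err:
    print(f"Apex is not importable: {err}")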

Apex is used up to PyTorch 1.5.

1.6+ uses native AMP.
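
Roughly, the difference at the plain-PyTorch level looks like the sketch below (Lightning wires this up for you when you pass precision=16; the toy model and data here are just placeholders):

import torch
from torch import nn

# Toy setup just to show the two mixed-precision paths
model = nn.Linear(8, 1).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
inputs = torch.randn(4, 8, device="cuda")
targets = torch.randn(4, 1, device="cuda")

# PyTorch <= 1.5: mixed precision needs NVIDIA Apex, installed separately:
#   from apex import amp
#   model, optimizer = amp.initialize(model, optimizer, opt_level="O1")
#   with amp.scale_loss(loss, optimizer) as scaled_loss:
#       scaled_loss.backward()

# PyTorch 1.6+: native AMP ships with torch, no extra install required
scaler = torch.cuda.amp.GradScaler()
with torch.cuda.amp.autocast():
    loss = nn.functional.mse_loss(model(inputs), targets)
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()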

@williamFalcon shall we raise a warning when the user tries to use AMP but has no supported backend installed, meaning PyTorch <= 1.5 and Apex missing?
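
A minimal sketch of what such a check could look like (the helper name and message are illustrative, not Lightning's actual API):

import warnings

import torch


def warn_if_amp_unavailable(precision: int) -> None:
    # Illustrative helper: warn when 16-bit precision is requested but
    # neither native AMP (torch >= 1.6) nor NVIDIA Apex is installed.
    if precision != 16:
        return
    native_amp = hasattr(torch.cuda, "amp") and hasattr(torch.cuda.amp, "autocast")
    try:
        from apex import amp  # noqa: F401
        apex_available = True
    except ImportError:
        apex_available = False
    if not (native_amp or apex_available):
        warnings.warn(
            "precision=16 was requested, but neither native AMP (PyTorch >= 1.6) "
            "nor NVIDIA Apex is available; 16-bit training cannot be enabled.",
            RuntimeWarning,
        )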

So how do I fix the problem? Installing Apex from the .sh didn't solve the problem.

tbh much easier to upgrade to pt 1.6

