Setting the precision flag to 16 does not work. As an aside, I don't need to set the value of gpus: if my model is the CUDA version and precision is 32, it automatically runs on the GPU. If I set precision=16, it first says that AMP and CPU don't work together, so I set gpus=[0], and then it gives the following error:
I0515 10:03:17.552285 12848 distrib_data_parallel.py:248] GPU available: True, used: True
I0515 10:03:17.552285 12848 distrib_data_parallel.py:296] CUDA_VISIBLE_DEVICES: [0]
I0515 10:03:17.552285 12848 auto_mix_precision.py:52] Using 16bit precision.
Traceback (most recent call last):
File "C:/Users/rodri/PycharmProjects/TTMelGAN/src/model_dir/train.py", line 146, in <module>
app.run(train)
File "C:\Users\rodri\Miniconda3\envs\TTMel\lib\site-packages\absl\app.py", line 299, in run
_run_main(main, args)
File "C:\Users\rodri\Miniconda3\envs\TTMel\lib\site-packages\absl\app.py", line 250, in _run_main
sys.exit(main(argv))
File "C:/Users/rodri/PycharmProjects/TTMelGAN/src/model_dir/train.py", line 56, in train
trainer.fit(model)
File "C:\Users\rodri\Miniconda3\envs\TTMel\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 765, in fit
self.single_gpu_train(model)
File "C:\Users\rodri\Miniconda3\envs\TTMel\lib\site-packages\pytorch_lightning\trainer\distrib_parts.py", line 489, in single_gpu_train
model, optimizers = model.configure_apex(amp, model, self.optimizers, self.amp_level)
NameError: name 'amp' is not defined
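For context, this kind of NameError typically comes from a guarded import whose failure is swallowed, so the name is never bound but is still referenced later. A minimal sketch of the pattern (the function name and signature here are illustrative, not Lightning's actual code):

```python
try:
    from apex import amp  # fails when NVIDIA APEX is not installed
    APEX_AVAILABLE = True
except ImportError:
    APEX_AVAILABLE = False  # note: 'amp' was never bound in this branch


def configure_apex_like(model):
    # Hypothetical stand-in for code that uses 'amp' unconditionally.
    # If APEX is missing, looking up 'amp' here raises
    # "NameError: name 'amp' is not defined" -- the error in the traceback above.
    return amp.initialize(model)
```

Calling `configure_apex_like` on a machine without APEX reproduces the NameError, even though the import error itself was silently caught.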
This can be reproduced in the Colab Lightning demo: in the first example (MNIST), set pl.Trainer(gpus=1, precision=16) and you get the same error.
Without precision=16, training runs correctly.
Since it can be reproduced in Colab, I guess the specs of my laptop (or of the AWS EC2 instance, where I get the same error) don't matter.
Hi! Thanks for your contribution, great first issue!
Could you please specify which Lightning version you are using?
The latest one, 0.7.5.
Cool, could you please verify with 0.7.6.rc4?
I got the same error.
It seems like you do not have APEX installed; please run install_AMP.sh.
Indeed, I had to install APEX. As usual, installing it that way did not work (I got pip errors), but I managed to install it with "conda install -c conda-forge nvidia-apex", in case someone else has the same issue.
I thought PL was now using the AMP integrated into torch, so I didn't have to install it separately, but after rereading the release notes it will only be available with torch 1.6, right?
APEX is used for torch up to 1.5.
1.6+ uses the native AMP built into torch.
@williamFalcon shall we raise a warning when a user tries to use AMP but has no supported backend installed, meaning pt <= 1.5 with APEX missing?
So how do I fix the problem? Installing APEX from the .sh script didn't solve it.
Tbh, it's much easier to upgrade to pt 1.6.