Pytorch-lightning: pytorch_lightning.utilities.debugging.MisconfigurationException

Created on 31 Mar 2020 · 13 Comments · Source: PyTorchLightning/pytorch-lightning

Hi, I encountered a problem like #899, but I checked and my PyTorch is not the CPU version. Can anyone help? Thanks!

Traceback (most recent call last):
  File "/home/allen_wu/miniconda3/envs/pytorch/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/allen_wu/miniconda3/envs/pytorch/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/allen_wu/.vscode-server-insiders/extensions/ms-python.python-2020.3.69010/pythonFiles/lib/python/debugpy/wheels/debugpy/__main__.py", line 45, in <module>
    cli.main()
  File "/home/allen_wu/.vscode-server-insiders/extensions/ms-python.python-2020.3.69010/pythonFiles/lib/python/debugpy/wheels/debugpy/../debugpy/server/cli.py", line 427, in main
    run()
  File "/home/allen_wu/.vscode-server-insiders/extensions/ms-python.python-2020.3.69010/pythonFiles/lib/python/debugpy/wheels/debugpy/../debugpy/server/cli.py", line 264, in run_file
    runpy.run_path(options.target, run_name="__main__")
  File "/home/allen_wu/miniconda3/envs/pytorch/lib/python3.7/runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/home/allen_wu/miniconda3/envs/pytorch/lib/python3.7/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/home/allen_wu/miniconda3/envs/pytorch/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/allen_wu/sota_lm_dev/codebase/gpt2/Gpt2SeqClassifier.py", line 200, in <module>
    trainer = Trainer(gpus=1)
  File "/home/allen_wu/miniconda3/envs/pytorch/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 366, in __init__
    self.data_parallel_device_ids = parse_gpu_ids(self.gpus)
  File "/home/allen_wu/miniconda3/envs/pytorch/lib/python3.7/site-packages/pytorch_lightning/trainer/distrib_parts.py", line 622, in parse_gpu_ids
    gpus = sanitize_gpu_ids(gpus)
  File "/home/allen_wu/miniconda3/envs/pytorch/lib/python3.7/site-packages/pytorch_lightning/trainer/distrib_parts.py", line 592, in sanitize_gpu_ids
    raise MisconfigurationException(message)
pytorch_lightning.utilities.debugging.MisconfigurationException: 
            You requested GPUs: [0]
            But your machine only has: []

My environment packages:

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
absl-py                   0.9.0                    pypi_0    pypi
attrs                     19.3.0                     py_0    conda-forge
backcall                  0.1.0                      py_0    conda-forge
blas                      1.0                         mkl  
bleach                    3.1.3              pyh8c360ce_0    conda-forge
boto3                     1.12.24                  pypi_0    pypi
botocore                  1.15.24                  pypi_0    pypi
ca-certificates           2019.11.28           hecc5488_0    conda-forge
cachetools                4.0.0                    pypi_0    pypi
certifi                   2019.11.28       py37hc8dfbb8_1    conda-forge
chardet                   3.0.4                    pypi_0    pypi
click                     7.1.1                    pypi_0    pypi
cudatoolkit               10.1.243             h6bb024c_0  
decorator                 4.4.2                      py_0    conda-forge
defusedxml                0.6.0                      py_0    conda-forge
docutils                  0.15.2                   pypi_0    pypi
entrypoints               0.3             py37hc8dfbb8_1001    conda-forge
filelock                  3.0.12                   pypi_0    pypi
freetype                  2.9.1                h8a8886c_1  
future                    0.18.2                   pypi_0    pypi
google-auth               1.11.3                   pypi_0    pypi
google-auth-oauthlib      0.4.1                    pypi_0    pypi
grpcio                    1.27.2                   pypi_0    pypi
icu                       64.2                 he1b5a44_1    conda-forge
idna                      2.9                      pypi_0    pypi
importlib-metadata        1.5.0            py37hc8dfbb8_1    conda-forge
importlib_metadata        1.5.0                         1    conda-forge
intel-openmp              2020.0                      166  
ipykernel                 5.1.4            py37h5ca1d4c_0    conda-forge
ipython                   7.13.0           py37h43977f1_1    conda-forge
ipython_genutils          0.2.0                      py_1    conda-forge
ipywidgets                7.5.1                    pypi_0    pypi
jedi                      0.16.0           py37hc8dfbb8_1    conda-forge
jinja2                    2.11.1                     py_0    conda-forge
jmespath                  0.9.5                    pypi_0    pypi
joblib                    0.14.1                   pypi_0    pypi
jpeg                      9b                   h024ee3a_2  
json5                     0.9.0                      py_0    conda-forge
jsonschema                3.2.0            py37hc8dfbb8_1    conda-forge
jupyter_client            6.0.0                      py_0    conda-forge
jupyter_core              4.6.3            py37hc8dfbb8_1    conda-forge
jupyterlab                2.0.1                      py_0    conda-forge
jupyterlab_server         1.0.7                      py_0    conda-forge
ld_impl_linux-64          2.33.1               h53a641e_7  
libedit                   3.1.20181209         hc058e9b_0  
libffi                    3.2.1                hd88cf55_4  
libgcc-ng                 9.1.0                hdf63c60_0  
libgfortran-ng            7.3.0                hdf63c60_0  
libpng                    1.6.37               hbc83047_0  
libsodium                 1.0.17               h516909a_0    conda-forge
libstdcxx-ng              9.1.0                hdf63c60_0  
libtiff                   4.1.0                h2733197_0  
libuv                     1.34.0               h516909a_0    conda-forge
markdown                  3.2.1                    pypi_0    pypi
markupsafe                1.1.1            py37h8f50634_1    conda-forge
mistune                   0.8.4           py37h516909a_1000    conda-forge
mkl                       2020.0                      166  
mkl-service               2.3.0            py37he904b0f_0  
mkl_fft                   1.0.15           py37ha843d7b_0  
mkl_random                1.1.0            py37hd6b4f25_0  
nbconvert                 5.6.1                    py37_0    conda-forge
nbformat                  5.0.4                      py_0    conda-forge
ncurses                   6.2                  he6710b0_0  
ninja                     1.9.0            py37hfd86e86_0  
nodejs                    13.10.1              hf5d1a2b_0    conda-forge
notebook                  6.0.3                    py37_0    conda-forge
numpy                     1.18.1           py37h4f9e942_0  
numpy-base                1.18.1           py37hde5b4d6_1  
oauthlib                  3.1.0                    pypi_0    pypi
olefile                   0.46                     py37_0  
openssl                   1.1.1e               h516909a_0    conda-forge
pandas                    1.0.2            py37h0573a6f_0  
pandoc                    2.9.2                         0    conda-forge
pandocfilters             1.4.2                      py_1    conda-forge
parso                     0.6.2                      py_0    conda-forge
pexpect                   4.8.0            py37hc8dfbb8_1    conda-forge
pickleshare               0.7.5           py37hc8dfbb8_1001    conda-forge
pillow                    7.0.0            py37hb39fc2d_0  
pip                       20.0.2                   py37_1  
prometheus_client         0.7.1                      py_0    conda-forge
prompt-toolkit            3.0.4                      py_0    conda-forge
protobuf                  3.11.3                   pypi_0    pypi
ptyprocess                0.6.0                   py_1001    conda-forge
pyasn1                    0.4.8                    pypi_0    pypi
pyasn1-modules            0.2.8                    pypi_0    pypi
pygments                  2.6.1                      py_0    conda-forge
pyrsistent                0.15.7           py37h8f50634_1    conda-forge
python                    3.7.6                h0371630_2  
python-dateutil           2.8.1                      py_0    conda-forge
python_abi                3.7                     1_cp37m    conda-forge
pytorch                   1.4.0           py3.7_cuda10.1.243_cudnn7.6.3_0    pytorch
pytorch-lightning         0.7.1                    pypi_0    pypi
pytz                      2019.3                     py_0  
pyzmq                     19.0.0           py37hac76be4_1    conda-forge
readline                  7.0                  h7b6447c_5  
regex                     2020.2.20                pypi_0    pypi
requests                  2.23.0                   pypi_0    pypi
requests-oauthlib         1.3.0                    pypi_0    pypi
rsa                       4.0                      pypi_0    pypi
s3transfer                0.3.3                    pypi_0    pypi
sacremoses                0.0.38                   pypi_0    pypi
scikit-learn              0.22.2.post1             pypi_0    pypi
scipy                     1.4.1                    pypi_0    pypi
send2trash                1.5.0                      py_0    conda-forge
sentencepiece             0.1.85                   pypi_0    pypi
setuptools                46.0.0                   py37_0  
six                       1.14.0                   py37_0  
sklearn                   0.0                      pypi_0    pypi
sqlite                    3.31.1               h7b6447c_0  
tensorboard               2.1.1                    pypi_0    pypi
terminado                 0.8.3            py37hc8dfbb8_1    conda-forge
testpath                  0.4.4                      py_0    conda-forge
tk                        8.6.8                hbc83047_0  
tokenizers                0.5.2                    pypi_0    pypi
torchtext                 0.5.0                    pypi_0    pypi
torchvision               0.5.0                py37_cu101    pytorch
tornado                   6.0.4            py37h8f50634_1    conda-forge
tqdm                      4.43.0                   pypi_0    pypi
traitlets                 4.3.3            py37hc8dfbb8_1    conda-forge
transformers              2.5.1                    pypi_0    pypi
urllib3                   1.25.8                   pypi_0    pypi
wcwidth                   0.1.8                      py_0    conda-forge
webencodings              0.5.1                      py_1    conda-forge
werkzeug                  1.0.0                    pypi_0    pypi
wheel                     0.34.2                   py37_0  
widgetsnbextension        3.5.1                    pypi_0    pypi
xz                        5.2.4                h14c3975_4  
zeromq                    4.3.2                he1b5a44_2    conda-forge
zipp                      3.1.0                      py_0    conda-forge
zlib                      1.2.11               h7b6447c_3  
zstd                      1.3.7                h0b5b093_0
bug / fix help wanted

All 13 comments

If you run this in your python console,

import torch
print(torch.cuda.device_count())

what number does it return?

@awaelchli Thanks for your fast reply! It returns 1.

And I think it's because of the PyTorch version. I changed my PyTorch build from py3.7_cuda10.1.243_cudnn7.6.3_0 to py3.6_cuda10.0.130_cudnn7.6.3_0, and the problem was solved.

Strange. So the code that I posted prints 0 with the first version, py3.7_cuda10.1.243_cudnn7.6.3_0? Could it be possible that your GPU supports cuda 10 but not 10.1? Just a wild guess.

No, the code that you posted still prints 1 with the first version. (py3.7_cuda10.1.243_cudnn7.6.3_0)

It works fine when I train with plain PyTorch. But when I rewrote it using the LightningModule template, I got this error. That also confuses me.

It gets funnier :) Are you absolutely sure? Because Lightning also calls torch.cuda.device_count() internally to determine what you see in the error message. See here and here. So it should return 1 there.
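To illustrate the kind of check described above, here is a minimal sketch, not Lightning's actual implementation. `check_requested_gpus` and `MisconfigurationError` are hypothetical names, and the device count is passed in as a plain integer; in a real script it would come from `torch.cuda.device_count()`.

```python
from typing import List


class MisconfigurationError(Exception):
    """Stand-in for Lightning's MisconfigurationException (hypothetical name)."""


def check_requested_gpus(requested: List[int], device_count: int) -> List[int]:
    """Raise if any requested GPU index is not among the visible devices."""
    available = list(range(device_count))
    missing = [gpu for gpu in requested if gpu not in available]
    if missing:
        raise MisconfigurationError(
            f"You requested GPUs: {requested}\n"
            f"But your machine only has: {available}"
        )
    return requested


# With one visible device the request for GPU 0 passes.
print(check_requested_gpus([0], device_count=1))

# With zero visible devices the same request fails with a message
# shaped like the one in the traceback.
try:
    check_requested_gpus([0], device_count=0)
except MisconfigurationError as err:
    print(err)
```

The point of the sketch is that the error depends only on the device count seen by the process that builds the Trainer, which can differ from what an interactive console reports.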

That's my mistake. It has nothing to do with the PyTorch version.

I set CUDA_VISIBLE_DEVICES to 1 in my program, but I only have one GPU on my machine. :(

So when I ran the code you posted in my Python console, it returned 1. But when I executed my program, no GPU could be detected.

Thank you for replying!
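The mix-up above can be reproduced without a GPU. The sketch below is a simplified model of how CUDA interprets `CUDA_VISIBLE_DEVICES`: it handles integer indices only (not GPU UUIDs) and assumes the documented CUDA behaviour that enumeration stops at the first invalid index. `visible_devices` is a hypothetical helper, not part of PyTorch or Lightning.

```python
from typing import List, Optional


def visible_devices(env_value: Optional[str], physical_gpu_count: int) -> List[int]:
    """Return the physical GPU indices a process would see (simplified model)."""
    if env_value is None:
        # Variable unset: every physical GPU is visible.
        return list(range(physical_gpu_count))

    visible = []
    for token in env_value.split(","):
        token = token.strip()
        # Stop at the first entry that is not a valid physical index.
        if not token.isdigit() or int(token) >= physical_gpu_count:
            break
        visible.append(int(token))
    return visible


print(visible_devices(None, 1))  # [0]  one GPU, variable unset
print(visible_devices("1", 1))   # []   index 1 does not exist on a one-GPU machine
print(visible_devices("6", 8))   # [6]  one of eight GPUs exposed
```

In the last case the process sees a single GPU, which it addresses as device 0, so a request for one GPU matches. In the second case nothing is visible, which corresponds to the empty list in the error message.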

That's a relief. Glad it was just this and nothing more complicated.
cheers!

@awaelchli Checked both this thread and this issue as well: https://github.com/PyTorchLightning/pytorch-lightning/issues/899

For me (Python: 3.7.5)

import torch
print(torch.cuda.device_count())

returns 0 on a DGX-1 (Pascal generation) with the cuda-10.1 and cuda-10.2 builds of pytorch 1.5.0. I did not find a cuda-10.0 build, so I am currently working with pytorch 1.4.0 and cuda-10.0, where print(torch.cuda.device_count()) shows 8.

I am not setting CUDA_VISIBLE_DEVICES in my code.

Given what you describe here I can only conclude this is a PyTorch issue between versions 1.4 and 1.5. Have you checked their issue page?

@awaelchli Sorry, I should have tested and mentioned this before. It works with the cuda-9.2 build of pytorch 1.5.0, which does make me think it's a CUDA issue.

Training with a GPU works for me in pytorch, and pytorch lightning. But when I use ray.tune to search the hyperparameter space it dies with this exact error:


2020-09-01 13:39:29,157 ERROR trial_runner.py:523 -- Trial DEFAULT_2d46f_00009: Error processing event.
Traceback (most recent call last):
  File "/home/d3p692/anaconda3/envs/transformers/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 471, in _process_trial
    result = self.trial_executor.fetch_result(trial)
  File "/home/d3p692/anaconda3/envs/transformers/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 430, in fetch_result
    result = ray.get(trial_future[0], DEFAULT_GET_TIMEOUT)
  File "/home/d3p692/anaconda3/envs/transformers/lib/python3.7/site-packages/ray/worker.py", line 1538, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(TuneError): ray::ImplicitFunc.train() (pid=57762, ip=130.20.133.226)
  File "python/ray/_raylet.pyx", line 479, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 432, in ray._raylet.execute_task.function_executor
  File "/home/d3p692/anaconda3/envs/transformers/lib/python3.7/site-packages/ray/tune/trainable.py", line 332, in train
    result = self.step()
  File "/home/d3p692/anaconda3/envs/transformers/lib/python3.7/site-packages/ray/tune/function_runner.py", line 337, in step
    self._report_thread_runner_error(block=True)
  File "/home/d3p692/anaconda3/envs/transformers/lib/python3.7/site-packages/ray/tune/function_runner.py", line 456, in _report_thread_runner_error
    .format(err_tb_str)))
ray.tune.error.TuneError: Trial raised an exception. Traceback:
ray::ImplicitFunc.train() (pid=57762, ip=130.20.133.226)
  File "/home/d3p692/anaconda3/envs/transformers/lib/python3.7/site-packages/ray/tune/function_runner.py", line 224, in run
    self._entrypoint()
  File "/home/d3p692/anaconda3/envs/transformers/lib/python3.7/site-packages/ray/tune/function_runner.py", line 287, in entrypoint
    self._status_reporter.get_checkpoint())
  File "/home/d3p692/anaconda3/envs/transformers/lib/python3.7/site-packages/ray/tune/function_runner.py", line 507, in _trainable_func
    output = train_func(config)
  File "", line 28, in train_tune
  File "/home/d3p692/anaconda3/envs/transformers/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 524, in __init__
    self.data_parallel_device_ids = _parse_gpu_ids(self.gpus)
  File "/home/d3p692/anaconda3/envs/transformers/lib/python3.7/site-packages/pytorch_lightning/trainer/distrib_parts.py", line 451, in _parse_gpu_ids
    gpus = sanitize_gpu_ids(gpus)
  File "/home/d3p692/anaconda3/envs/transformers/lib/python3.7/site-packages/pytorch_lightning/trainer/distrib_parts.py", line 410, in sanitize_gpu_ids
    """)
pytorch_lightning.utilities.exceptions.MisconfigurationException:
            You requested GPUs: [0]
            But your machine only has: []

And:
torch.cuda.is_available()
True

torch.__version__
'1.3.1'

torch.cuda.device_count()
1

pytorch_lightning.__version__
'0.6.1.dev'

CUDA_VISIBLE_DEVICES=6 on an 8-gpu machine.

I would blame ray.tune, but it fails inside pytorch_lightning.

Any thoughts?

Hi, I just encountered the exact same issue. I wonder if you have found a solution. Thanks :-)

I think you should set "CUDA_VISIBLE_DEVICES" in the environment variables.
Here is the introduction from the official Ray Tune website.
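One way to narrow this down is to look at what the trial process itself sees, since it may not inherit the same `CUDA_VISIBLE_DEVICES` as the shell that launched it. The sketch below assumes it is called at the top of the training function (such as `train_tune` in the traceback), before the Trainer is built. `gpu_visibility_report` is a hypothetical helper, and the `torch` import is optional so the function still runs where PyTorch is not installed.

```python
import os
from typing import Dict, Optional


def gpu_visibility_report() -> Dict[str, Optional[object]]:
    """Collect what the current process can see regarding GPUs."""
    report: Dict[str, Optional[object]] = {
        # None means the variable is unset; "" means no GPU is exposed.
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "device_count": None,
    }
    try:
        import torch  # optional: only needed for the device count

        report["device_count"] = torch.cuda.device_count()
    except ImportError:
        pass
    return report


# Call this as the first statement of the function that Ray Tune runs.
print(gpu_visibility_report())
```

If the report inside the trial shows an empty `CUDA_VISIBLE_DEVICES` while the launching shell shows a GPU index, the difference lies in how the trial process was started rather than in Lightning.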
