I followed the installation instructions, installing inside a Docker container. Installation completes without errors, but when I run the training script it fails with AssertionError: cuda is not available. Please check your installation.
Command I ran:
python tools/train_net.py --config-file configs/FCOS-Detection/Base-FCOS.yaml --num-gpus 2
What I observed:
The logs are as follows:
Command Line Args: Namespace(config_file='configs/FCOS-Detection/Base-FCOS.yaml', dist_url='tcp://127.0.0.1:49152', eval_only=False, machine_rank=0, num_gpus=2, num_machines=1, opts=[], resume=False)
Traceback (most recent call last):
  File "tools/train_net.py", line 235, in <module>
    args=(args,),
  File "/opt/tiger/conda/lib/python3.7/site-packages/detectron2/engine/launch.py", line 54, in launch
    daemon=False,
  File "/opt/tiger/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/opt/tiger/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
    while not context.join():
  File "/opt/tiger/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 119, in join
    raise Exception(msg)
Exception:

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/opt/tiger/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
    fn(i, *args)
  File "/opt/tiger/conda/lib/python3.7/site-packages/detectron2/engine/launch.py", line 63, in _distributed_worker
    assert torch.cuda.is_available(), "cuda is not available. Please check your installation."
AssertionError: cuda is not available. Please check your installation.
Expected behavior: the command runs without error.
CUDA is definitely installed. Running nvcc --version prints:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
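The nvcc output only shows that the CUDA toolkit is installed. PyTorch additionally needs a working NVIDIA driver and GPU device nodes, which a container usually only sees when it is started with GPU access (for example through the NVIDIA container runtime). As a quick stdlib-only check, here is a minimal sketch that assumes a Linux container where GPUs appear as /dev/nvidia0, /dev/nvidia1 and so on; gpu_device_nodes and gpu_visibility_report are hypothetical helpers, not part of detectron2 or PyTorch.

```python
import os
import shutil


def gpu_device_nodes(entries):
    """Keep only per-GPU device nodes such as 'nvidia0', 'nvidia1'.

    Control nodes like 'nvidiactl' or 'nvidia-uvm' are ignored.
    """
    return sorted(
        name for name in entries
        if name.startswith("nvidia") and name[len("nvidia"):].isdigit()
    )


def gpu_visibility_report(dev_dir="/dev"):
    """Report which CUDA-related pieces are visible in this environment.

    nvcc only proves the CUDA toolkit is installed; PyTorch also needs a
    working NVIDIA driver and GPU device nodes.
    """
    try:
        entries = os.listdir(dev_dir)
    except OSError:
        entries = []  # directory missing or unreadable: report no GPUs
    return {
        "nvcc": shutil.which("nvcc"),              # CUDA toolkit compiler on PATH
        "nvidia-smi": shutil.which("nvidia-smi"),  # driver utility on PATH
        "gpu_device_nodes": gpu_device_nodes(entries),
    }


if __name__ == "__main__":
    report = gpu_visibility_report()
    for key, value in report.items():
        print(f"{key:18} {value}")
    if report["nvcc"] and not report["gpu_device_nodes"]:
        print("nvcc is on PATH but there are no /dev/nvidiaN nodes: "
              "the GPUs are probably not exposed to this container.")
```

If nvcc is found but no device nodes are listed, the likely culprit is how the container was started rather than the detectron2 installation.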
Environment:
No CUDA runtime is found, using CUDA_HOME='/opt/tiger/cuda'
sys.platform              linux
Python                    3.7.4 (default, Aug 13 2019, 20:35:49) [GCC 7.3.0]
numpy                     1.18.1
detectron2                0.1.2 @/opt/tiger/conda/lib/python3.7/site-packages/detectron2
detectron2 compiler       GCC 8.3
detectron2 CUDA compiler  not available
DETECTRON2_ENV_MODULE
PyTorch                   1.5.0 @/opt/tiger/conda/lib/python3.7/site-packages/torch
PyTorch debug build       False
CUDA available            False
Pillow                    7.0.0
torchvision               0.6.0a0+82fd1c8 @/opt/tiger/conda/lib/python3.7/site-packages/torchvision
fvcore                    0.1
PyTorch built with:
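The table reports CUDA available False, but because the "PyTorch built with:" section is cut off, it is unclear whether this PyTorch build includes CUDA at all. A minimal sketch that separates the two usual cases, assuming only the standard torch.version.cuda (None for CPU-only builds), torch.cuda.is_available() and torch.cuda.device_count(); the diagnose_cuda function and its labels are hypothetical, and the classification is a heuristic, not an official diagnosis.

```python
def diagnose_cuda(torch_cuda_version, cuda_available, device_count):
    """Map values reported by PyTorch to a likely cause (heuristic only)."""
    if torch_cuda_version is None:
        # torch.version.cuda is None for CPU-only PyTorch builds.
        return "cpu-only-build"
    if not cuda_available or device_count == 0:
        # PyTorch was built with CUDA, but no usable GPU/driver is visible.
        return "no-visible-gpu"
    return "ok"


HINTS = {
    "cpu-only-build": "Reinstall a CUDA-enabled PyTorch build that matches your CUDA version.",
    "no-visible-gpu": "PyTorch has CUDA support but sees no GPU: check the NVIDIA driver "
                      "and that the container was started with GPU access.",
    "ok": "PyTorch can see the GPU(s).",
}


if __name__ == "__main__":
    import torch  # imported here so diagnose_cuda stays importable without torch

    available = torch.cuda.is_available()
    count = torch.cuda.device_count() if available else 0
    print("torch", torch.__version__, "built with CUDA", torch.version.cuda)
    result = diagnose_cuda(torch.version.cuda, available, count)
    print(result, "->", HINTS[result])
```

Separately, the "detectron2 CUDA compiler not available" line suggests detectron2 itself may have been built without its CUDA ops. detectron2's own Dockerfile sets FORCE_CUDA=1 for builds where no GPU is visible, so rebuilding with that variable set may also be needed once PyTorch can see the GPUs.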
The Dockerfile is meant to be used like this.
The reason I can't use your Dockerfile directly is that I have to build on top of a private Docker image. So I have to do the installation inside a running container and then export that container as an image.
Thanks for clarifying. I thought you were using the Dockerfile since you mentioned a Docker container.
You need to install PyTorch and its dependencies correctly so that torch.cuda.is_available() returns True. Since this is a PyTorch function, the problem has nothing to do with detectron2.
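Because the assertion in the traceback only fires inside a worker process spawned by launch, a pre-flight check in the parent process can fail earlier with a clearer message. A minimal sketch, not detectron2's code: check_gpus is a hypothetical helper, it assumes num_gpus is at least 1, and the only torch calls are the standard torch.cuda.is_available() and torch.cuda.device_count().

```python
def check_gpus(num_gpus, cuda_available, device_count):
    """Raise early if the requested number of GPUs cannot be used."""
    if not cuda_available:
        raise RuntimeError(
            "torch.cuda.is_available() is False; fix the PyTorch/driver setup "
            "before passing --num-gpus")
    if device_count < num_gpus:
        raise RuntimeError(
            f"--num-gpus {num_gpus} requested but only {device_count} GPU(s) visible")


if __name__ == "__main__":
    import sys

    import torch

    requested = int(sys.argv[1]) if len(sys.argv) > 1 else 1
    available = torch.cuda.is_available()
    check_gpus(requested, available, torch.cuda.device_count() if available else 0)
    print(f"OK: {requested} GPU(s) requested and visible")
```

For the command in this issue, saving this as a hypothetical preflight.py and running python preflight.py 2 inside the container should either print the OK line or say which of the two conditions fails.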
Fixed internally. Kinda embarrassing...
@byronyi Can you say what you did to fix it?
I have the same issue.