Cudf: [BUG] No NVIDIA GPU detected (despite GPU being running)

Created on 6 Jun 2020 · 9 Comments · Source: rapidsai/cudf

Describe the bug
When running import cudf in a Python interpreter, a UserWarning is emitted saying that no NVIDIA GPU was detected, even though nvidia-smi clearly shows an NVIDIA GPU that is active and running.

Steps/Code to reproduce bug

Starting from a fresh conda installation

$ nvidia-smi
Sat Jun  6 12:41:41 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.01    Driver Version: 418.87.01    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 00000000:00:1E.0 Off |                    0 |
| N/A   31C    P8    29W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

$ conda create -n cu-python python=3.7.6
$ conda activate cu-python
$ conda install -c conda-forge -c rapidsai-nightly cudf
$ python

Python 3.7.6 (default, Jan  8 2020, 19:59:22)
>>> import cudf
/home/ubuntu/anaconda3/envs/cu-python/lib/python3.7/site-packages/cudf/utils/gpu_utils.py:120: UserWarning: No NVIDIA GPU detected
  warnings.warn("No NVIDIA GPU detected")

Expected behavior
Running import cudf should not trigger a "No NVIDIA GPU detected" warning, and the GPU device should be detected.

Environment overview (please complete the following information)

  • Environment location: AWS
  • Method of cuDF install: Conda

Environment details

  • no cudf/print_env.sh file seems to be present on the system

Labels: bug, cuDF (Python)

All 9 comments

Could you run this command and share the output:

from cudf._cuda.gpu import getDeviceCount
print(getDeviceCount())

The nvidia-smi output indicates a K80 Kepler architecture GPU. cuDF currently requires a Pascal or newer architecture GPU (info).

We should present a different warning in this case indicating this.
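
An architecture check of that kind might look roughly like the sketch below. It assumes the compute capability is already known as a (major, minor) pair and that "Pascal or newer" means compute capability 6.0 or higher; the function names are hypothetical and are not cuDF's actual implementation.

```python
import warnings

# Assumption: Pascal corresponds to compute capability 6.0.
MIN_COMPUTE_CAPABILITY = (6, 0)


def is_supported_architecture(major, minor):
    """Return True if the compute capability is Pascal (6.0) or newer."""
    return (major, minor) >= MIN_COMPUTE_CAPABILITY


def architecture_warning(device_name, major, minor):
    """Build a warning message for an unsupported GPU, or return None."""
    if is_supported_architecture(major, minor):
        return None
    return (
        "You will need a GPU with NVIDIA Pascal or newer architecture\n"
        "Detected GPU: " + device_name + "\n"
        "Detected Compute Capability: " + str(major) + "." + str(minor)
    )


# The Tesla K80 from this report has compute capability 3.7.
message = architecture_warning("Tesla K80", 3, 7)
if message is not None:
    warnings.warn(message)
```

Comparing tuples keeps the check correct for any minor version without string parsing.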

Yea, that warning is present. However, I suspect getDeviceCount might be the one failing in this case.

@JivanRoquet We have looked into this issue and it seems to detect a GPU and throw a warning since it is a Kepler architecture GPU.

(rapids14) root@9f5d7c18eedf:/content# python
Python 3.7.6 | packaged by conda-forge | (default, Jun  1 2020, 18:57:50) 
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cudf
/usr/local/envs/rapids14/lib/python3.7/site-packages/cudf/utils/gpu_utils.py:58: UserWarning: You will need a GPU with NVIDIA Pascal™ or newer architecture
Detected GPU 0: Tesla K80
Detected Compute Capability: 3.7
  + str(minor_version)
>>> exit()
nvidia-smi
Mon Jun  8 15:01:06 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   34C    P8    28W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
  1. Could you provide your environment dump (the output of conda list)?
  2. Could you also share the output of the code below?
from cudf._cuda.gpu import getDeviceCount
print(getDeviceCount())

I also recently came across this issue.

>>> from dask_cuda import LocalCUDACluster
/root/miniconda3/envs/rapids/lib/python3.8/site-packages/cudf/utils/gpu_utils.py:120: UserWarning: No NVIDIA GPU detected
  warnings.warn("No NVIDIA GPU detected")

But in my case I installed via

conda install -c rapidsai-nightly -c nvidia -c conda-forge \
    -c defaults rapids=0.15 python=3.8 cudatoolkit=10.2

and then realized that nvidia-smi reported CUDA version 10.1. I changed the install to

conda install -c rapidsai-nightly -c nvidia -c conda-forge \
    -c defaults rapids=0.15 python=3.8 cudatoolkit=10.1

and it worked fine.
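
The mismatch described here can be checked before installing. The sketch below parses the "CUDA Version" field from the nvidia-smi header (the highest CUDA version the driver supports) and compares it with the requested cudatoolkit version. The function names are hypothetical, and the strict "toolkit must not be newer than the driver's version" rule is a simplifying assumption.

```python
import re


def parse_driver_cuda_version(smi_text):
    """Return (major, minor) from the nvidia-smi 'CUDA Version' field, or None."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def toolkit_is_supported(smi_text, toolkit_version):
    """Return True if the toolkit version is not newer than the driver's."""
    driver_max = parse_driver_cuda_version(smi_text)
    if driver_max is None:
        raise ValueError("no 'CUDA Version' field found in nvidia-smi output")
    requested = tuple(int(part) for part in toolkit_version.split(".")[:2])
    return requested <= driver_max


# Header line copied from the nvidia-smi output in the original report.
header = "| NVIDIA-SMI 418.87.01    Driver Version: 418.87.01    CUDA Version: 10.1     |"
print(toolkit_is_supported(header, "10.2"))  # False: toolkit newer than driver
print(toolkit_is_supported(header, "10.1"))  # True
```

In practice the header text could come from running nvidia-smi through subprocess; it is passed in as a string here so the check can be tried without a GPU.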

@galipremsagar It looks like this check is failing because the package was built against a newer CUDA version than the currently running driver supports, so we end up falling back to the case of thinking no GPU is detected. I'm guessing we just need to check for the specific error indicating that the driver API / runtime API could not be initialized correctly, as opposed to no GPU being detected.
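
One way to separate those two cases is sketched below. It assumes a hypothetical get_device_count callable that raises a hypothetical CudaStatusError carrying the CUDA runtime error name; cudaErrorInsufficientDriver and cudaErrorNoDevice are CUDA runtime error names, but everything else here is illustrative and not the actual cuDF or rmm code.

```python
import warnings


class CudaStatusError(RuntimeError):
    """Hypothetical exception carrying a CUDA runtime error name."""

    def __init__(self, name):
        super().__init__(name)
        self.name = name


def gpu_warning_message(get_device_count):
    """Return a warning message describing the GPU status, or None if fine."""
    try:
        count = get_device_count()
    except CudaStatusError as err:
        if err.name == "cudaErrorInsufficientDriver":
            return (
                "CUDA driver is older than the CUDA runtime "
                "this package was built against"
            )
        if err.name == "cudaErrorNoDevice":
            return "No NVIDIA GPU detected"
        return "CUDA could not be initialized: " + err.name
    if count == 0:
        return "No NVIDIA GPU detected"
    return None


def check_gpu(get_device_count):
    """Emit a UserWarning if the GPU status is problematic."""
    message = gpu_warning_message(get_device_count)
    if message is not None:
        warnings.warn(message)
    return message


def insufficient_driver():
    # Stand-in for a device query that fails on a driver/toolkit mismatch.
    raise CudaStatusError("cudaErrorInsufficientDriver")


check_gpu(insufficient_driver)
```

Keeping the message construction separate from warnings.warn makes the classification easy to test with stand-in callables.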

Could you run this command and share the output:

from cudf._cuda.gpu import getDeviceCount
print(getDeviceCount())

Hi, I ran into this problem when running that command:
ModuleNotFoundError: No module named 'cudf._cuda'

This has since been refactored out of cudf and into rmm where it would now be:

from rmm._cuda.gpu import getDeviceCount
print(getDeviceCount())
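
Since the function has moved between packages, a script meant to run across versions could try both locations. The following is a minimal sketch: the two module paths are the ones mentioned in this thread and may not hold for newer releases, and find_attribute is a hypothetical helper.

```python
import importlib

# Candidate locations, newest first; both paths are taken from this thread.
CANDIDATE_MODULES = ("rmm._cuda.gpu", "cudf._cuda.gpu")


def find_attribute(module_names, attribute):
    """Return the first `attribute` found in the listed modules, or None."""
    for module_name in module_names:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # module missing in this install, try the next one
        found = getattr(module, attribute, None)
        if found is not None:
            return found
    return None


get_device_count = find_attribute(CANDIDATE_MODULES, "getDeviceCount")
if get_device_count is None:
    print("getDeviceCount not found in rmm or cudf")
else:
    print(get_device_count())
```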