Describe the bug
When running import cudf within a Python interpreter, a user warning is emitted stating that no NVIDIA GPU was detected, despite nvidia-smi unambiguously showing an NVIDIA GPU active and running.
Steps/Code to reproduce bug
Starting from a fresh conda installation
$ nvidia-smi
Sat Jun 6 12:41:41 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.01 Driver Version: 418.87.01 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 On | 00000000:00:1E.0 Off | 0 |
| N/A 31C P8 29W / 149W | 0MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
$ conda create -n cu-python python=3.7.6
$ conda activate cu-python
$ conda install -c conda-forge -c rapidsai-nightly cudf
$ python
Python 3.7.6 (default, Jan 8 2020, 19:59:22)
>>> import cudf
/home/ubuntu/anaconda3/envs/cu-python/lib/python3.7/site-packages/cudf/utils/gpu_utils.py:120: UserWarning: No NVIDIA GPU detected
warnings.warn("No NVIDIA GPU detected")
Expected behavior
Running import cudf should not trigger a "No NVIDIA GPU detected" warning, and the GPU device should indeed be detected.
Environment overview (please complete the following information)
Environment details
The cudf/print_env.sh file seems to be present on the system.
Could you run this command and share the output:
from cudf._cuda.gpu import getDeviceCount
print(getDeviceCount())
The nvidia-smi output indicates a K80 Kepler architecture GPU. cuDF currently requires a Pascal or newer architecture GPU (info).
We should present a different warning in this case indicating this.
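For reference, a quick way to confirm the architecture from Python is to query the compute capability (Pascal corresponds to compute capability 6.0). This is only an illustrative sketch using numba, which is installed alongside cuDF, and is not the check cuDF performs internally:

# Illustrative sketch only, not cuDF's internal check.
from numba import cuda

dev = cuda.get_current_device()
major, minor = dev.compute_capability
print("Detected GPU:", dev.name)
print("Compute capability: %d.%d" % (major, minor))
if (major, minor) < (6, 0):
    # Anything below 6.0 is older than Pascal, e.g. the K80 at 3.7
    print("Older than Pascal -- cuDF is expected to warn here")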
Yea, that warning is present. However, I suspect getDeviceCount might be the one failing in this case.
@JivanRoquet We have looked into this issue and it seems to detect a GPU and throw a warning since it is a Kepler architecture GPU.
(rapids14) root@9f5d7c18eedf:/content# python
Python 3.7.6 | packaged by conda-forge | (default, Jun 1 2020, 18:57:50)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cudf
/usr/local/envs/rapids14/lib/python3.7/site-packages/cudf/utils/gpu_utils.py:58: UserWarning: You will need a GPU with NVIDIA Pascal™ or newer architecture
Detected GPU 0: Tesla K80
Detected Compute Capability: 3.7
+ str(minor_version)
>>> exit()
nvidia-smi
Mon Jun 8 15:01:06 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82 Driver Version: 418.67 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00000000:00:04.0 Off | 0 |
| N/A 34C P8 28W / 149W | 0MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
from cudf._cuda.gpu import getDeviceCount
print(getDeviceCount())
I also recently came across this issue.
>>> from dask_cuda import LocalCUDACluster
/root/miniconda3/envs/rapids/lib/python3.8/site-packages/cudf/utils/gpu_utils.py:120: UserWarning: No NVIDIA GPU detected
warnings.warn("No NVIDIA GPU detected")
But in my case I installed via
conda install -c rapidsai-nightly -c nvidia -c conda-forge \
-c defaults rapids=0.15 python=3.8 cudatoolkit=10.2
but realized nvidia-smi showed me I have CUDA version 10.1. Changed to
conda install -c rapidsai-nightly -c nvidia -c conda-forge \
-c defaults rapids=0.15 python=3.8 cudatoolkit=10.1
and worked fine.
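In case it helps others hitting the same mismatch, here's a rough sketch (assuming libcuda.so from the NVIDIA driver is loadable) for checking the highest CUDA version the driver supports before pinning cudatoolkit:

import ctypes

# cuDriverGetVersion reports the newest CUDA version the installed
# driver supports, encoded as e.g. 10010 for CUDA 10.1.
libcuda = ctypes.CDLL("libcuda.so")
version = ctypes.c_int(0)
libcuda.cuDriverGetVersion(ctypes.byref(version))
print("Driver supports CUDA up to %d.%d"
      % (version.value // 1000, (version.value % 1000) // 10))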
@galipremsagar It looks like this check is failing due to being built against a newer driver than is supported by the current running driver, which ends up with us falling back to the case of thinking no GPU is detected. I'm guessing we just need to check for a specific error that the driver API / runtime API was unable to be initialized correctly as opposed to no GPU being detected.
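For illustration, a rough ctypes sketch against the CUDA runtime (not the actual gpu_utils code) of how the two cases could be told apart:

import ctypes

# Error code 35 is cudaErrorInsufficientDriver in the CUDA runtime API.
CUDA_SUCCESS = 0
CUDA_ERROR_INSUFFICIENT_DRIVER = 35

# Assumes the CUDA runtime library is on the loader path.
cudart = ctypes.CDLL("libcudart.so")
count = ctypes.c_int(0)
status = cudart.cudaGetDeviceCount(ctypes.byref(count))

if status == CUDA_ERROR_INSUFFICIENT_DRIVER:
    print("CUDA runtime was built for a newer driver than is installed")
elif status != CUDA_SUCCESS or count.value == 0:
    print("No NVIDIA GPU detected")
else:
    print("%d GPU(s) detected" % count.value)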
Could you run this command and share the output:
from cudf._cuda.gpu import getDeviceCount
print(getDeviceCount())
Hi, I did have this problem:
ModuleNotFoundError: No module named 'cudf._cuda'
This has since been refactored out of cudf and into rmm where it would now be:
from rmm._cuda.gpu import getDeviceCount
print(getDeviceCount())