Describe the bug
When running import cudf within a Python interpreter, a user warning is emitted stating that no NVIDIA GPU was detected, despite nvidia-smi unambiguously showing an NVIDIA GPU active and running.
Steps/Code to reproduce bug
Starting from a fresh conda installation
$ nvidia-smi
Sat Jun 6 12:41:41 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.01 Driver Version: 418.87.01 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 On | 00000000:00:1E.0 Off | 0 |
| N/A 31C P8 29W / 149W | 0MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
$ conda create -n cu-python python=3.7.6
$ conda activate cu-python
$ conda install -c conda-forge -c rapidsai-nightly cudf
$ python
Python 3.7.6 (default, Jan 8 2020, 19:59:22)
>>> import cudf
/home/ubuntu/anaconda3/envs/cu-python/lib/python3.7/site-packages/cudf/utils/gpu_utils.py:120: UserWarning: No NVIDIA GPU detected
warnings.warn("No NVIDIA GPU detected")
Expected behavior
Running import cudf should not trigger a "No NVIDIA GPU detected" warning, and the GPU device should indeed be detected.
Environment overview (please complete the following information)
Environment details
The cudf/print_env.sh file seems to be present on the system.
Could you run this command and share the output:
from cudf._cuda.gpu import getDeviceCount
print(getDeviceCount())
The nvidia-smi output indicates a K80 Kepler architecture GPU. cuDF currently requires a Pascal or newer architecture GPU (info).
We should present a different warning in this case indicating this.
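For reference, a quick way to confirm the architecture from Python is to query the compute capability (Pascal corresponds to compute capability 6.0). This is only an illustrative sketch using numba, which is installed alongside cuDF, and is not the check cuDF performs internally:

# Illustrative sketch only, not cuDF's internal check.
from numba import cuda

dev = cuda.get_current_device()
major, minor = dev.compute_capability
print("Detected GPU:", dev.name)
print("Compute capability: %d.%d" % (major, minor))
if (major, minor) < (6, 0):
    # Anything below 6.0 is older than Pascal, e.g. the K80 at 3.7
    print("Older than Pascal -- cuDF is expected to warn here")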
Yea, that warning is present. However, I suspect getDeviceCount might be the one failing in this case.
@JivanRoquet We have looked into this issue and it seems to detect a GPU and throw a warning since it is a Kepler architecture GPU.
(rapids14) root@9f5d7c18eedf:/content# python
Python 3.7.6 | packaged by conda-forge | (default, Jun 1 2020, 18:57:50)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cudf
/usr/local/envs/rapids14/lib/python3.7/site-packages/cudf/utils/gpu_utils.py:58: UserWarning: You will need a GPU with NVIDIA Pascal™ or newer architecture
Detected GPU 0: Tesla K80
Detected Compute Capability: 3.7
+ str(minor_version)
>>> exit()
nvidia-smi
Mon Jun 8 15:01:06 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82 Driver Version: 418.67 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00000000:00:04.0 Off | 0 |
| N/A 34C P8 28W / 149W | 0MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
from cudf._cuda.gpu import getDeviceCount
print(getDeviceCount())
I also recently came across this issue.
>>> from dask_cuda import LocalCUDACluster
/root/miniconda3/envs/rapids/lib/python3.8/site-packages/cudf/utils/gpu_utils.py:120: UserWarning: No NVIDIA GPU detected
warnings.warn("No NVIDIA GPU detected")
But in my case I installed via
conda install -c rapidsai-nightly -c nvidia -c conda-forge \
-c defaults rapids=0.15 python=3.8 cudatoolkit=10.2
but realized nvidia-smi showed me I have CUDA version 10.1. Changed to
conda install -c rapidsai-nightly -c nvidia -c conda-forge \
-c defaults rapids=0.15 python=3.8 cudatoolkit=10.1
and worked fine.
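In case it helps others hitting the same mismatch, here's a rough sketch (assuming libcuda.so from the NVIDIA driver is loadable) for checking the highest CUDA version the driver supports before pinning cudatoolkit:

import ctypes

# cuDriverGetVersion reports the newest CUDA version the installed
# driver supports, encoded as e.g. 10010 for CUDA 10.1.
libcuda = ctypes.CDLL("libcuda.so")
version = ctypes.c_int(0)
libcuda.cuDriverGetVersion(ctypes.byref(version))
print("Driver supports CUDA up to %d.%d"
      % (version.value // 1000, (version.value % 1000) // 10))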
@galipremsagar It looks like this check is failing due to being built against a newer driver than is supported by the current running driver, which ends up with us falling back to the case of thinking no GPU is detected. I'm guessing we just need to check for a specific error that the driver API / runtime API was unable to be initialized correctly as opposed to no GPU being detected.
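For illustration, a rough ctypes sketch against the CUDA runtime (not the actual gpu_utils code) of how the two cases could be told apart:

import ctypes

# Error code 35 is cudaErrorInsufficientDriver in the CUDA runtime API.
CUDA_SUCCESS = 0
CUDA_ERROR_INSUFFICIENT_DRIVER = 35

# Assumes the CUDA runtime library is on the loader path.
cudart = ctypes.CDLL("libcudart.so")
count = ctypes.c_int(0)
status = cudart.cudaGetDeviceCount(ctypes.byref(count))

if status == CUDA_ERROR_INSUFFICIENT_DRIVER:
    print("CUDA runtime was built for a newer driver than is installed")
elif status != CUDA_SUCCESS or count.value == 0:
    print("No NVIDIA GPU detected")
else:
    print("%d GPU(s) detected" % count.value)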
Could you run this command and share the output:
from cudf._cuda.gpu import getDeviceCount
print(getDeviceCount())
Hi, I did have this problem:
ModuleNotFoundError: No module named 'cudf._cuda'
This has since been refactored out of cudf and into rmm where it would now be:
from rmm._cuda.gpu import getDeviceCount
print(getDeviceCount())