cuDF: [BUG] CUDA 11.0 to_dlpack cudaErrorIllegalAddress

Created on 22 Sep 2020 · 11 comments · Source: rapidsai/cudf

Describe the bug
Illegal memory access when converting from a cuDF DataFrame to dlpack. Only happens with CUDA 11.0.

Steps/Code to reproduce bug

import cudf
import torch
from torch.utils.dlpack import from_dlpack
df = cudf.datasets.randomdata(10000, dtypes={"label": bool})
df.label = df.label.astype(int)
tensor = from_dlpack(df["label"].to_dlpack()).type(torch.float32)
print(tensor)

Expected behavior
A valid dlpack representation should be produced, as it is with CUDA 10.2.

Environment overview (please complete the following information)

  • Environment location: Docker
  • Method of cuDF install: Docker
Labels: bug, cuDF (Python), libcudf

Most helpful comment

The nightly no longer exhibits this behavior.

All 11 comments

@rgsl888prabhu can you look into this?

@jperez999 I was able to convert and get a dltensor using to_dlpack without any CUDA error being thrown:

Wed Sep 23 12:44:44 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro RTX 8000     On   | 00000000:15:00.0 Off |                  Off |
| 33%   34C    P0    68W / 260W |      1MiB / 48601MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Quadro RTX 8000     On   | 00000000:2D:00.0  On |                  Off |
| 33%   35C    P8    20W / 260W |    106MiB / 48592MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    1   N/A  N/A      1045      G   /usr/lib/xorg/Xorg                104MiB |
+-----------------------------------------------------------------------------+
(cudf_dev_11) rgsl888@onepiece:~/Projects/backup/backup/cudf/python/cudf$ python
Python 3.7.8 | packaged by conda-forge | (default, Jul 31 2020, 02:25:08) 
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cudf

>>> import torch
>>> from torch.utils.dlpack import from_dlpack
>>> df = cudf.datasets.randomdata(10000, dtypes={"label": bool})
>>> df.label = df.label.astype(int)
>>> dl_t = df["label"].to_dlpack()
/home/rgsl888/Projects/backup/backup/cudf/python/cudf/cudf/io/dlpack.py:74: UserWarning: WARNING: cuDF to_dlpack() produces column-major (Fortran order) output. If the output tensor needs to be row major, transpose the output of this function.
  return libdlpack.to_dlpack(gdf_cols)
>>> dl_t
<capsule object "dltensor" at 0x7f1081270f60>

This is with the latest 0.16.
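As an aside on the Fortran-order warning in that session: for a single column the output is 1-D, so the warning is harmless. For a multi-column frame it matters; below is a rough sketch of handling it on the PyTorch side, assuming a DataFrame-level to_dlpack is available in this version (the repro above only uses the Series-level call):

import cudf
import torch
from torch.utils.dlpack import from_dlpack

df = cudf.datasets.randomdata(100, dtypes={"a": float, "b": float})

# Assumed DataFrame-level conversion; the capsule describes column-major data.
t = from_dlpack(df.to_dlpack())

# If a row-major (C-contiguous) tensor is needed downstream, take a contiguous
# copy (transpose first if the shape comes back swapped on this cuDF version).
t_row = t.contiguous()
print(t_row.shape, t_row.is_contiguous())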

Could you please provide more information about your container and the cuDF version being used?

Also, please provide details of your environment using the print_env.sh script that resides in the cudf repo.

The container is based on: rapidsai/rapidsai-dev:0.15-cuda11.0-devel-ubuntu18.04-py3.7

CONDA_SHLVL=1
LC_ALL=C.UTF-8
NVM_DIR=/usr/local/nvm
LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:
LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64:/usr/local/lib:/opt/conda/envs/rapids/lib
CONDA_EXE=/opt/conda/bin/conda
GDAL_DATA=/opt/conda/envs/rapids/share/gdal
JUPYTER_SERVER_ROOT=/
RAPIDS_DIR=/rapids
LESSCLOSE=/usr/bin/lesspipe %s %s
NVIDIA_PYTORCH_VERSION=
NCCL_ROOT=/opt/conda/envs/rapids
HOSTNAME=0dbdc98bc69b
OLDPWD=/
CUDA_PATH=/opt/conda/envs/rapids
GSETTINGS_SCHEMA_DIR_CONDA_BACKUP=
JUPYTER_SERVER_URL=http://0.0.0.0:8888/
CPL_ZIP_ENCODING=UTF-8
CUDAHOSTCXX=/usr/local/bin/g++
CONDA_PREFIX=/opt/conda/envs/rapids
CONDA_ENV=rapids
PYTHONIOENCODING=utf-8
NVIDIA_VISIBLE_DEVICES=all
_CE_M=
CC=/usr/local/bin/gcc
PROJ_LIB=/opt/conda/envs/rapids/share/proj
PYTORCH_BUILD_VERSION=1.1.0a0+nvidia
NCCL_VERSION=2.7.8
PWD=/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf
NUMBAPRO_NVVM=/usr/local/cuda/nvvm/lib64/libnvvm.so
LINES=51
HOME=/root
CONDA_PYTHON_EXE=/opt/conda/bin/python
CUDA_HOME=/usr/local/cuda
CCACHE_DIR=/ccache
PYTORCH_VERSION=
DEBIAN_FRONTEND=noninteractive
CUB_DISABLED=1
_CE_CONDA=
GSETTINGS_SCHEMA_DIR=/opt/conda/envs/rapids/share/glib-2.0/schemas
PARALLEL_LEVEL=16
LIBRARY_PATH=/usr/local/cuda/lib64/stubs
PROJ_NETWORK=ON
CONDA_PROMPT_MODIFIER=(rapids) 
COLUMNS=154
NVIDIA_BUILD_ID=<unknown>
CXX=/usr/local/bin/g++
TERM=xterm
CCACHE_COMPILERCHECK=%compiler% --version
TORCH_CUDA_ARCH_LIST=5.2 6.0 6.1 7.0 7.5 8.0+PTX
CUDA_VERSION=11.0.221
PYTORCH_BUILD_NUMBER=0
PYXTERM_DIMENSIONS=80x25
CCACHE_NOHASHDIR=
NVIDIA_DRIVER_CAPABILITIES=compute,utility
SHLVL=2
NVCC=/usr/local/bin/nvcc
NVIDIA_REQUIRE_CUDA=cuda>=11.0 brand=tesla,driver>=418,driver<419 brand=tesla,driver>=440,driver<441
PATH=/opt/conda/envs/rapids/bin:/opt/conda/condabin:/opt/cmake-3.14.6-Linux-x86_64/bin:/opt/conda/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
NUMBAPRO_LIBDEVICE=/usr/local/cuda/nvvm/libdevice/
CONDA_DEFAULT_ENV=rapids
LESSOPEN=| /usr/bin/lesspipe %s
_=/usr/bin/printenv

cudf version 0+untagged.1.gfa8e9fb

Will try the 0.16 nightly to see if the issue persists.

Thanks for looking at this @rgsl888prabhu! What Docker image did you use?

Did you also try the code snippet that Julio posted? The snippet you posted wasn't doing the same thing as his reproduction: we can also convert from cuDF to dlpack successfully; the error appears after converting that dlpack capsule to a PyTorch tensor. The error we get is an illegal memory access when trying to print the PyTorch tensor:

Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_LIBDEVICE=/usr/local/cuda/nvvm/libdevice/.

For more information about alternatives visit: ('http://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
  warnings.warn(errors.NumbaWarning(msg))


/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/io/dlpack.py:74: UserWarning: WARNING: cuDF to_dlpack() produces column-major (Fortran order) output. If the output tensor needs to be row major, transpose the output of this function.
  return libdlpack.to_dlpack(gdf_cols)
Traceback (most recent call last):
  File "test.py", line 7, in <module>
    print(tensor)
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/torch/tensor.py", line 153, in __repr__
    return torch._tensor_str._str(self)
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/torch/_tensor_str.py", line 371, in _str
    return _str_intern(self)
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/torch/_tensor_str.py", line 351, in _str_intern
    tensor_str = _tensor_str(self, indent)
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/torch/_tensor_str.py", line 241, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/torch/_tensor_str.py", line 89, in __init__
    nonzero_finite_vals = torch.masked_select(tensor_view, torch.isfinite(tensor_view) & tensor_view.ne(0))
RuntimeError: copy_if failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered
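
Since the illegal access only surfaces at the first synchronizing call (here, the copy_if inside the print), a minimal sketch of surfacing it earlier with an explicit synchronization, assuming the same repro:

import cudf
import torch
from torch.utils.dlpack import from_dlpack

df = cudf.datasets.randomdata(10000, dtypes={"label": bool})
df.label = df.label.astype(int)
tensor = from_dlpack(df["label"].to_dlpack()).type(torch.float32)

# Force all queued kernels to complete; on the broken setup the deferred
# cudaErrorIllegalAddress should be reported here rather than inside print().
torch.cuda.synchronize()
print(tensor)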

FWIW, converting from a CuPy array to a PyTorch tensor using dlpack doesn't hit this error; this code snippet works:

import cupy
import torch
from torch.utils.dlpack import from_dlpack
arr = cupy.arange(1000).astype(int)
tensor = from_dlpack(arr.toDlpack()).type(torch.float32)
print(tensor)
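
To isolate it further, a hedged sketch of round-tripping that tensor back to CuPy through dlpack; if this also works, the problem looks specific to the capsule cuDF produces rather than to PyTorch's dlpack import:

import cupy
import torch
from torch.utils.dlpack import from_dlpack, to_dlpack

arr = cupy.arange(1000).astype(int)
tensor = from_dlpack(arr.toDlpack()).type(torch.float32)

# Export the torch tensor as a dlpack capsule and re-import it into CuPy.
arr_back = cupy.fromDlpack(to_dlpack(tensor))
print(arr_back[:5])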

I was using a locally built conda CUDA 11 environment on Ubuntu 18.04. Are you using a specific PyTorch build from NVIDIA? I was curious after seeing PYTORCH_BUILD_VERSION=1.1.0a0+nvidia in your env.

For me, from_dlpack fails with a segfault when using stock PyTorch, so I was curious whether I need to use something specific.

We build PyTorch from source using the following:

git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
python setup.py install

There is currently no conda/pip install support for CUDA 11.0: https://pytorch.org/get-started/locally/#start-locally
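
A quick sanity check, using only standard torch attributes, that the source build actually compiled against CUDA 11 and can see the GPU:

import torch

# Report the torch build, the CUDA toolkit it was compiled against,
# and whether a GPU is usable at runtime.
print(torch.__version__)
print(torch.version.cuda)         # expected to report 11.0 for this build
print(torch.cuda.is_available())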

For me, from_dlpack fails with a segfault when using stock PyTorch, so I was curious whether I need to use something specific.

Interesting! Does this segfault also occur with the CuPy snippet I pasted, or only when using from_dlpack on the capsule from cuDF?

It didn't occur when used with cupy.

I tried with CUDA 11 and source-built PyTorch:

Wed Sep 23 16:19:07 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro RTX 8000     On   | 00000000:15:00.0 Off |                  Off |
| 33%   34C    P8    15W / 260W |    579MiB / 48601MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Quadro RTX 8000     On   | 00000000:2D:00.0 Off |                  Off |
| 33%   34C    P8    17W / 260W |    109MiB / 48592MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      8479      C   .../envs/cudf_dev/bin/python      575MiB |
|    1   N/A  N/A      1045      G   /usr/lib/xorg/Xorg                104MiB |
+-----------------------------------------------------------------------------+
(cudf_dev_11) rgsl888@onepiece:~/Projects/backup/backup/cudf$ python
Python 3.7.8 | packaged by conda-forge | (default, Jul 31 2020, 02:25:08) 
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cudf

>>> import torch
>>> from torch.utils.dlpack import from_dlpack
>>> df = cudf.datasets.randomdata(10000, dtypes={"label": bool})
>>> df.label = df.label.astype(int)
>>> tensor = from_dlpack(df["label"].to_dlpack()).type(torch.float32)
/home/rgsl888/anaconda3/envs/cudf_dev_11/lib/python3.7/site-packages/cudf/io/dlpack.py:74: UserWarning: WARNING: cuDF to_dlpack() produces column-major (Fortran order) output. If the output tensor needs to be row major, transpose the output of this function.
  return libdlpack.to_dlpack(gdf_cols)
>>> print(tensor)
tensor([1., 1., 1.,  ..., 1., 1., 1.], device='cuda:0')

and it works.

The nightly no longer exhibits this behavior.

Hey, that's good news, thank you for the update. If the issue is resolved, please close the issue.
