Cudf: [BUG] Can't use cupy function in dask + cudf task

Created on 17 Sep 2019 · 4 Comments · Source: rapidsai/cudf

Describe the bug
This issue is meant to supersede dask#5322 as the official forum for discussing problems with cupy + cudf + dask compatibility.

The problem is best illustrated by this "hang-reproducer" gist (mostly copied below to make the discussion in this issue self contained). In many cases, when using the standard multi-threaded scheduler in dask, the execution of the task graph will "hang" when a cupy function is used (even if the "use" is trivial).

Steps/Code to reproduce bug

The following code snippet can be used to reproduce the "hang":

import dask
from dask.threaded import get
from dask.base import tokenize
from toolz import merge

import cudf
import cupy
import numpy as np

ddf = dask.datasets.timeseries(seed=42)
gddf = ddf.map_partitions(cudf.from_pandas)

def _percentiles_summary(df):
    x = cupy.array([]) # <-- Problematic Line
    vals_and_weights = (np.array([1004.0, 1004.0]), np.array([50.0, 50.0]))
    return vals_and_weights

def _partition_quantiles(df, npartitions):

    def _dtype_info(df):
        return df.dtype, None

    def _combine(sequence_of_data):
        return sequence_of_data

    token = tokenize(df)
    df_keys = df.__dask_keys__()

    name0 = "re-quantiles-0-" + token
    dtype_dsk = {(name0, 0): (_dtype_info, df_keys[0])}

    name1 = "re-quantiles-1-" + token
    val_dsk = {
        (name1, i): (_percentiles_summary, key)
        for i, key in enumerate(df_keys)
    }

    val_dsk["combine"] = (
        _combine, [(name1, i) for i, key in enumerate(df_keys)]
    )
    return merge(df.dask, dtype_dsk, val_dsk)

dsk = _partition_quantiles(gddf["id"], gddf.npartitions)
%timeit get(dsk, "combine")

Note that the problem is caused by x = cupy.array([]) (there is no problem after removing this line). Also, the code will sometimes run once without any problems, but the hang will always occur when the graph is executed within a loop (hence the use of %timeit).
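Since the hang only shows up reliably when the graph is executed repeatedly, a plain-Python stand-in for %timeit can be handy outside of IPython. The following is only a sketch: run_with_hang_report is a hypothetical helper name, and the 5-second timeout is arbitrary. It uses the standard-library faulthandler module to dump the stack of every thread if a single call takes too long, which should show where each thread is stuck.

```python
import faulthandler
import os
import sys


def run_with_hang_report(fn, repeats=10, timeout=30.0, file=None):
    """Call fn() `repeats` times (hypothetical helper, not part of dask).

    If one call runs longer than `timeout` seconds, faulthandler's watchdog
    thread writes the Python traceback of every thread to `file`.
    """
    if file is None:
        file = sys.stderr  # must be a real file with a file descriptor
    results = []
    for _ in range(repeats):
        # Arm the watchdog for this call only.
        faulthandler.dump_traceback_later(timeout, exit=False, file=file)
        try:
            results.append(fn())
        finally:
            # Disarm it so a fast call does not trigger a late dump.
            faulthandler.cancel_dump_traceback_later()
    return results


# Stand-in workload so the sketch runs without a GPU.
with open(os.devnull, "w") as sink:
    demo = run_with_hang_report(lambda: 42, repeats=3, timeout=5.0, file=sink)
print(demo)
```

With the reproducer above, the call would look like run_with_hang_report(lambda: get(dsk, "combine")) in place of the %timeit line.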

Expected behavior
In the snippet above, the behavior should be the same with and without the x = cupy.array([]) line.

Environment overview (please complete the following information)

  • Environment location: Bare-metal (dgx machine)
  • Method of cuDF install: from source

Environment details


 ***git***
 commit 2d6e14d8b00362093bddf815d26662c5a05f8500 (HEAD -> scatter-api, origin/scatter-api)
 Merge: 3a50322 65268e7
 Author: Richard (Rick) Zamora <[email protected]>
 Date:   Tue Sep 17 12:39:29 2019 -0500

 Merge branch 'branch-0.10' into scatter-api
 ***git submodules***
 b165e1fb11eeea64ccf95053e40f2424312599cc thirdparty/cub (v1.7.1)
 63f644be44201467e3938d59ed9d89cc8725c35d thirdparty/jitify (remotes/origin/feature/api_v2)

 ***OS Information***
 DGX_NAME="DGX Server"
 DGX_PRETTY_NAME="NVIDIA DGX Server"
 DGX_SWBUILD_DATE="2018-03-20"
 DGX_SWBUILD_VERSION="3.1.6"
 DGX_COMMIT_ID="1b0f58ecbf989820ce745a9e4836e1de5eea6cfd"
 DGX_SERIAL_NUMBER=QTFCOU822000C

 DGX_OTA_VERSION="3.1.7"
 DGX_OTA_DATE="Mon Jul  2 18:36:07 PDT 2018"
 DISTRIB_ID=Ubuntu
 DISTRIB_RELEASE=16.04
 DISTRIB_CODENAME=xenial
 DISTRIB_DESCRIPTION="Ubuntu 16.04.5 LTS"
 NAME="Ubuntu"
 VERSION="16.04.5 LTS (Xenial Xerus)"
 ID=ubuntu
 ID_LIKE=debian
 PRETTY_NAME="Ubuntu 16.04.5 LTS"
 VERSION_ID="16.04"
 HOME_URL="http://www.ubuntu.com/"
 SUPPORT_URL="http://help.ubuntu.com/"
 BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
 VERSION_CODENAME=xenial
 UBUNTU_CODENAME=xenial
 Linux dgx15 4.4.0-135-generic #161-Ubuntu SMP Mon Aug 27 10:45:01 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

 ***GPU Information***
 Tue Sep 17 11:09:07 2019
 +-----------------------------------------------------------------------------+
 | NVIDIA-SMI 396.44                 Driver Version: 396.44                    |
 |-------------------------------+----------------------+----------------------+
 | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
 | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
 |===============================+======================+======================|
 |   0  Tesla V100-SXM2...  On   | 00000000:06:00.0 Off |                    0 |
 | N/A   36C    P0    56W / 300W |    629MiB / 32510MiB |      0%      Default |
 +-------------------------------+----------------------+----------------------+
 |   1  Tesla V100-SXM2...  On   | 00000000:07:00.0 Off |                    0 |
 | N/A   34C    P0    44W / 300W |     11MiB / 32510MiB |      0%      Default |
 +-------------------------------+----------------------+----------------------+
 |   2  Tesla V100-SXM2...  On   | 00000000:0A:00.0 Off |                    0 |
 | N/A   35C    P0    43W / 300W |     11MiB / 32510MiB |      0%      Default |
 +-------------------------------+----------------------+----------------------+
 |   3  Tesla V100-SXM2...  On   | 00000000:0B:00.0 Off |                    0 |
 | N/A   32C    P0    43W / 300W |     11MiB / 32510MiB |      0%      Default |
 +-------------------------------+----------------------+----------------------+
 |   4  Tesla V100-SXM2...  On   | 00000000:85:00.0 Off |                    0 |
 | N/A   36C    P0    57W / 300W |    517MiB / 32510MiB |      0%      Default |
 +-------------------------------+----------------------+----------------------+
 |   5  Tesla V100-SXM2...  On   | 00000000:86:00.0 Off |                    0 |
 | N/A   35C    P0    45W / 300W |     11MiB / 32510MiB |      0%      Default |
 +-------------------------------+----------------------+----------------------+
 |   6  Tesla V100-SXM2...  On   | 00000000:89:00.0 Off |                    0 |
 | N/A   36C    P0    43W / 300W |     11MiB / 32510MiB |      0%      Default |
 +-------------------------------+----------------------+----------------------+
 |   7  Tesla V100-SXM2...  On   | 00000000:8A:00.0 Off |                    0 |
 | N/A   32C    P0    42W / 300W |     11MiB / 32510MiB |      0%      Default |
 +-------------------------------+----------------------+----------------------+

 +-----------------------------------------------------------------------------+
 | Processes:                                                       GPU Memory |
 |  GPU       PID   Type   Process name                             Usage      |
 |=============================================================================|
 |    0     28951      C   ...iniconda3/envs/cudf_bugfixes/bin/python   618MiB |
 |    4     66094      C   ...iniconda3/envs/cudf_bugfixes/bin/python   506MiB |
 +-----------------------------------------------------------------------------+

 ***CPU***
 Architecture:          x86_64
 CPU op-mode(s):        32-bit, 64-bit
 Byte Order:            Little Endian
 CPU(s):                80
 On-line CPU(s) list:   0-79
 Thread(s) per core:    2
 Core(s) per socket:    20
 Socket(s):             2
 NUMA node(s):          2
 Vendor ID:             GenuineIntel
 CPU family:            6
 Model:                 79
 Model name:            Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz
 Stepping:              1
 CPU MHz:               3480.640
 CPU max MHz:           3600.0000
 CPU min MHz:           1200.0000
 BogoMIPS:              4392.10
 Virtualization:        VT-x
 L1d cache:             32K
 L1i cache:             32K
 L2 cache:              256K
 L3 cache:              51200K
 NUMA node0 CPU(s):     0-19,40-59
 NUMA node1 CPU(s):     20-39,60-79
 Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb invpcid_single intel_pt ssbd ibrs ibpb stibp kaiser tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdseed adx smap xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts flush_l1d

 ***CMake***
 /home/nfs/rzamora/miniconda3/envs/cudf_bugfixes/bin/cmake
 cmake version 3.15.2

 CMake suite maintained and supported by Kitware (kitware.com/cmake).

 ***g++***
 /usr/bin/g++
 g++ (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609
 Copyright (C) 2015 Free Software Foundation, Inc.
 This is free software; see the source for copying conditions.  There is NO
 warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


 ***nvcc***

 ***Python***
 /home/nfs/rzamora/miniconda3/envs/cudf_bugfixes/bin/python
 Python 3.7.3

 ***Environment Variables***
 PATH                            : /home/nfs/rzamora/.vscode-server-insiders/bin/a39d2de39dd5038a1a696800ac9af6dc32a31eab/bin:/home/nfs/rzamora/bin:/home/nfs/rzamora/.local/bin:/home/nfs/rzamora/miniconda3/envs/cudf_bugfixes/bin:/home/nfs/rzamora/miniconda3/condabin:/home/nfs/rzamora/.vscode-server-insiders/bin/a39d2de39dd5038a1a696800ac9af6dc32a31eab/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
 LD_LIBRARY_PATH                 :
 NUMBAPRO_NVVM                   : /usr/local/cuda-9.2/nvvm/lib64/libnvvm.so
 NUMBAPRO_LIBDEVICE              : /usr/local/cuda-9.2/nvvm/libdevice
 CONDA_PREFIX                    : /home/nfs/rzamora/miniconda3/envs/cudf_bugfixes
 PYTHON_PATH                     :

 ***conda packages***
 /home/nfs/rzamora/miniconda3/condabin/conda
 # packages in environment at /home/nfs/rzamora/miniconda3/envs/cudf_bugfixes:
 #
 # Name                    Version                   Build  Channel
 _libgcc_mutex             0.1                        main
 aiohttp                   3.6.0                    pypi_0    pypi
 alabaster                 0.7.12                     py_0    conda-forge
 appdirs                   1.4.3                      py_1    conda-forge
 arrow-cpp                 0.14.1           py37h6b969ab_1    conda-forge
 asn1crypto                0.24.0                py37_1003    conda-forge
 aspy.yaml                 1.3.0                      py_0    conda-forge
 async-timeout             3.0.1                    pypi_0    pypi
 atomicwrites              1.3.0                      py_0    conda-forge
 attrs                     19.1.0                     py_0    conda-forge
 aws-sam-translator        1.14.0                   py37_0    conda-forge
 aws-xray-sdk              0.95                       py_0    conda-forge
 babel                     2.7.0                      py_0    conda-forge
 backcall                  0.1.0                      py_0    conda-forge
 backports                 1.0                        py_2    conda-forge
 backports.tempfile        1.0                        py_0    conda-forge
 backports.weakref         1.0.post1             py37_1000    conda-forge
 black                     19.3b0                     py_0
 bleach                    3.1.0                      py_0    conda-forge
 bokeh                     1.3.4                    py37_0    conda-forge
 boost-cpp                 1.70.0               h8e57a91_2    conda-forge
 boto                      2.49.0                     py_0    conda-forge
 boto3                     1.9.220                    py_0    conda-forge
 botocore                  1.12.220                   py_0    conda-forge
 brotli                    1.0.7             he1b5a44_1000    conda-forge
 bzip2                     1.0.8                h516909a_0    conda-forge
 c-ares                    1.15.0            h516909a_1001    conda-forge
 ca-certificates           2019.5.15                     1
 cached-property           1.5.1                      py_0    conda-forge
 certifi                   2019.6.16                py37_1
 cffi                      1.12.3           py37h8022711_0    conda-forge
 cfgv                      2.0.1                      py_0    conda-forge
 cfn-lint                  0.23.5                   py37_0    conda-forge
 chardet                   3.0.4                 py37_1003    conda-forge
 click                     7.0                        py_0    conda-forge
 cloudpickle               1.2.1                      py_0    conda-forge
 cmake                     3.15.2               hf94ab9c_0    conda-forge
 commonmark                0.9.0                      py_0    conda-forge
 cookies                   2.2.1                      py_0    conda-forge
 cryptography              2.7              py37h72c5cf5_0    conda-forge
 cudatoolkit               9.2                           0
 cudf                      0.10.0a0+1424.g24f354d.dirty           dev_0    <develop>
 cudnn                     7.6.0                 cuda9.2_0
 cupy                      6.0.0            py37hc15394e_0
 curl                      7.65.3               hf8cf82a_0    conda-forge
 cython                    0.29.13          py37he1b5a44_0    conda-forge
 cytoolz                   0.10.0           py37h516909a_0    conda-forge
 dask                      1.2.1+254.g558e11c           dev_0    <develop>
 dask-core                 2.3.0                      py_0
 dask-cudf                 0.10.0a0+1424.g24f354d.dirty           dev_0    <develop>
 decorator                 4.4.0                      py_0    conda-forge
 defusedxml                0.5.0                      py_1    conda-forge
 distributed               2.3.2+14.g7a1a369          pypi_0    pypi
 dlpack                    0.2                  he1b5a44_0    conda-forge
 docker-py                 4.0.2                    py37_0    conda-forge
 docker-pycreds            0.4.0                      py_0    conda-forge
 docutils                  0.15.2                   py37_0    conda-forge
 double-conversion         3.1.5                he1b5a44_1    conda-forge
 ecdsa                     0.13                       py_0    conda-forge
 editdistance              0.5.3            py37hf484d3e_0    conda-forge
 entrypoints               0.3                   py37_1000    conda-forge
 expat                     2.2.5             he1b5a44_1003    conda-forge
 fastavro                  0.22.4           py37h516909a_0    conda-forge
 fastrlock                 0.4              py37he6710b0_0
 flake8                    3.7.7                    py37_0
 flask                     1.1.1                      py_1    conda-forge
 flatbuffers               1.11.0               he1b5a44_0    conda-forge
 freetype                  2.10.0               he983fc9_1    conda-forge
 fsspec                    0.4.4                      py_0    conda-forge
 future                    0.17.1                py37_1000    conda-forge
 gflags                    2.2.2             he1b5a44_1001    conda-forge
 glog                      0.4.0                he1b5a44_1    conda-forge
 gmp                       6.1.2             hf484d3e_1000    conda-forge
 grpc-cpp                  1.23.0               h18db393_0    conda-forge
 heapdict                  1.0.0                 py37_1000    conda-forge
 httpretty                 0.9.6                      py_0    conda-forge
 hypothesis                4.34.0                   py37_0    conda-forge
 icu                       64.2                 he1b5a44_1    conda-forge
 identify                  1.4.7                      py_0    conda-forge
 idna                      2.8                   py37_1000    conda-forge
 imagesize                 1.1.0                      py_0    conda-forge
 importlib_metadata        0.20                     py37_0    conda-forge
 ipykernel                 5.1.2            py37h5ca1d4c_0    conda-forge
 ipython                   7.8.0            py37h5ca1d4c_0    conda-forge
 ipython_genutils          0.2.0                      py_1    conda-forge
 isort                     4.3.21                   py37_0
 itsdangerous              1.1.0                      py_0    conda-forge
 jedi                      0.15.1                   py37_0    conda-forge
 jinja2                    2.10.1                     py_0    conda-forge
 jmespath                  0.9.4                      py_0    conda-forge
 jpeg                      9c                h14c3975_1001    conda-forge
 json5                     0.8.5                      py_0
 jsondiff                  1.1.2                      py_0    conda-forge
 jsonpatch                 1.24                       py_0    conda-forge
 jsonpickle                1.2                        py_0    conda-forge
 jsonpointer               2.0                        py_0    conda-forge
 jsonschema                3.0.2                    py37_0    conda-forge
 jupyter-server-proxy      1.1.0                    pypi_0    pypi
 jupyter_client            5.3.1                      py_0    conda-forge
 jupyter_core              4.4.0                      py_0    conda-forge
 jupyterlab                1.0.2            py37hf63ae98_0
 jupyterlab-nvdashboard    0.1.9                    pypi_0    pypi
 jupyterlab_server         1.0.0                      py_1
 krb5                      1.16.3            h05b26f9_1001    conda-forge
 libblas                   3.8.0               12_openblas    conda-forge
 libcblas                  3.8.0               12_openblas    conda-forge
 libcurl                   7.65.3               hda55be3_0    conda-forge
 libedit                   3.1.20170329      hf8c457e_1001    conda-forge
 libevent                  2.1.10               h72c5cf5_0    conda-forge
 libffi                    3.2.1             he1b5a44_1006    conda-forge
 libgcc-ng                 9.1.0                hdf63c60_0
 libgfortran-ng            7.3.0                hdf63c60_0
 liblapack                 3.8.0               12_openblas    conda-forge
 libnvstrings              0.9.0                 cuda9.2_0    rapidsai
 libopenblas               0.3.7                h6e990d7_1    conda-forge
 libpng                    1.6.37               hed695b0_0    conda-forge
 libprotobuf               3.8.0                h8b12597_0    conda-forge
 librmm                    0.9.0                 cuda9.2_0    rapidsai
 libsodium                 1.0.17               h516909a_0    conda-forge
 libssh2                   1.8.2                h22169c7_2    conda-forge
 libstdcxx-ng              9.1.0                hdf63c60_0
 libtiff                   4.0.10            h57b8799_1003    conda-forge
 libuv                     1.31.0               h516909a_0    conda-forge
 llvmlite                  0.29.0           py37hfd453ef_1    conda-forge
 locket                    0.2.0                      py_2    conda-forge
 lz4-c                     1.8.3             he1b5a44_1001    conda-forge
 markdown                  2.6.11                   pypi_0    pypi
 markupsafe                1.1.1            py37h14c3975_0    conda-forge
 mccabe                    0.6.1                      py_1    conda-forge
 mistune                   0.8.4           py37h14c3975_1000    conda-forge
 mock                      3.0.5                    py37_0    conda-forge
 more-itertools            7.2.0                      py_0    conda-forge
 moto                      1.3.8                      py_1    conda-forge
 msgpack-python            0.6.1            py37h6bb024c_0    conda-forge
 multidict                 4.5.2                    pypi_0    pypi
 nbconvert                 5.6.0                    py37_1    conda-forge
 nbformat                  4.4.0                      py_1    conda-forge
 nbsphinx                  0.4.2                      py_0    conda-forge
 nccl                      1.3.5                 cuda9.2_0
 ncurses                   6.1               hf484d3e_1002    conda-forge
 nodeenv                   1.3.3                      py_0    conda-forge
 nodejs                    10.13.0              he6710b0_0
 notebook                  6.0.1                    py37_0    conda-forge
 numba                     0.45.1           py37hb3f55d8_0    conda-forge
 numpy                     1.17.1           py37h95a1406_0    conda-forge
 numpydoc                  0.9.1                      py_0    conda-forge
 nvstrings                 0.9.0                    py37_0    rapidsai
 olefile                   0.46                       py_0    conda-forge
 openssl                   1.1.1d               h7b6447c_1
 packaging                 19.0                       py_0    conda-forge
 pandas                    0.24.2           py37hb3f55d8_0    conda-forge
 pandoc                    1.19.2                        0    conda-forge
 pandocfilters             1.4.2                      py_1    conda-forge
 parquet-cpp               1.5.1                         2    conda-forge
 parso                     0.5.1                      py_0    conda-forge
 partd                     1.0.0                      py_0    conda-forge
 pexpect                   4.7.0                    py37_0    conda-forge
 pickleshare               0.7.5                 py37_1000    conda-forge
 pillow                    6.1.0            py37h6b7be26_1    conda-forge
 pip                       19.2.3                   py37_0    conda-forge
 pluggy                    0.12.0                     py_0    conda-forge
 pre_commit                1.18.1                   py37_0    conda-forge
 prometheus_client         0.7.1                      py_0    conda-forge
 prompt_toolkit            2.0.9                      py_0    conda-forge
 psutil                    5.6.3            py37h516909a_0    conda-forge
 ptyprocess                0.6.0                   py_1001    conda-forge
 py                        1.8.0                      py_0    conda-forge
 pyarrow                   0.14.1           py37h8b68381_0    conda-forge
 pycodestyle               2.5.0                      py_0    conda-forge
 pycparser                 2.19                     py37_1    conda-forge
 pycryptodome              3.8.2            py37he80fd80_0    conda-forge
 pyflakes                  2.1.1                      py_0    conda-forge
 pygments                  2.4.2                      py_0    conda-forge
 pynvml                    8.0.3                    pypi_0    pypi
 pyopenssl                 19.0.0                   py37_0    conda-forge
 pyparsing                 2.4.2                      py_0    conda-forge
 pyrsistent                0.15.4           py37h516909a_0    conda-forge
 pysocks                   1.7.0                    py37_0    conda-forge
 pytest                    5.1.2                    py37_0    conda-forge
 python                    3.7.3                h33d41f4_1    conda-forge
 python-dateutil           2.8.0                      py_0    conda-forge
 python-jose               2.0.2                      py_0    conda-forge
 pytz                      2019.2                     py_0    conda-forge
 pyyaml                    5.1.2            py37h516909a_0    conda-forge
 pyzmq                     18.0.2           py37h1768529_2    conda-forge
 rapidjson                 1.1.0             he1b5a44_1002    conda-forge
 re2                       2019.09.01           he1b5a44_0    conda-forge
 readline                  8.0                  hf8c457e_0    conda-forge
 recommonmark              0.6.0                      py_0    conda-forge
 requests                  2.22.0                   py37_1    conda-forge
 responses                 0.9.0                      py_0    conda-forge
 rhash                     1.3.6             h14c3975_1001    conda-forge
 rmm                       0.9.0                    py37_0    rapidsai
 s3fs                      0.3.4                      py_0    conda-forge
 s3transfer                0.2.1                    py37_0    conda-forge
 send2trash                1.5.0                      py_0    conda-forge
 setuptools                41.2.0                   py37_0    conda-forge
 simpervisor               0.3                      pypi_0    pypi
 six                       1.12.0                py37_1000    conda-forge
 snappy                    1.1.7             he1b5a44_1002    conda-forge
 snowballstemmer           1.9.0                      py_0    conda-forge
 sortedcontainers          2.1.0                      py_0    conda-forge
 sphinx                    2.2.0                      py_0    conda-forge
 sphinx-markdown-tables    0.0.9                    pypi_0    pypi
 sphinx_rtd_theme          0.4.3                      py_0    conda-forge
 sphinxcontrib-applehelp   1.0.1                      py_0    conda-forge
 sphinxcontrib-devhelp     1.0.1                      py_0    conda-forge
 sphinxcontrib-htmlhelp    1.0.2                      py_0    conda-forge
 sphinxcontrib-jsmath      1.0.1                      py_0    conda-forge
 sphinxcontrib-qthelp      1.0.2                      py_0    conda-forge
 sphinxcontrib-serializinghtml 1.1.1                      py_0    conda-forge
 sphinxcontrib-websupport  1.1.2                      py_0    conda-forge
 sqlite                    3.29.0               hcee41ef_1    conda-forge
 tblib                     1.4.0                      py_0    conda-forge
 terminado                 0.8.2                    py37_0    conda-forge
 testpath                  0.4.2                   py_1001    conda-forge
 thrift-cpp                0.12.0            hf3afdfd_1004    conda-forge
 tk                        8.6.9             hed695b0_1002    conda-forge
 toml                      0.10.0                     py_0    conda-forge
 toolz                     0.10.0                     py_0    conda-forge
 tornado                   6.0.3            py37h516909a_0    conda-forge
 traitlets                 4.3.2                 py37_1000    conda-forge
 uriparser                 0.9.3                he1b5a44_1    conda-forge
 urllib3                   1.25.3                   py37_0    conda-forge
 virtualenv                16.7.5                     py_0    conda-forge
 wcwidth                   0.1.7                      py_1    conda-forge
 webencodings              0.5.1                      py_1    conda-forge
 websocket-client          0.56.0                   py37_0    conda-forge
 werkzeug                  0.15.5                     py_0    conda-forge
 wheel                     0.33.6                   py37_0    conda-forge
 wrapt                     1.11.2           py37h516909a_0    conda-forge
 xmltodict                 0.12.0                     py_0    conda-forge
 xz                        5.2.4             h14c3975_1001    conda-forge
 yaml                      0.1.7             h14c3975_1001    conda-forge
 yarl                      1.3.0                    pypi_0    pypi
 zeromq                    4.3.2                he1b5a44_2    conda-forge
 zict                      1.0.0                      py_0    conda-forge
 zipp                      0.6.0                      py_0    conda-forge
 zlib                      1.2.11            h516909a_1005    conda-forge
 zstd                      1.4.0                h3b9ef0a_0    conda-forge

Additional context
Although the use of cupy is trivial/unnecessary in the code snippet above, we want to use cupy in practice to avoid device-host transfers.

cc @pentschev @brandon-b-miller @quasiben

bug dask

All 4 comments

Well, I have good news for this thread. I went to do some more debugging, and I've found this with gdb:

(gdb) t 20
[Switching to thread 20 (Thread 0x7f1070ff9700 (LWP 8248))]
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
225     ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S: No such file or directory.
(gdb) bt
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
#1  0x000055e9a0ea82d8 in PyCOND_TIMEDWAIT (cond=0x55e9a10daa38 <_PyRuntime+1208>,
    mut=0x55e9a10daa68 <_PyRuntime+1256>, us=5000)
    at /home/conda/feedstock_root/build_artifacts/python_1562015400360/work/Python/condvar.h:90
#2  take_gil (tstate=0x55e9bbc4fb00)
    at /home/conda/feedstock_root/build_artifacts/python_1562015400360/work/Python/ceval_gil.h:208
#3  PyEval_RestoreThread () at /home/conda/feedstock_root/build_artifacts/python_1562015400360/work/Python/ceval.c:271
#4  0x000055e9a0f77970 in PyGILState_Ensure ()
    at /home/conda/feedstock_root/build_artifacts/python_1562015400360/work/Python/pystate.c:1067
#5  0x00007f113ccae9d7 in _CallPythonObject (pArgs=0x7f1070ff4e10, flags=4353, converters=0x7f11359c4400,
    callable=0x7f10cd3d09d8, setfunc=0x7f113cca99b0 <L_set>, restype=0x7f113ccf95a0, mem=0x7f1070ff4fa0)
    at /usr/local/src/conda/python-3.7.3/Modules/_ctypes/callbacks.c:140
#6  closure_fcn () at /usr/local/src/conda/python-3.7.3/Modules/_ctypes/callbacks.c:292
#7  0x00007f113cc983d0 in ffi_closure_unix64_inner ()
   from /home/pentschev/miniconda3/envs/rn-0.10/lib/python3.7/lib-dynload/../../libffi.so.6
#8  0x00007f113cc98798 in ffi_closure_unix64 ()
   from /home/pentschev/miniconda3/envs/rn-0.10/lib/python3.7/lib-dynload/../../libffi.so.6
#9  0x00007f1134a2c16b in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so
#10 0x00007f113493d197 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so
#11 0x00007f113493d1c0 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so
#12 0x00007f1134a8ca36 in cuOccupancyMaxPotentialBlockSize () from /usr/lib/x86_64-linux-gnu/libcuda.so
#13 0x00007f113cc98630 in ffi_call_unix64 ()
   from /home/pentschev/miniconda3/envs/rn-0.10/lib/python3.7/lib-dynload/../../libffi.so.6
#14 0x00007f113cc97fed in ffi_call ()
   from /home/pentschev/miniconda3/envs/rn-0.10/lib/python3.7/lib-dynload/../../libffi.so.6
#15 0x00007f113ccaefce in _call_function_pointer (argcount=6, resmem=0x7f1070ff5510, restype=<optimized out>,
    atypes=0x7f1070ff5490, avalues=0x7f1070ff54d0, pProc=0x7f1134a8c9b0 <cuOccupancyMaxPotentialBlockSize>, flags=4353)
    at /usr/local/src/conda/python-3.7.3/Modules/_ctypes/callproc.c:827
#16 _ctypes_callproc () at /usr/local/src/conda/python-3.7.3/Modules/_ctypes/callproc.c:1184
#17 0x00007f113ccafa04 in PyCFuncPtr_call () at /usr/local/src/conda/python-3.7.3/Modules/_ctypes/_ctypes.c:3969
...

Again cuOccupancyMaxPotentialBlockSize, which was the exact same issue from https://github.com/rapidsai/ucx-py/issues/187.

TL;DR: https://github.com/numba/numba/pull/4581 fixes this too.

@pentschev thanks so much for tracking this down.

Thanks @pentschev! You are my hero :)

I'd just like to quote @pentschev's key comment from the discussion in ucx-py#187:

To describe briefly, the problem is the numba.forall call, which internally calls cuOccupancyMaxPotentialBlockSize. This last function requires two function pointers, one being the CUDA kernel itself, and the other being a function to calculate how much shared memory the call requires. The problem lies in the latter, which is defined in https://github.com/numba/numba/blob/master/numba/cuda/compiler.py#L288. Since that is a Python lambda function, when cuOccupancyMaxPotentialBlockSize calls that function back, it tries to acquire the GIL, which causes a deadlock (as both the thread executing cuOccupancyMaxPotentialBlockSize and the thread executing cudaMemcpyAsync lock the same CUDA mutex). The GIL can then never be acquired, since neither thread can ever complete.
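The lock cycle described in the quote can be imitated without a GPU. The sketch below is only a model built on one reading of the description: two ordinary threading.Lock objects stand in for the GIL and for the CUDA mutex, and all function names are made up. One thread holds the "CUDA mutex" and wants the "GIL" (the callback), while the other holds the "GIL" and wants the "CUDA mutex" (the memcpy). Timeouts are used so that the script reports the cycle instead of hanging.

```python
import threading


def simulate_lock_cycle(wait_seconds=0.2):
    gil = threading.Lock()         # stand-in for the Python GIL
    cuda_mutex = threading.Lock()  # stand-in for the mutex inside libcuda
    both_holding = threading.Barrier(2)  # each thread now holds its first lock
    both_tried = threading.Barrier(2)    # each thread has tried its second lock
    outcome = {}

    def occupancy_call():
        # Models cuOccupancyMaxPotentialBlockSize: it holds the driver mutex,
        # then calls back into Python, which needs the GIL.
        with cuda_mutex:
            both_holding.wait()
            got = gil.acquire(timeout=wait_seconds)
            outcome["callback_got_gil"] = got
            both_tried.wait()
            if got:
                gil.release()

    def memcpy_call():
        # Models a thread that still holds the GIL while it enters a CUDA
        # call that needs the same driver mutex.
        with gil:
            both_holding.wait()
            got = cuda_mutex.acquire(timeout=wait_seconds)
            outcome["memcpy_got_cuda_mutex"] = got
            both_tried.wait()
            if got:
                cuda_mutex.release()

    threads = [
        threading.Thread(target=occupancy_call),
        threading.Thread(target=memcpy_call),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcome


outcome = simulate_lock_cycle()
print(outcome)
```

Both values come back False: neither thread can get its second lock while the other is still holding it. Without the timeouts, that is the hang.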

What we need to ensure is that CUDA calls (e.g., function callbacks passed to libcuda) never try to acquire the GIL. To fix that in the present case, we can simply pass a C function pointer instead of passing a Python function to it.
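To make the "Python function versus C function pointer" distinction a bit more concrete, here is a small ctypes sketch. It does not touch libcuda and is not taken from the numba fix. The callback signature (a size_t result from an int block size) is an assumption about the occupancy callback type, and B2D_SIZE and smem_for_block are made-up names. It only shows that a ctypes callback wrapping a Python function executes Python code when invoked through its function pointer, which is why the GIL is needed at that point. A pointer to compiled C code would not have that requirement.

```python
import ctypes
import threading

# Assumed shape of the occupancy callback: size_t f(int block_size).
B2D_SIZE = ctypes.CFUNCTYPE(ctypes.c_size_t, ctypes.c_int)

seen = []  # filled in by the Python callback each time it is invoked


def smem_for_block(block_size):
    # This body is ordinary Python, so ctypes has to take the GIL to run it.
    seen.append((block_size, threading.get_ident()))
    return 0  # pretend no dynamic shared memory is needed


# Wrap the Python function in a C-callable function pointer.
py_callback = B2D_SIZE(smem_for_block)

# Calling through the pointer goes out to C and back into Python.
result = py_callback(128)
ran_in_calling_thread = seen[0][1] == threading.get_ident()
print(result, seen[0][0], ran_in_calling_thread)
```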

I will close this issue since the discussion already has a "home", and there is now a numba PR/fix.
