Cudf: [BUG] dask-cudf groupby with std results in TypeError

Created on 20 Nov 2019 · 7 Comments · Source: rapidsai/cudf

Describe the bug

Attempting a std() aggregation on a dask-cudf groupby results in the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/conda/envs/rapids/lib/python3.6/site-packages/dask/dataframe/utils.py in raise_on_meta_error(funcname, udf)
    168     try:
--> 169         yield
    170     except Exception as e:
/conda/envs/rapids/lib/python3.6/site-packages/dask/dataframe/core.py in _emulate(func, *args, **kwargs)
   4740     with raise_on_meta_error(funcname(func), udf=kwargs.pop("udf", False)):
-> 4741         return func(*_extract_meta(args, True), **_extract_meta(kwargs, True))
   4742 
/conda/envs/rapids/lib/python3.6/site-packages/dask/dataframe/groupby.py in _var_chunk(df, *index)
    307 
--> 308     df[cols] = df[cols] ** 2
    309     g2 = _groupby_raise_unaligned(df, by=index)
/conda/envs/rapids/lib/python3.6/site-packages/cudf/core/dataframe.py in __setitem__(self, name, col)
    439 
--> 440         elif name in self._cols:
    441             self._cols[name] = self._prepare_series_for_add(col)
/conda/envs/rapids/lib/python3.6/site-packages/pandas/core/indexes/base.py in __hash__(self)
   3934     def __hash__(self):
-> 3935         raise TypeError("unhashable type: %r" % type(self).__name__)
   3936 
TypeError: unhashable type: 'Index'
During handling of the above exception, another exception occurred:
ValueError                                Traceback (most recent call last)
<ipython-input-60-3951b9cba7f8> in <module>
----> 1 ddf.groupby(['a']).b.std()
/conda/envs/rapids/lib/python3.6/site-packages/dask/dataframe/groupby.py in std(self, ddof, split_every, split_out)
   1358     @derived_from(pd.core.groupby.GroupBy)
   1359     def std(self, ddof=1, split_every=None, split_out=1):
-> 1360         v = self.var(ddof, split_every=split_every, split_out=split_out)
   1361         result = map_partitions(np.sqrt, v, meta=v)
   1362         return result
/conda/envs/rapids/lib/python3.6/site-packages/dask/dataframe/groupby.py in var(self, ddof, split_every, split_out)
   1346             split_every=split_every,
   1347             split_out=split_out,
-> 1348             split_out_setup=split_out_on_index,
   1349         )
   1350 
/conda/envs/rapids/lib/python3.6/site-packages/dask/dataframe/core.py in apply_concat_apply(args, chunk, aggregate, combine, meta, token, chunk_kwargs, aggregate_kwargs, combine_kwargs, split_every, split_out, split_out_setup, split_out_setup_kwargs, **kwargs)
   4694 
   4695     if meta is no_default:
-> 4696         meta_chunk = _emulate(chunk, *args, udf=True, **chunk_kwargs)
   4697         meta = _emulate(aggregate, _concat([meta_chunk]), udf=True, **aggregate_kwargs)
   4698     meta = make_meta(
/conda/envs/rapids/lib/python3.6/site-packages/dask/dataframe/core.py in _emulate(func, *args, **kwargs)
   4739     """
   4740     with raise_on_meta_error(funcname(func), udf=kwargs.pop("udf", False)):
-> 4741         return func(*_extract_meta(args, True), **_extract_meta(kwargs, True))
   4742 
   4743 
/conda/envs/rapids/lib/python3.6/contextlib.py in __exit__(self, type, value, traceback)
     97                 value = type()
     98             try:
---> 99                 self.gen.throw(type, value, traceback)
    100             except StopIteration as exc:
    101                 # Suppress StopIteration *unless* it's the same exception that
/conda/envs/rapids/lib/python3.6/site-packages/dask/dataframe/utils.py in raise_on_meta_error(funcname, udf)
    188         )
    189         msg = msg.format(" in `{0}`".format(funcname) if funcname else "", repr(e), tb)
--> 190         raise ValueError(msg)
    191 
    192 
ValueError: Metadata inference failed in `_var_chunk`.
You have supplied a custom function and Dask is unable to 
determine the type of output that that function returns. 
To resolve this please provide a meta= keyword.
The docstring of the Dask function you ran should have more information.
Original error is below:
------------------------
TypeError("unhashable type: 'Index'",)
Traceback:
---------
  File "/conda/envs/rapids/lib/python3.6/site-packages/dask/dataframe/utils.py", line 169, in raise_on_meta_error
    yield
  File "/conda/envs/rapids/lib/python3.6/site-packages/dask/dataframe/core.py", line 4741, in _emulate
    return func(*_extract_meta(args, True), **_extract_meta(kwargs, True))
  File "/conda/envs/rapids/lib/python3.6/site-packages/dask/dataframe/groupby.py", line 308, in _var_chunk
    df[cols] = df[cols] ** 2
  File "/conda/envs/rapids/lib/python3.6/site-packages/cudf/core/dataframe.py", line 440, in __setitem__
    elif name in self._cols:
  File "/conda/envs/rapids/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 3935, in __hash__
    raise TypeError("unhashable type: %r" % type(self).__name__)
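
The traceback fails inside dask's `_var_chunk`, at the line that squares the selected columns. As a rough illustration of why a partitioned variance needs that step, here is a minimal pure-Python sketch that assumes the common count / sum / sum-of-squares formulation. The helper names `chunk_stats` and `combine_std` are hypothetical and this is not dask's code; it also assumes the two rows of group a == 1.0 land in different partitions.

```python
import math


def chunk_stats(values):
    """Per-partition partial results: count, sum, and sum of squares."""
    return len(values), sum(values), sum(v * v for v in values)


def combine_std(chunks, ddof=1):
    """Merge partial results; return the standard deviation or None if undefined."""
    n = sum(c[0] for c in chunks)
    s = sum(c[1] for c in chunks)
    s2 = sum(c[2] for c in chunks)
    if n - ddof <= 0:
        return None  # a single-row group has no sample variance
    var = (s2 - s * s / n) / (n - ddof)
    return math.sqrt(max(var, 0.0))


# Group a == 1.0 from the reproducer: b == 4.0 in one partition, b == -10.0 in the other.
parts = [chunk_stats([4.0]), chunk_stats([-10.0])]
std_a1 = combine_std(parts)
print(std_a1)
```

The squared columns are what provide the sum-of-squares term, which is why the assignment `df[cols] = df[cols] ** 2` has to succeed on every partition.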

Steps/Code to reproduce bug

import cudf
import dask_cudf
df = cudf.DataFrame({'a': [1.,3,4,1.],'b': [4.,5,6,-10], 'c': [6., 7., 5., 10.]})
ddf = dask_cudf.from_cudf(df, npartitions=2)
ddf.groupby(['a']).b.std()

Expected behavior
The call should return the standard deviation of Series b for each group of a.
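
For reference, the expected values can be worked out on the CPU with the standard library alone. This is only a sketch of the arithmetic (sample standard deviation, ddof=1), not of how cudf or dask compute it; the variable names are illustrative.

```python
from collections import defaultdict
from statistics import stdev

a = [1.0, 3.0, 4.0, 1.0]
b = [4.0, 5.0, 6.0, -10.0]

# Collect the b values belonging to each distinct value of a.
groups = defaultdict(list)
for key, value in zip(a, b):
    groups[key].append(value)

# Sample standard deviation is undefined for single-row groups, so use None there.
expected = {k: (stdev(v) if len(v) > 1 else None) for k, v in groups.items()}
print(expected)
```

Only the group a == 1.0 has two rows, so it is the only group with a defined result.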

Environment overview

  • Environment location: Docker
  • Method of cuDF install: From Source (0.11)

Environment details


     ***git***
     commit 2eab329e0474a2fa1156ebe1193b806516e8a0f7 (HEAD -> branch-0.11, origin/branch-0.11, origin/HEAD)
     Merge: 582be6d a9d8d55
     Author: Mark Harris
     Date:   Tue Nov 19 08:30:04 2019 +1100

     Merge pull request #3232 from davidwendt/port-nvs-datetime-ops

     [REVIEW] Port NVStrings datetime conversion ops to cudf strings column
     ***git submodules***
     b165e1fb11eeea64ccf95053e40f2424312599cc thirdparty/cub (v1.7.1)
     63f644be44201467e3938d59ed9d89cc8725c35d thirdparty/jitify (remotes/origin/feature/api_v2_v0.10)
     +39125e0e476b960c2001f1ec76a3441335ff91b2 thirdparty/libcudacxx (0.8.1-94-g39125e0)
     08bc464bd8f4d779e4294305aa7dadebcebcc507 thirdparty/libcudacxx/libcxx (heads/rapidsai-interop)

     ***OS Information***
     DISTRIB_ID=Ubuntu
     DISTRIB_RELEASE=16.04
     DISTRIB_CODENAME=xenial
     DISTRIB_DESCRIPTION="Ubuntu 16.04.6 LTS"
     NAME="Ubuntu"
     VERSION="16.04.6 LTS (Xenial Xerus)"
     ID=ubuntu
     ID_LIKE=debian
     PRETTY_NAME="Ubuntu 16.04.6 LTS"
     VERSION_ID="16.04"
     HOME_URL="http://www.ubuntu.com/"
     SUPPORT_URL="http://help.ubuntu.com/"
     BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
     VERSION_CODENAME=xenial
     UBUNTU_CODENAME=xenial
     Linux 66dcd7e53678 4.4.0-154-generic #181-Ubuntu SMP Tue Jun 25 05:29:03 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

     ***GPU Information***
     Wed Nov 20 20:38:32 2019
     +-----------------------------------------------------------------------------+
     | NVIDIA-SMI 410.104      Driver Version: 410.104      CUDA Version: 10.0     |
     |-------------------------------+----------------------+----------------------+
     | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
     | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
     |===============================+======================+======================|
     |   0  Tesla T4            Off  | 00000000:3B:00.0 Off |                    0 |
     | N/A   52C    P0    29W /  70W |   1293MiB / 15079MiB |      0%      Default |
     +-------------------------------+----------------------+----------------------+
     |   1  Tesla T4            Off  | 00000000:5E:00.0 Off |                    0 |
     | N/A   59C    P0    28W /  70W |    279MiB / 15079MiB |      0%      Default |
     +-------------------------------+----------------------+----------------------+
     |   2  Tesla T4            Off  | 00000000:AF:00.0 Off |                    0 |
     | N/A   50C    P0    29W /  70W |    279MiB / 15079MiB |      0%      Default |
     +-------------------------------+----------------------+----------------------+
     |   3  Tesla T4            Off  | 00000000:D8:00.0 Off |                    0 |
     | N/A   49C    P0    29W /  70W |    279MiB / 15079MiB |      0%      Default |
     +-------------------------------+----------------------+----------------------+

     +-----------------------------------------------------------------------------+
     | Processes:                                                       GPU Memory |
     |  GPU       PID   Type   Process name                             Usage      |
     |=============================================================================|
     +-----------------------------------------------------------------------------+

     ***CPU***
     Architecture:          x86_64
     CPU op-mode(s):        32-bit, 64-bit
     Byte Order:            Little Endian
     CPU(s):                64
     On-line CPU(s) list:   0-63
     Thread(s) per core:    2
     Core(s) per socket:    16
     Socket(s):             2
     NUMA node(s):          2
     Vendor ID:             GenuineIntel
     CPU family:            6
     Model:                 85
     Model name:            Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
     Stepping:              4
     CPU MHz:               2095.092
     BogoMIPS:              4191.35
     Virtualization:        VT-x
     L1d cache:             32K
     L1i cache:             32K
     L2 cache:              1024K
     L3 cache:              22528K
     NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62
     NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55,57,59,61,63
     Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb invpcid_single intel_pt ssbd ibrs ibpb stibp kaiser tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx avx512f rdseed adx smap clflushopt clwb avx512cd xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku md_clear flush_l1d

     ***CMake***
     /conda/envs/rapids/bin/cmake
     cmake version 3.15.5

     CMake suite maintained and supported by Kitware (kitware.com/cmake).

     ***g++***
     /usr/bin/g++
     g++ (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
     Copyright (C) 2015 Free Software Foundation, Inc.
     This is free software; see the source for copying conditions.  There is NO
     warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


     ***nvcc***
     /usr/local/cuda/bin/nvcc
     nvcc: NVIDIA (R) Cuda compiler driver
     Copyright (c) 2005-2018 NVIDIA Corporation
     Built on Sat_Aug_25_21:08:01_CDT_2018
     Cuda compilation tools, release 10.0, V10.0.130

     ***Python***
     /conda/envs/rapids/bin/python
     Python 3.6.7

     ***Environment Variables***
     PATH                            : /conda/envs/rapids/bin:/conda/condabin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/conda/bin:/conda/bin
     LD_LIBRARY_PATH                 : /usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64
     NUMBAPRO_NVVM                   : /usr/local/cuda/nvvm/lib64/libnvvm.so
     NUMBAPRO_LIBDEVICE              : /usr/local/cuda/nvvm/libdevice
     CONDA_PREFIX                    : /conda/envs/rapids
     PYTHON_PATH                     :

     ***conda packages***
     /conda/condabin/conda
     # packages in environment at /conda/envs/rapids:
     #
     # Name                    Version                   Build  Channel
     _libgcc_mutex             0.1                        main
     alabaster                 0.7.12                     py_0    conda-forge
     appdirs                   1.4.3                      py_1    conda-forge
     arrow-cpp                 0.15.0           py36h090bef1_2    conda-forge
     aspy.yaml                 1.3.0                      py_0    conda-forge
     atomicwrites              1.3.0                      py_0    conda-forge
     attrs                     19.3.0                     py_0    conda-forge
     babel                     2.7.0                      py_0    conda-forge
     backcall                  0.1.0                      py_0    conda-forge
     black                     19.10b0                  py36_0    conda-forge
     blas                      2.14                        mkl    conda-forge
     bleach                    3.1.0                      py_0    conda-forge
     bokeh                     1.2.0                    py36_0    conda-forge
     boost-cpp                 1.70.0               h8e57a91_2    conda-forge
     brotli                    1.0.7             he1b5a44_1000    conda-forge
     bzip2                     1.0.8                h516909a_1    conda-forge
     c-ares                    1.15.0            h516909a_1001    conda-forge
     ca-certificates           2019.9.11            hecc5488_0    conda-forge
     cached-property           1.5.1                      py_0    conda-forge
     certifi                   2019.9.11                py36_0    conda-forge
     cffi                      1.13.2           py36h8022711_0    conda-forge
     cfgv                      2.0.1                      py_0    conda-forge
     chardet                   3.0.4                 py36_1003    conda-forge
     click                     7.0                      pypi_0    pypi
     cloudpickle               1.2.2                      py_1    conda-forge
     cmake                     3.15.5               hf94ab9c_0    conda-forge
     cmake_setuptools          0.1.3                      py_0    rapidsai-nightly/label/cuda10.0
     commonmark                0.9.1                      py_0    conda-forge
     cryptography              2.8              py36h72c5cf5_0    conda-forge
     cuda100                   1.0                           0    pytorch
     cudatoolkit               10.0.130                      0    nvidia
     cudf                      0.9.0a0+1525.gd3f19ca          pypi_0    pypi
     cudnn                     7.6.0                cuda10.0_0    nvidia
     cugraph                   0.9.0a1+40.ge038d75          pypi_0    pypi
     cuml                      0.9.0a1+219.gc31e78e.dirty          pypi_0    pypi
     cupy                      6.3.0            py36h88562e5_0    rapidsai
     curl                      7.65.3               hf8cf82a_0    conda-forge
     cycler                    0.10.0                     py_2    conda-forge
     cython                    0.29.14          py36he1b5a44_0    conda-forge
     cytoolz                   0.10.1           py36h516909a_0    conda-forge
     dask                      2.8.0+1.g7967652          pypi_0    pypi
     dask-cudf                 0.11.0a0+2889.g582be6d.dirty          pypi_0    pypi
     dask-cuml                 0.9.0a0                  pypi_0    pypi
     dask-xgboost              0.1.5                    pypi_0    pypi
     dataclasses               0.7                      py36_0    conda-forge
     dbus                      1.13.6               he372182_0    conda-forge
     decorator                 4.4.1                      py_0    conda-forge
     defusedxml                0.6.0                      py_0    conda-forge
     distributed               2.8.0+4.g029ed17          pypi_0    pypi
     dlpack                    0.2                  he1b5a44_1    conda-forge
     docutils                  0.15.2                   py36_0    conda-forge
     double-conversion         3.1.5                he1b5a44_2    conda-forge
     editdistance              0.5.3            py36he1b5a44_0    conda-forge
     entrypoints               0.3                   py36_1000    conda-forge
     expat                     2.2.5             he1b5a44_1004    conda-forge
     faiss-gpu                 1.5.0           py36_cuda10.0_1  [cuda100]  pytorch
     fastavro                  0.22.7           py36h516909a_0    conda-forge
     fastrlock                 0.4             py36he1b5a44_1000    conda-forge
     flake8                    3.7.9                    py36_0    conda-forge
     flatbuffers               1.11.0               he1b5a44_0    conda-forge
     fontconfig                2.13.1            h86ecdb6_1001    conda-forge
     freetype                  2.10.0               he983fc9_1    conda-forge
     fsspec                    0.6.0                      py_0    conda-forge
     future                    0.18.2                   py36_0    conda-forge
     gettext                   0.19.8.1          hc5be6a0_1002    conda-forge
     gflags                    2.2.2             he1b5a44_1002    conda-forge
     glib                      2.58.3          py36h6f030ca_1002    conda-forge
     glog                      0.4.0                he1b5a44_1    conda-forge
     gmp                       6.1.2             hf484d3e_1000    conda-forge
     grpc-cpp                  1.23.0               h18db393_0    conda-forge
     gst-plugins-base          1.14.5               h0935bb2_0    conda-forge
     gstreamer                 1.14.5               h36ae1b5_0    conda-forge
     heapdict                  1.0.1                      py_0    conda-forge
     hypothesis                4.44.2                   py36_0    conda-forge
     icu                       64.2                 he1b5a44_1    conda-forge
     identify                  1.4.7                      py_0    conda-forge
     idna                      2.8                   py36_1000    conda-forge
     imagesize                 1.1.0                      py_0    conda-forge
     importlib_metadata        0.23                     py36_0    conda-forge
     importlib_resources       1.0.2                 py36_1000    conda-forge
     intel-openmp              2019.4                      243
     ipykernel                 5.1.3            py36h5ca1d4c_0    conda-forge
     ipython                   7.9.0            py36h5ca1d4c_0    conda-forge
     ipython_genutils          0.2.0                      py_1    conda-forge
     ipywidgets                7.5.1                    pypi_0    pypi
     isort                     4.3.21                   py36_0    conda-forge
     jedi                      0.15.1                   py36_0    conda-forge
     jinja2                    2.10.3                     py_0    conda-forge
     joblib                    0.14.0                     py_0    conda-forge
     jpeg                      9c                h14c3975_1001    conda-forge
     json5                     0.8.5                      py_0    conda-forge
     jsonschema                3.2.0                    py36_0    conda-forge
     jupyter                   1.0.0                    pypi_0    pypi
     jupyter-console           6.0.0                    pypi_0    pypi
     jupyter_client            5.3.3                    py36_1    conda-forge
     jupyter_core              4.6.1                    py36_0    conda-forge
     jupyterlab                1.0.2                    py36_0    conda-forge
     jupyterlab_server         1.0.6                      py_0    conda-forge
     kiwisolver                1.1.0            py36hc9558a2_0    conda-forge
     krb5                      1.16.3            h05b26f9_1001    conda-forge
     libblas                   3.8.0                    14_mkl    conda-forge
     libcblas                  3.8.0                    14_mkl    conda-forge
     libclang                  8.0.0                h6bb024c_0    rapidsai/label/cuda10.0
     libcumlmg                 0.0.0.dev0         cuda10.0_373    nvidia/label/cuda10.0
     libcurl                   7.65.3               hda55be3_0    conda-forge
     libedit                   3.1.20170329      hf8c457e_1001    conda-forge
     libevent                  2.1.10               h72c5cf5_0    conda-forge
     libffi                    3.2.1             he1b5a44_1006    conda-forge
     libgcc-ng                 9.1.0                hdf63c60_0
     libgfortran-ng            7.3.0                hdf63c60_2    conda-forge
     libiconv                  1.15              h516909a_1005    conda-forge
     liblapack                 3.8.0                    14_mkl    conda-forge
     liblapacke                3.8.0                    14_mkl    conda-forge
     libllvm8                  8.0.1                hc9558a2_0    conda-forge
     libopenblas               0.3.6                h6e990d7_4    conda-forge
     libpng                    1.6.37               hed695b0_0    conda-forge
     libprotobuf               3.8.0                h8b12597_0    conda-forge
     librmm                    0.11.0b191118       cuda10.0_40    rapidsai-nightly
     librmm-cffi               0.9.0                    pypi_0    pypi
     libsodium                 1.0.17               h516909a_0    conda-forge
     libssh2                   1.8.2                h22169c7_2    conda-forge
     libstdcxx-ng              9.1.0                hdf63c60_0
     libtiff                   4.1.0                hfc65ed5_0    conda-forge
     libuuid                   2.32.1            h14c3975_1000    conda-forge
     libuv                     1.33.1               h516909a_0    conda-forge
     libxcb                    1.13              h14c3975_1002    conda-forge
     libxml2                   2.9.10               hee79883_0    conda-forge
     llvmlite                  0.30.0           py36h8b12597_1    conda-forge
     locket                    0.2.0                      py_2    conda-forge
     lz4-c                     1.8.3             he1b5a44_1001    conda-forge
     markdown                  3.0.1                    pypi_0    pypi
     markupsafe                1.1.1            py36h516909a_0    conda-forge
     matplotlib                3.1.2                    py36_1    conda-forge
     matplotlib-base           3.1.2            py36h250f245_1    conda-forge
     mccabe                    0.6.1                      py_1    conda-forge
     mistune                   0.8.4           py36h516909a_1000    conda-forge
     mkl                       2019.4                      243
     more-itertools            7.2.0                      py_0    conda-forge
     msgpack-python            0.6.2            py36hc9558a2_0    conda-forge
     mypy_extensions           0.4.3                    py36_0    conda-forge
     nbconvert                 5.6.1                    py36_0    conda-forge
     nbformat                  4.4.0                      py_1    conda-forge
     nbsphinx                  0.4.3                      py_0    conda-forge
     nccl                      2.4.6.1              cuda10.0_0    nvidia
     ncurses                   6.1               hf484d3e_1002    conda-forge
     networkx                  2.3                        py_0    conda-forge
     nodeenv                   1.3.3                      py_0    conda-forge
     notebook                  6.0.1                    py36_0    conda-forge
     numba                     0.46.0           py36hb3f55d8_1    conda-forge
     numpy                     1.16.2           py36h8b7e671_1    conda-forge
     numpydoc                  0.9.1                      py_0    conda-forge
     nvstrings-cuda100         0.0.0.dev0               pypi_0    pypi
     olefile                   0.46                       py_0    conda-forge
     openblas                  0.3.6                h6e990d7_4    conda-forge
     openssl                   1.1.1d               h516909a_0    conda-forge
     packaging                 19.2                       py_0    conda-forge
     pandas                    0.24.2           py36hb3f55d8_0    conda-forge
     pandoc                    1.19.2                        0    conda-forge
     pandocfilters             1.4.2                      py_1    conda-forge
     parquet-cpp               1.5.1                         2    conda-forge
     parso                     0.5.1                      py_0    conda-forge
     partd                     1.0.0                      py_0    conda-forge
     pathspec                  0.6.0                      py_0    conda-forge
     patsy                     0.5.1                      py_0    conda-forge
     pcre                      8.43                 he1b5a44_0    conda-forge
     pexpect                   4.7.0                    py36_0    conda-forge
     pickleshare               0.7.5                 py36_1000    conda-forge
     pillow                    6.2.1            py36h6b7be26_0    conda-forge
     pip                       19.3.1                   py36_0    conda-forge
     pluggy                    0.12.0                     py_0    conda-forge
     pre_commit                1.18.1                   py36_0    conda-forge
     prometheus_client         0.7.1                      py_0    conda-forge
     prompt_toolkit            2.0.10                     py_0    conda-forge
     psutil                    5.6.5            py36h516909a_0    conda-forge
     pthread-stubs             0.4               h14c3975_1001    conda-forge
     ptyprocess                0.6.0                   py_1001    conda-forge
     py                        1.8.0                      py_0    conda-forge
     pyarrow                   0.15.0           py36h8b68381_1    conda-forge
     pycodestyle               2.5.0                      py_0    conda-forge
     pycparser                 2.19                     py36_1    conda-forge
     pyflakes                  2.1.1                      py_0    conda-forge
     pygments                  2.4.2                      py_0    conda-forge
     pynvml                    8.0.3                    pypi_0    pypi
     pyopenssl                 19.0.0                   py36_0    conda-forge
     pyparsing                 2.4.5                      py_0    conda-forge
     pyqt                      5.9.2            py36hcca6a23_4    conda-forge
     pyrsistent                0.15.5           py36h516909a_0    conda-forge
     pysocks                   1.7.1                    py36_0    conda-forge
     pytest                    5.2.4                    py36_0    conda-forge
     python                    3.6.7             h357f687_1006    conda-forge
     python-dateutil           2.8.1                      py_0    conda-forge
     pytz                      2019.3                     py_0    conda-forge
     pyyaml                    5.1.2            py36h516909a_0    conda-forge
     pyzmq                     18.1.1           py36h1768529_0    conda-forge
     qt                        5.9.7                h0c104cb_3    conda-forge
     qtconsole                 4.6.0                    pypi_0    pypi
     rapidjson                 1.1.0             he1b5a44_1002    conda-forge
     re2                       2019.11.01           he1b5a44_0    conda-forge
     readline                  8.0                  hf8c457e_0    conda-forge
     recommonmark              0.6.0                      py_0    conda-forge
     regex                     2019.11.1        py36h516909a_0    conda-forge
     requests                  2.22.0                   py36_1    conda-forge
     rhash                     1.3.6             h14c3975_1001    conda-forge
     rmm                       0.11.0b191118           py36_40    rapidsai-nightly
     scikit-learn              0.21.3           py36hcdab131_0    conda-forge
     scipy                     1.3.0            py36h921218d_0    conda-forge
     seaborn                   0.9.0                      py_0    conda-forge
     send2trash                1.5.0                      py_0    conda-forge
     setuptools                41.6.0                   py36_1    conda-forge
     sip                       4.19.8          py36hf484d3e_1000    conda-forge
     six                       1.13.0                   py36_0    conda-forge
     snappy                    1.1.7             he1b5a44_1002    conda-forge
     snowballstemmer           2.0.0                      py_0    conda-forge
     sortedcontainers          2.1.0                      py_0    conda-forge
     sphinx                    2.2.1                      py_0    conda-forge
     sphinx-markdown-tables    0.0.10                   pypi_0    pypi
     sphinx_rtd_theme          0.4.3                      py_0    conda-forge
     sphinxcontrib-applehelp   1.0.1                      py_0    conda-forge
     sphinxcontrib-devhelp     1.0.1                      py_0    conda-forge
     sphinxcontrib-htmlhelp    1.0.2                      py_0    conda-forge
     sphinxcontrib-jsmath      1.0.1                      py_0    conda-forge
     sphinxcontrib-qthelp      1.0.2                      py_0    conda-forge
     sphinxcontrib-serializinghtml 1.1.3                      py_0    conda-forge
     sphinxcontrib-websupport  1.1.2                      py_0    conda-forge
     sqlite                    3.30.1               hcee41ef_0    conda-forge
     statsmodels               0.10.1           py36hc1659b7_2    conda-forge
     streamz                   0.5.2                    pypi_0    pypi
     tblib                     1.4.0                      py_0    conda-forge
     terminado                 0.8.3                    py36_0    conda-forge
     testpath                  0.4.4                      py_0    conda-forge
     thrift-cpp                0.12.0            hf3afdfd_1004    conda-forge
     tk                        8.6.9             hed695b0_1003    conda-forge
     toml                      0.10.0                     py_0    conda-forge
     toolz                     0.10.0                     py_0    conda-forge
     tornado                   6.0.3            py36h516909a_0    conda-forge
     traitlets                 4.3.3                    py36_0    conda-forge
     typed-ast                 1.4.0            py36h516909a_0    conda-forge
     typing_extensions         3.7.4.1                  py36_0    conda-forge
     uriparser                 0.9.3                he1b5a44_1    conda-forge
     urllib3                   1.25.7                   py36_0    conda-forge
     virtualenv                16.7.5                     py_0    conda-forge
     wcwidth                   0.1.7                      py_1    conda-forge
     webencodings              0.5.1                      py_1    conda-forge
     wheel                     0.33.6                   py36_0    conda-forge
     widgetsnbextension        3.5.1                    pypi_0    pypi
     xgboost                   0.90.rapidsdev1          pypi_0    pypi
     xorg-libxau               1.0.9                h14c3975_0    conda-forge
     xorg-libxdmcp             1.1.3                h516909a_0    conda-forge
     xz                        5.2.4             h14c3975_1001    conda-forge
     yaml                      0.1.7             h14c3975_1001    conda-forge
     zeromq                    4.3.2                he1b5a44_2    conda-forge
     zict                      1.0.0                    pypi_0    pypi
     zipp                      0.6.0                      py_0    conda-forge
     zlib                      1.2.11            h516909a_1006    conda-forge
     zstd                      1.4.3                h3b9ef0a_0    conda-forge
? - Needs Triage bug

All 7 comments

I debugged this a bit more, and the root cause seems to be that cudf.DataFrame.__setitem__ cannot set values when the key is a list, an Index, or any other unhashable Python object.

Minimal reproducer:

df = cudf.DataFrame({'a': [1.,3,4,1.],'b': [4.,5,6,-10], 'c': [6., 7., 5., 10.]})

df[df.columns] = 2  # error
df[['a','b','c']] = 2  # error
df[['a']] = 2  # error
df['a'] = 2  # works fine
--> 440         elif name in self._cols:
    441             self._cols[name] = self._prepare_series_for_add(col)
    442         else:

TypeError: unhashable type: 'list'
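
The error comes from the membership test itself. Assuming `self._cols` is a dict-like mapping from column name to column, as the traceback suggests, the same failure can be shown with a plain dict standing in for it (the `cols` variable below is only a stand-in):

```python
cols = {"a": [1.0, 3.0], "b": [4.0, 5.0]}  # stand-in for a name -> column mapping

# A string key is hashable, so the membership test works.
print("a" in cols)

# A list key has to be hashed before the lookup, and lists are unhashable.
try:
    ["a"] in cols
except TypeError as exc:
    message = str(exc)
    print(message)
```

A pandas Index fails the same way because its `__hash__` raises TypeError, as shown in the original traceback.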

One solution could be to check whether the key is a list, Index, or other iterable and, if so, iterate over its elements, checking whether each one exists in self._cols and adding it accordingly.
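
A minimal sketch of that idea, using a hypothetical `ToyFrame` class rather than cudf's DataFrame. The `_set_one` helper and the scalar broadcasting are assumptions made for illustration, and only list keys are handled here; an Index would need the same treatment.

```python
class ToyFrame:
    """Hypothetical stand-in for a column container; not cudf's DataFrame."""

    def __init__(self, data):
        self._cols = dict(data)

    def _set_one(self, name, value):
        # Broadcast a scalar to the existing column length; store lists as-is.
        if name in self._cols and not isinstance(value, list):
            value = [value] * len(self._cols[name])
        self._cols[name] = value

    def __setitem__(self, key, value):
        if isinstance(key, list):
            # List key: handle each column name separately, so the
            # unhashable list is never used in a dict lookup.
            for name in key:
                self._set_one(name, value)
        else:
            self._set_one(key, value)


df = ToyFrame({"a": [1.0, 3.0], "b": [4.0, 5.0]})
df[["a", "b"]] = 2  # list key, previously the failing case
df["b"] = 7         # plain string key still works
print(df._cols)
```

Checking the key type before the `name in self._cols` test is the essential part; how each column is actually assigned would depend on the real implementation.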

This is similar to https://github.com/rapidsai/cudf/issues/2758.

I have a PR in progress, https://github.com/rapidsai/cudf/pull/3125, which slipped due to other priorities.

I will pick it up and drive it home ASAP.

cc @kkraus14 @shwina just for visibility

Great work on the triage. It seems like you have a handle on the fix, but let me know if I can jump in and help.

Can confirm that with PR #3442, the following works:

import cudf
import dask_cudf
df = cudf.DataFrame({'a': [1.,3,4,1.],'b': [4.,5,6,-10], 'c': [6., 7., 5., 10.]})
ddf = dask_cudf.from_cudf(df, npartitions=2)
ddf.groupby(['a']).b.std().compute()
a
1.0    9.899494937
4.0           null
3.0           null
Name: b, dtype: float64

@Nanthini10, can you try this again?

@Nanthini10, can you try this again?

Can confirm this works with the https://github.com/rapidsai/cudf/pull/3442 merge. Thank you!
