Describe the bug
Attempting a groupby std() aggregation on a dask-cudf DataFrame results in the following error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/conda/envs/rapids/lib/python3.6/site-packages/dask/dataframe/utils.py in raise_on_meta_error(funcname, udf)
168 try:
--> 169 yield
170 except Exception as e:
/conda/envs/rapids/lib/python3.6/site-packages/dask/dataframe/core.py in _emulate(func, *args, **kwargs)
4740 with raise_on_meta_error(funcname(func), udf=kwargs.pop("udf", False)):
-> 4741 return func(*_extract_meta(args, True), **_extract_meta(kwargs, True))
4742
/conda/envs/rapids/lib/python3.6/site-packages/dask/dataframe/groupby.py in _var_chunk(df, *index)
307
--> 308 df[cols] = df[cols] ** 2
309 g2 = _groupby_raise_unaligned(df, by=index)
/conda/envs/rapids/lib/python3.6/site-packages/cudf/core/dataframe.py in __setitem__(self, name, col)
439
--> 440 elif name in self._cols:
441 self._cols[name] = self._prepare_series_for_add(col)
/conda/envs/rapids/lib/python3.6/site-packages/pandas/core/indexes/base.py in __hash__(self)
3934 def __hash__(self):
-> 3935 raise TypeError("unhashable type: %r" % type(self).__name__)
3936
TypeError: unhashable type: 'Index'
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<ipython-input-60-3951b9cba7f8> in <module>
----> 1 ddf.groupby(['a']).b.std()
/conda/envs/rapids/lib/python3.6/site-packages/dask/dataframe/groupby.py in std(self, ddof, split_every, split_out)
1358 @derived_from(pd.core.groupby.GroupBy)
1359 def std(self, ddof=1, split_every=None, split_out=1):
-> 1360 v = self.var(ddof, split_every=split_every, split_out=split_out)
1361 result = map_partitions(np.sqrt, v, meta=v)
1362 return result
/conda/envs/rapids/lib/python3.6/site-packages/dask/dataframe/groupby.py in var(self, ddof, split_every, split_out)
1346 split_every=split_every,
1347 split_out=split_out,
-> 1348 split_out_setup=split_out_on_index,
1349 )
1350
/conda/envs/rapids/lib/python3.6/site-packages/dask/dataframe/core.py in apply_concat_apply(args, chunk, aggregate, combine, meta, token, chunk_kwargs, aggregate_kwargs, combine_kwargs, split_every, split_out, split_out_setup, split_out_setup_kwargs, **kwargs)
4694
4695 if meta is no_default:
-> 4696 meta_chunk = _emulate(chunk, *args, udf=True, **chunk_kwargs)
4697 meta = _emulate(aggregate, _concat([meta_chunk]), udf=True, **aggregate_kwargs)
4698 meta = make_meta(
/conda/envs/rapids/lib/python3.6/site-packages/dask/dataframe/core.py in _emulate(func, *args, **kwargs)
4739 """
4740 with raise_on_meta_error(funcname(func), udf=kwargs.pop("udf", False)):
-> 4741 return func(*_extract_meta(args, True), **_extract_meta(kwargs, True))
4742
4743
/conda/envs/rapids/lib/python3.6/contextlib.py in __exit__(self, type, value, traceback)
97 value = type()
98 try:
---> 99 self.gen.throw(type, value, traceback)
100 except StopIteration as exc:
101 # Suppress StopIteration *unless* it's the same exception that
/conda/envs/rapids/lib/python3.6/site-packages/dask/dataframe/utils.py in raise_on_meta_error(funcname, udf)
188 )
189 msg = msg.format(" in `{0}`".format(funcname) if funcname else "", repr(e), tb)
--> 190 raise ValueError(msg)
191
192
ValueError: Metadata inference failed in `_var_chunk`.
You have supplied a custom function and Dask is unable to
determine the type of output that that function returns.
To resolve this please provide a meta= keyword.
The docstring of the Dask function you ran should have more information.
Original error is below:
------------------------
TypeError("unhashable type: 'Index'",)
Traceback:
---------
File "/conda/envs/rapids/lib/python3.6/site-packages/dask/dataframe/utils.py", line 169, in raise_on_meta_error
yield
File "/conda/envs/rapids/lib/python3.6/site-packages/dask/dataframe/core.py", line 4741, in _emulate
return func(*_extract_meta(args, True), **_extract_meta(kwargs, True))
File "/conda/envs/rapids/lib/python3.6/site-packages/dask/dataframe/groupby.py", line 308, in _var_chunk
df[cols] = df[cols] ** 2
File "/conda/envs/rapids/lib/python3.6/site-packages/cudf/core/dataframe.py", line 440, in __setitem__
elif name in self._cols:
File "/conda/envs/rapids/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 3935, in __hash__
raise TypeError("unhashable type: %r" % type(self).__name__)
Steps/Code to reproduce bug
import cudf
import dask_cudf
df = cudf.DataFrame({'a': [1.,3,4,1.],'b': [4.,5,6,-10], 'c': [6., 7., 5., 10.]})
ddf = dask_cudf.from_cudf(df, npartitions=2)
ddf.groupby(['a']).b.std()
Expected behavior
Should return the standard deviation of Series b for each group of a.
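For reference, the expected values can be worked out without a GPU. The sketch below is plain Python, not dask or cudf code. It assumes the usual decomposition of the sample variance into per-group count, sum, and sum of squares, which appears to be why `_var_chunk` squares the columns in the traceback above. The partition split point and the helper names `chunk` and `aggregate` are assumptions made for illustration.

```python
import math
from collections import defaultdict

# Same data as the reproducer, as plain (a, b) rows.
rows = [(1.0, 4.0), (3.0, 5.0), (4.0, 6.0), (1.0, -10.0)]

# Two partitions, mimicking npartitions=2 (the split point is an assumption).
partitions = [rows[:2], rows[2:]]


def chunk(part):
    """Per-partition step: per-group count, sum, and sum of squares of b."""
    acc = defaultdict(lambda: [0, 0.0, 0.0])
    for a, b in part:
        acc[a][0] += 1
        acc[a][1] += b
        acc[a][2] += b ** 2
    return acc


def aggregate(chunks, ddof=1):
    """Combine partial sums, then use var = (sum_sq - sum**2 / n) / (n - ddof)."""
    total = defaultdict(lambda: [0, 0.0, 0.0])
    for acc in chunks:
        for key, (n, s, s2) in acc.items():
            total[key][0] += n
            total[key][1] += s
            total[key][2] += s2
    result = {}
    for key, (n, s, s2) in total.items():
        if n - ddof <= 0:
            result[key] = math.nan  # undefined for a single observation
        else:
            result[key] = math.sqrt((s2 - s ** 2 / n) / (n - ddof))
    return result


expected_std = aggregate([chunk(p) for p in partitions])
```

Group 1.0 has two observations (4 and -10), so its sample standard deviation is sqrt(98), roughly 9.8995. Groups 3.0 and 4.0 have a single observation each, so the result is undefined with ddof=1.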
Environment overview (please complete the following information)
Environment details
***git***
commit 2eab329e0474a2fa1156ebe1193b806516e8a0f7 (HEAD -> branch-0.11, origin/branch-0.11, origin/HEAD)
Merge: 582be6d a9d8d55
Author: Mark Harris
Date: Tue Nov 19 08:30:04 2019 +1100
Merge pull request #3232 from davidwendt/port-nvs-datetime-ops
[REVIEW] Port NVStrings datetime conversion ops to cudf strings column
***git submodules***
b165e1fb11eeea64ccf95053e40f2424312599cc thirdparty/cub (v1.7.1)
63f644be44201467e3938d59ed9d89cc8725c35d thirdparty/jitify (remotes/origin/feature/api_v2_v0.10)
+39125e0e476b960c2001f1ec76a3441335ff91b2 thirdparty/libcudacxx (0.8.1-94-g39125e0)
08bc464bd8f4d779e4294305aa7dadebcebcc507 thirdparty/libcudacxx/libcxx (heads/rapidsai-interop)
***OS Information***
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.6 LTS"
NAME="Ubuntu"
VERSION="16.04.6 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.6 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
Linux 66dcd7e53678 4.4.0-154-generic #181-Ubuntu SMP Tue Jun 25 05:29:03 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
***GPU Information***
Wed Nov 20 20:38:32 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.104 Driver Version: 410.104 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla T4 Off | 00000000:3B:00.0 Off | 0 |
| N/A 52C P0 29W / 70W | 1293MiB / 15079MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla T4 Off | 00000000:5E:00.0 Off | 0 |
| N/A 59C P0 28W / 70W | 279MiB / 15079MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla T4 Off | 00000000:AF:00.0 Off | 0 |
| N/A 50C P0 29W / 70W | 279MiB / 15079MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla T4 Off | 00000000:D8:00.0 Off | 0 |
| N/A 49C P0 29W / 70W | 279MiB / 15079MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
***CPU***
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 64
On-line CPU(s) list: 0-63
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
Stepping: 4
CPU MHz: 2095.092
BogoMIPS: 4191.35
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 22528K
NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62
NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55,57,59,61,63
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb invpcid_single intel_pt ssbd ibrs ibpb stibp kaiser tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx avx512f rdseed adx smap clflushopt clwb avx512cd xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku md_clear flush_l1d
***CMake***
/conda/envs/rapids/bin/cmake
cmake version 3.15.5
CMake suite maintained and supported by Kitware (kitware.com/cmake).
***g++***
/usr/bin/g++
g++ (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
***nvcc***
/usr/local/cuda/bin/nvcc
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
***Python***
/conda/envs/rapids/bin/python
Python 3.6.7
***Environment Variables***
PATH : /conda/envs/rapids/bin:/conda/condabin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/conda/bin:/conda/bin
LD_LIBRARY_PATH : /usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64
NUMBAPRO_NVVM : /usr/local/cuda/nvvm/lib64/libnvvm.so
NUMBAPRO_LIBDEVICE : /usr/local/cuda/nvvm/libdevice
CONDA_PREFIX : /conda/envs/rapids
PYTHON_PATH :
***conda packages***
/conda/condabin/conda
# packages in environment at /conda/envs/rapids:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
alabaster 0.7.12 py_0 conda-forge
appdirs 1.4.3 py_1 conda-forge
arrow-cpp 0.15.0 py36h090bef1_2 conda-forge
aspy.yaml 1.3.0 py_0 conda-forge
atomicwrites 1.3.0 py_0 conda-forge
attrs 19.3.0 py_0 conda-forge
babel 2.7.0 py_0 conda-forge
backcall 0.1.0 py_0 conda-forge
black 19.10b0 py36_0 conda-forge
blas 2.14 mkl conda-forge
bleach 3.1.0 py_0 conda-forge
bokeh 1.2.0 py36_0 conda-forge
boost-cpp 1.70.0 h8e57a91_2 conda-forge
brotli 1.0.7 he1b5a44_1000 conda-forge
bzip2 1.0.8 h516909a_1 conda-forge
c-ares 1.15.0 h516909a_1001 conda-forge
ca-certificates 2019.9.11 hecc5488_0 conda-forge
cached-property 1.5.1 py_0 conda-forge
certifi 2019.9.11 py36_0 conda-forge
cffi 1.13.2 py36h8022711_0 conda-forge
cfgv 2.0.1 py_0 conda-forge
chardet 3.0.4 py36_1003 conda-forge
click 7.0 pypi_0 pypi
cloudpickle 1.2.2 py_1 conda-forge
cmake 3.15.5 hf94ab9c_0 conda-forge
cmake_setuptools 0.1.3 py_0 rapidsai-nightly/label/cuda10.0
commonmark 0.9.1 py_0 conda-forge
cryptography 2.8 py36h72c5cf5_0 conda-forge
cuda100 1.0 0 pytorch
cudatoolkit 10.0.130 0 nvidia
cudf 0.9.0a0+1525.gd3f19ca pypi_0 pypi
cudnn 7.6.0 cuda10.0_0 nvidia
cugraph 0.9.0a1+40.ge038d75 pypi_0 pypi
cuml 0.9.0a1+219.gc31e78e.dirty pypi_0 pypi
cupy 6.3.0 py36h88562e5_0 rapidsai
curl 7.65.3 hf8cf82a_0 conda-forge
cycler 0.10.0 py_2 conda-forge
cython 0.29.14 py36he1b5a44_0 conda-forge
cytoolz 0.10.1 py36h516909a_0 conda-forge
dask 2.8.0+1.g7967652 pypi_0 pypi
dask-cudf 0.11.0a0+2889.g582be6d.dirty pypi_0 pypi
dask-cuml 0.9.0a0 pypi_0 pypi
dask-xgboost 0.1.5 pypi_0 pypi
dataclasses 0.7 py36_0 conda-forge
dbus 1.13.6 he372182_0 conda-forge
decorator 4.4.1 py_0 conda-forge
defusedxml 0.6.0 py_0 conda-forge
distributed 2.8.0+4.g029ed17 pypi_0 pypi
dlpack 0.2 he1b5a44_1 conda-forge
docutils 0.15.2 py36_0 conda-forge
double-conversion 3.1.5 he1b5a44_2 conda-forge
editdistance 0.5.3 py36he1b5a44_0 conda-forge
entrypoints 0.3 py36_1000 conda-forge
expat 2.2.5 he1b5a44_1004 conda-forge
faiss-gpu 1.5.0 py36_cuda10.0_1 [cuda100] pytorch
fastavro 0.22.7 py36h516909a_0 conda-forge
fastrlock 0.4 py36he1b5a44_1000 conda-forge
flake8 3.7.9 py36_0 conda-forge
flatbuffers 1.11.0 he1b5a44_0 conda-forge
fontconfig 2.13.1 h86ecdb6_1001 conda-forge
freetype 2.10.0 he983fc9_1 conda-forge
fsspec 0.6.0 py_0 conda-forge
future 0.18.2 py36_0 conda-forge
gettext 0.19.8.1 hc5be6a0_1002 conda-forge
gflags 2.2.2 he1b5a44_1002 conda-forge
glib 2.58.3 py36h6f030ca_1002 conda-forge
glog 0.4.0 he1b5a44_1 conda-forge
gmp 6.1.2 hf484d3e_1000 conda-forge
grpc-cpp 1.23.0 h18db393_0 conda-forge
gst-plugins-base 1.14.5 h0935bb2_0 conda-forge
gstreamer 1.14.5 h36ae1b5_0 conda-forge
heapdict 1.0.1 py_0 conda-forge
hypothesis 4.44.2 py36_0 conda-forge
icu 64.2 he1b5a44_1 conda-forge
identify 1.4.7 py_0 conda-forge
idna 2.8 py36_1000 conda-forge
imagesize 1.1.0 py_0 conda-forge
importlib_metadata 0.23 py36_0 conda-forge
importlib_resources 1.0.2 py36_1000 conda-forge
intel-openmp 2019.4 243
ipykernel 5.1.3 py36h5ca1d4c_0 conda-forge
ipython 7.9.0 py36h5ca1d4c_0 conda-forge
ipython_genutils 0.2.0 py_1 conda-forge
ipywidgets 7.5.1 pypi_0 pypi
isort 4.3.21 py36_0 conda-forge
jedi 0.15.1 py36_0 conda-forge
jinja2 2.10.3 py_0 conda-forge
joblib 0.14.0 py_0 conda-forge
jpeg 9c h14c3975_1001 conda-forge
json5 0.8.5 py_0 conda-forge
jsonschema 3.2.0 py36_0 conda-forge
jupyter 1.0.0 pypi_0 pypi
jupyter-console 6.0.0 pypi_0 pypi
jupyter_client 5.3.3 py36_1 conda-forge
jupyter_core 4.6.1 py36_0 conda-forge
jupyterlab 1.0.2 py36_0 conda-forge
jupyterlab_server 1.0.6 py_0 conda-forge
kiwisolver 1.1.0 py36hc9558a2_0 conda-forge
krb5 1.16.3 h05b26f9_1001 conda-forge
libblas 3.8.0 14_mkl conda-forge
libcblas 3.8.0 14_mkl conda-forge
libclang 8.0.0 h6bb024c_0 rapidsai/label/cuda10.0
libcumlmg 0.0.0.dev0 cuda10.0_373 nvidia/label/cuda10.0
libcurl 7.65.3 hda55be3_0 conda-forge
libedit 3.1.20170329 hf8c457e_1001 conda-forge
libevent 2.1.10 h72c5cf5_0 conda-forge
libffi 3.2.1 he1b5a44_1006 conda-forge
libgcc-ng 9.1.0 hdf63c60_0
libgfortran-ng 7.3.0 hdf63c60_2 conda-forge
libiconv 1.15 h516909a_1005 conda-forge
liblapack 3.8.0 14_mkl conda-forge
liblapacke 3.8.0 14_mkl conda-forge
libllvm8 8.0.1 hc9558a2_0 conda-forge
libopenblas 0.3.6 h6e990d7_4 conda-forge
libpng 1.6.37 hed695b0_0 conda-forge
libprotobuf 3.8.0 h8b12597_0 conda-forge
librmm 0.11.0b191118 cuda10.0_40 rapidsai-nightly
librmm-cffi 0.9.0 pypi_0 pypi
libsodium 1.0.17 h516909a_0 conda-forge
libssh2 1.8.2 h22169c7_2 conda-forge
libstdcxx-ng 9.1.0 hdf63c60_0
libtiff 4.1.0 hfc65ed5_0 conda-forge
libuuid 2.32.1 h14c3975_1000 conda-forge
libuv 1.33.1 h516909a_0 conda-forge
libxcb 1.13 h14c3975_1002 conda-forge
libxml2 2.9.10 hee79883_0 conda-forge
llvmlite 0.30.0 py36h8b12597_1 conda-forge
locket 0.2.0 py_2 conda-forge
lz4-c 1.8.3 he1b5a44_1001 conda-forge
markdown 3.0.1 pypi_0 pypi
markupsafe 1.1.1 py36h516909a_0 conda-forge
matplotlib 3.1.2 py36_1 conda-forge
matplotlib-base 3.1.2 py36h250f245_1 conda-forge
mccabe 0.6.1 py_1 conda-forge
mistune 0.8.4 py36h516909a_1000 conda-forge
mkl 2019.4 243
more-itertools 7.2.0 py_0 conda-forge
msgpack-python 0.6.2 py36hc9558a2_0 conda-forge
mypy_extensions 0.4.3 py36_0 conda-forge
nbconvert 5.6.1 py36_0 conda-forge
nbformat 4.4.0 py_1 conda-forge
nbsphinx 0.4.3 py_0 conda-forge
nccl 2.4.6.1 cuda10.0_0 nvidia
ncurses 6.1 hf484d3e_1002 conda-forge
networkx 2.3 py_0 conda-forge
nodeenv 1.3.3 py_0 conda-forge
notebook 6.0.1 py36_0 conda-forge
numba 0.46.0 py36hb3f55d8_1 conda-forge
numpy 1.16.2 py36h8b7e671_1 conda-forge
numpydoc 0.9.1 py_0 conda-forge
nvstrings-cuda100 0.0.0.dev0 pypi_0 pypi
olefile 0.46 py_0 conda-forge
openblas 0.3.6 h6e990d7_4 conda-forge
openssl 1.1.1d h516909a_0 conda-forge
packaging 19.2 py_0 conda-forge
pandas 0.24.2 py36hb3f55d8_0 conda-forge
pandoc 1.19.2 0 conda-forge
pandocfilters 1.4.2 py_1 conda-forge
parquet-cpp 1.5.1 2 conda-forge
parso 0.5.1 py_0 conda-forge
partd 1.0.0 py_0 conda-forge
pathspec 0.6.0 py_0 conda-forge
patsy 0.5.1 py_0 conda-forge
pcre 8.43 he1b5a44_0 conda-forge
pexpect 4.7.0 py36_0 conda-forge
pickleshare 0.7.5 py36_1000 conda-forge
pillow 6.2.1 py36h6b7be26_0 conda-forge
pip 19.3.1 py36_0 conda-forge
pluggy 0.12.0 py_0 conda-forge
pre_commit 1.18.1 py36_0 conda-forge
prometheus_client 0.7.1 py_0 conda-forge
prompt_toolkit 2.0.10 py_0 conda-forge
psutil 5.6.5 py36h516909a_0 conda-forge
pthread-stubs 0.4 h14c3975_1001 conda-forge
ptyprocess 0.6.0 py_1001 conda-forge
py 1.8.0 py_0 conda-forge
pyarrow 0.15.0 py36h8b68381_1 conda-forge
pycodestyle 2.5.0 py_0 conda-forge
pycparser 2.19 py36_1 conda-forge
pyflakes 2.1.1 py_0 conda-forge
pygments 2.4.2 py_0 conda-forge
pynvml 8.0.3 pypi_0 pypi
pyopenssl 19.0.0 py36_0 conda-forge
pyparsing 2.4.5 py_0 conda-forge
pyqt 5.9.2 py36hcca6a23_4 conda-forge
pyrsistent 0.15.5 py36h516909a_0 conda-forge
pysocks 1.7.1 py36_0 conda-forge
pytest 5.2.4 py36_0 conda-forge
python 3.6.7 h357f687_1006 conda-forge
python-dateutil 2.8.1 py_0 conda-forge
pytz 2019.3 py_0 conda-forge
pyyaml 5.1.2 py36h516909a_0 conda-forge
pyzmq 18.1.1 py36h1768529_0 conda-forge
qt 5.9.7 h0c104cb_3 conda-forge
qtconsole 4.6.0 pypi_0 pypi
rapidjson 1.1.0 he1b5a44_1002 conda-forge
re2 2019.11.01 he1b5a44_0 conda-forge
readline 8.0 hf8c457e_0 conda-forge
recommonmark 0.6.0 py_0 conda-forge
regex 2019.11.1 py36h516909a_0 conda-forge
requests 2.22.0 py36_1 conda-forge
rhash 1.3.6 h14c3975_1001 conda-forge
rmm 0.11.0b191118 py36_40 rapidsai-nightly
scikit-learn 0.21.3 py36hcdab131_0 conda-forge
scipy 1.3.0 py36h921218d_0 conda-forge
seaborn 0.9.0 py_0 conda-forge
send2trash 1.5.0 py_0 conda-forge
setuptools 41.6.0 py36_1 conda-forge
sip 4.19.8 py36hf484d3e_1000 conda-forge
six 1.13.0 py36_0 conda-forge
snappy 1.1.7 he1b5a44_1002 conda-forge
snowballstemmer 2.0.0 py_0 conda-forge
sortedcontainers 2.1.0 py_0 conda-forge
sphinx 2.2.1 py_0 conda-forge
sphinx-markdown-tables 0.0.10 pypi_0 pypi
sphinx_rtd_theme 0.4.3 py_0 conda-forge
sphinxcontrib-applehelp 1.0.1 py_0 conda-forge
sphinxcontrib-devhelp 1.0.1 py_0 conda-forge
sphinxcontrib-htmlhelp 1.0.2 py_0 conda-forge
sphinxcontrib-jsmath 1.0.1 py_0 conda-forge
sphinxcontrib-qthelp 1.0.2 py_0 conda-forge
sphinxcontrib-serializinghtml 1.1.3 py_0 conda-forge
sphinxcontrib-websupport 1.1.2 py_0 conda-forge
sqlite 3.30.1 hcee41ef_0 conda-forge
statsmodels 0.10.1 py36hc1659b7_2 conda-forge
streamz 0.5.2 pypi_0 pypi
tblib 1.4.0 py_0 conda-forge
terminado 0.8.3 py36_0 conda-forge
testpath 0.4.4 py_0 conda-forge
thrift-cpp 0.12.0 hf3afdfd_1004 conda-forge
tk 8.6.9 hed695b0_1003 conda-forge
toml 0.10.0 py_0 conda-forge
toolz 0.10.0 py_0 conda-forge
tornado 6.0.3 py36h516909a_0 conda-forge
traitlets 4.3.3 py36_0 conda-forge
typed-ast 1.4.0 py36h516909a_0 conda-forge
typing_extensions 3.7.4.1 py36_0 conda-forge
uriparser 0.9.3 he1b5a44_1 conda-forge
urllib3 1.25.7 py36_0 conda-forge
virtualenv 16.7.5 py_0 conda-forge
wcwidth 0.1.7 py_1 conda-forge
webencodings 0.5.1 py_1 conda-forge
wheel 0.33.6 py36_0 conda-forge
widgetsnbextension 3.5.1 pypi_0 pypi
xgboost 0.90.rapidsdev1 pypi_0 pypi
xorg-libxau 1.0.9 h14c3975_0 conda-forge
xorg-libxdmcp 1.1.3 h516909a_0 conda-forge
xz 5.2.4 h14c3975_1001 conda-forge
yaml 0.1.7 h14c3975_1001 conda-forge
zeromq 4.3.2 he1b5a44_2 conda-forge
zict 1.0.0 pypi_0 pypi
zipp 0.6.0 py_0 conda-forge
zlib 1.2.11 h516909a_1006 conda-forge
zstd 1.4.3 h3b9ef0a_0 conda-forge
I debugged this a bit more, and the root cause seems to be that cudf.DataFrame.__setitem__ cannot set values when the key is a list, an Index, or any other unhashable Python object.
Minimal reproducer:
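import cudf
df = cudf.DataFrame({'a': [1.,3,4,1.],'b': [4.,5,6,-10], 'c': [6., 7., 5., 10.]})
df[df.columns] = 2 # error: the key is an Index
df[['a','b','c']] = 2 # error: the key is a list
df[['a']] = 2 # error: even a single-element list fails
df['a'] = 2 # works fine: the key is a hashable string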
--> 440 elif name in self._cols:
441 self._cols[name] = self._prepare_series_for_add(col)
442 else:
TypeError: unhashable type: 'list'
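The failing line is a membership test, `name in self._cols`. Assuming `_cols` behaves like a dict keyed by column name (an assumption based on the traceback, not on reading the cudf source), the same error can be reproduced with a plain dict, because a membership test has to hash the key first:

```python
# Stand-in for the internal column mapping (hypothetical, plain dict).
cols = {"a": [1.0, 3.0], "b": [4.0, 5.0]}

# A string key is hashable, so the membership test works.
has_a = "a" in cols

# A list key cannot be hashed, so the same test raises TypeError.
try:
    ["a"] in cols
except TypeError as exc:
    message = str(exc)
```

Here `message` contains "unhashable type: 'list'", matching the traceback above.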
One solution could be to check whether the key is a list, an Index, or another iterable and, if so, iterate over its elements, checking whether each one exists in self._cols and setting or adding the column accordingly.
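A minimal sketch of that idea, using a toy dict-backed class rather than cudf itself. `ToyFrame`, `_set_column`, and the scalar-only broadcast are hypothetical and exist only for illustration. The actual fix in cudf may look quite different.

```python
from collections.abc import Iterable


class ToyFrame:
    """Hypothetical dict-backed frame used only to illustrate key dispatch."""

    def __init__(self, data):
        self._cols = {name: list(values) for name, values in data.items()}
        self._nrows = len(next(iter(self._cols.values()), []))

    def _set_column(self, name, value):
        # Broadcast a scalar to every row of one column (replace or add).
        self._cols[name] = [value] * self._nrows

    def __setitem__(self, key, value):
        # A string is iterable but names a single column, so treat it as one key.
        if isinstance(key, str) or not isinstance(key, Iterable):
            self._set_column(key, value)
        else:
            # List, index, or other iterable of names: handle each name on its
            # own instead of hashing the whole container.
            for name in key:
                self._set_column(name, value)


frame = ToyFrame({"a": [1.0, 3.0, 4.0, 1.0], "b": [4.0, 5.0, 6.0, -10.0]})
frame[["a", "b"]] = 2  # list key, no TypeError
frame["a"] = 7         # single hashable key still works
```

Note that this sketch would also iterate over a tuple key, and a real implementation would need to decide how to treat tuples, since they are hashable and can be valid column names.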
This is similar to https://github.com/rapidsai/cudf/issues/2758.
I have a PR in progress, https://github.com/rapidsai/cudf/pull/3125, which slipped due to other priorities.
I will pick it up and drive it home ASAP.
cc @kkraus14 @shwina just for visibility
Great work on the triage. It seems like you have a handle on the fix, but let me know if I can jump in and help.
Can confirm that with PR #3442, the following works:
import cudf
import dask_cudf
df = cudf.DataFrame({'a': [1.,3,4,1.],'b': [4.,5,6,-10], 'c': [6., 7., 5., 10.]})
ddf = dask_cudf.from_cudf(df, npartitions=2)
ddf.groupby(['a']).b.std().compute()
a
1.0 9.899494937
4.0 null
3.0 null
Name: b, dtype: float64
@Nanthini10, can you try this again?
Can confirm this works with the https://github.com/rapidsai/cudf/pull/3442 merge. Thank you!