Describe the bug
When I have a cudf.core.series.Series instance with a cudf.core.index.StringIndex as an index and try to use __getitem__ with a string input, I get a NotImplementedError.
The analogous behavior works in Pandas.
The reproduction steps will show a minimal pair.
Steps/Code to reproduce bug
>>> import pandas as pd
>>> import cudf
>>> pdf = pd.DataFrame({'a':[1,2,3],'b':[4,5,6]})
>>> pdf.loc[1]
a 2
b 5
Name: 1, dtype: int64
>>> pdf.loc[1]['a']
2
>>> cdf = cudf.DataFrame({'a':[1,2,3],'b':[4,5,6]})
>>> cdf.loc[1]
a 2
b 5
Name: 1, dtype: int64
>>> cdf.loc[1]['a'] # this errors but should return 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/pnguyen/miniconda3/envs/mgc/lib/python3.7/site-packages/cudf/core/series.py", line 455, in __getitem__
    data = self._column[arg]
  File "/home/pnguyen/miniconda3/envs/mgc/lib/python3.7/site-packages/cudf/core/column/column.py", line 483, in __getitem__
    raise NotImplementedError(type(arg))
NotImplementedError: <class 'cudf.core.column.string.StringColumn'>
>>>
Expected behavior
I would have expected cdf.loc[1]['a'] to return 2 (similar to how Pandas behaves) rather than raising an exception.
Environment overview (please complete the following information)
Environment details
***git***
commit 0fe705dd70f012a6e2d504c0d29f6f14bc990040 (HEAD -> master, origin/master, origin/HEAD)
Author: Paul Nguyen <[email protected]>
Date: Fri Apr 3 13:31:37 2020 -0700
Remove Auto* Concrete Types
This commit removes AutoCuDiGraphType and AutoCuGraphType since it's been decided that it'll be a requirement for graphs to always be wrapped.
This commit updates the tests to wrap the graphs used in the appropriate wrappers instead of relying on the existence of AutoCuDiGraphType and AutoCuGraphType.
This commit updates the concrete_algorithm cugraph_pagerank to take a graph wrapped in CuGraph instead of a cugraph.DiGraph instance since we removed AutoCuDiGraphType and AutoCuGraphType.
The analogous changes were made to cugraph_triangle_count, which was formerly known as auto_cugraph_triangle_count prior to this commit.
This commit removes translate_graph_cugraph2autocugraph as well since AutoCuGraphType no longer exists.
The same goes for translate_graph_cudfedge2cugraph.
This patch also removes the helper functions _determine_dtype_from_cugraph_graph and _determine_weights_from_cugraph_graph from metagraph_cuda/types.py since those helpers were only used to reduce the redundant code shared between AutoCuDiGraphType, AutoCuGraphType, and CuGraph. Since CuGraph is the only user of those helpers, their internals have been inlined in the definition of CuGraph.
***git submodules***
***OS Information***
DGX_NAME="DGX Station"
DGX_PRETTY_NAME="NVIDIA DGX Station"
DGX_SWBUILD_DATE="2018-03-10"
DGX_SWBUILD_VERSION="3.1.6"
DGX_COMMIT_ID="057ab7036fdbf8688920b15b731b9045f772a626"
DGX_SERIAL_NUMBER="0154117000010"
DGX_OTA_VERSION="3.1.7"
DGX_OTA_DATE="Mon Sep 10 13:59:38 PDT 2018"
DGX_OTA_VERSION="3.1.8"
DGX_OTA_DATE="Fri Sep 6 07:48:47 PDT 2019"
DGX_OTA_VERSION="4.3.0"
DGX_OTA_DATE="Fri Dec 20 14:31:33 PST 2019"
DGX_R418_REPO_ENABLED=20191220-152734
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.3 LTS"
NAME="Ubuntu"
VERSION="18.04.3 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.3 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
Linux demouser-DGX-Station 4.15.0-72-generic #81-Ubuntu SMP Tue Nov 26 12:20:02 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
***GPU Information***
Thu Apr 9 08:11:15 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.116.00 Driver Version: 418.116.00 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-DGXS... On | 00000000:07:00.0 Off | 0 |
| N/A 43C P0 150W / 300W | 13212MiB / 16125MiB | 85% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-DGXS... On | 00000000:08:00.0 Off | 0 |
| N/A 39C P0 36W / 300W | 11MiB / 16128MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla V100-DGXS... On | 00000000:0E:00.0 Off | 0 |
| N/A 39C P0 37W / 300W | 11MiB / 16128MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla V100-DGXS... On | 00000000:0F:00.0 Off | 0 |
| N/A 39C P0 37W / 300W | 11MiB / 16128MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 13436 C python3 5497MiB |
| 0 13564 C python3 6983MiB |
| 0 17904 C python3 721MiB |
+-----------------------------------------------------------------------------+
***CPU***
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 40
On-line CPU(s) list: 0-39
Thread(s) per core: 2
Core(s) per socket: 20
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz
Stepping: 1
CPU MHz: 1599.178
CPU max MHz: 3600.0000
CPU min MHz: 1200.0000
BogoMIPS: 4397.69
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 51200K
NUMA node0 CPU(s): 0-39
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts md_clear flush_l1d
***CMake***
/usr/bin/cmake
cmake version 3.10.2
CMake suite maintained and supported by Kitware (kitware.com/cmake).
***g++***
/usr/bin/g++
g++ (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
***nvcc***
/usr/local/cuda/bin/nvcc
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
***Python***
/home/pnguyen/miniconda3/envs/mgc/bin/python
Python 3.7.7
***Environment Variables***
PATH : /home/pnguyen/miniconda3/envs/mgc/bin:/home/pnguyen/.local/bin:/usr/local/cuda/bin:/bin:/usr/bin:/sbin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/home/pnguyen/scripts:/home/pnguyen/bin
LD_LIBRARY_PATH : /usr/local/cuda/lib64:/usr/local/cuda/lib64
NUMBAPRO_NVVM :
NUMBAPRO_LIBDEVICE :
CONDA_PREFIX : /home/pnguyen/miniconda3/envs/mgc
PYTHON_PATH :
conda not found
***pip packages***
/home/pnguyen/miniconda3/envs/mgc/bin/pip
Package Version Location
------------------ ----------------------- ---------------------------------
appdirs 1.4.3
attrs 19.3.0
black 19.10b0
certifi 2020.4.5.1
cfgv 3.1.0
click 7.1.1
coverage 5.0
cudf 0.13.0+0.ga2804c3.dirty
cugraph 0.13.0+0.gac36e8c.dirty
cupy 7.3.0
Cython 0.29.16
decorator 4.4.2
editdistance 0.5.3
fastavro 0.23.0
fastrlock 0.4
fsspec 0.6.3
identify 1.4.13
importlib-metadata 1.5.0
llvmlite 0.31.0
metagraph 0.0.1+12.g15c13c6
metagraph-cuda 0.0.1+24.g0fe705d.dirty /home/pnguyen/code/metagraph-cuda
mkl-fft 1.0.15
mkl-random 1.1.0
mkl-service 2.3.0
more-itertools 8.2.0
mypy-extensions 0.4.3
networkx 2.4
nodeenv 1.3.5
numba 0.48.0
numpy 1.18.1
nvstrings-cuda100 0.13.0
packaging 20.3
pandas 0.25.3
pathspec 0.7.0
pip 20.0.2
pluggy 0.13.1
pre-commit 2.2.0
py 1.8.1
pyarrow 0.15.0
pyparsing 2.4.6
pytest 5.4.1
pytest-cov 2.8.1
python-dateutil 2.8.1
pytz 2019.3
PyYAML 5.3.1
regex 2020.2.20
rmm 0.13.0
scipy 1.4.1
setuptools 46.1.3.post20200330
six 1.14.0
toml 0.10.0
typed-ast 1.4.1
typing-extensions 3.7.4.1
virtualenv 16.7.5
wcwidth 0.1.9
wheel 0.34.2
zipp 2.2.0
Additional context
N/A
@paul-tqh-nguyen try using cdf.loc[1].loc['a']
@rgsl888prabhu The discrepancy with Pandas is quite unintuitive, but that suggestion allows me to get the results I intend to get! Thank you for the quick response!
cdf.loc[1] returns a Series, and we currently support only positional indexing through __getitem__; to look values up by the Series' actual index labels you need to use loc. That is why a second loc is required, as I suggested.
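For reference, the three access styles discussed here can be compared on the pandas side, which is the behavior the report treats as the baseline. This is a small sketch that uses only pandas (no GPU needed); the variable names are illustrative and not taken from the issue.

```python
import pandas as pd

pdf = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Selecting one row yields a Series whose index is the column labels 'a' and 'b'.
row = pdf.loc[1]

by_label = row.loc["a"]     # explicit label-based lookup
by_position = row.iloc[0]   # explicit positional lookup
by_getitem = row["a"]       # plain [] with a string key is resolved against the labels

print(by_label, by_position, by_getitem)
```

In pandas all three expressions return the same element here. The suggested cudf workaround corresponds to the `row.loc["a"]` form.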
@rgsl888prabhu Let's treat this as a bug: __getitem__ should do index-based gathering instead of row-number-based gathering.
@kkraus14 So should we assume that this is always index-based gathering rather than row-number-based, and that a user who wants row-number-based access should use iloc?
Yes exactly, this is Pandas behavior that we should emulate.
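To make the agreed semantics concrete, here is a minimal pure-Python sketch of a container whose `__getitem__` gathers by index label and whose positional access must be requested explicitly. `ToySeries` and its methods are hypothetical names invented for illustration; this is not cudf's implementation, and `iloc` is written as a plain method rather than an indexer object for brevity.

```python
class ToySeries:
    """Hypothetical labelled 1-D container; a stand-in, not cudf code."""

    def __init__(self, values, index):
        if len(values) != len(index):
            raise ValueError("values and index must have the same length")
        self._values = list(values)
        self._index = list(index)

    def __getitem__(self, key):
        # Index-based gathering: resolve the key against the labels.
        try:
            pos = self._index.index(key)
        except ValueError:
            raise KeyError(key) from None
        return self._values[pos]

    def iloc(self, pos):
        # Row-number-based access, available only when asked for explicitly.
        return self._values[pos]


row = ToySeries([2, 5], index=["a", "b"])
label_result = row["a"]          # looked up by label
positional_result = row.iloc(0)  # looked up by row number
```

With this split, `row["a"]` no longer depends on where `"a"` happens to sit in the underlying data, and a caller who really wants the first row says so through `iloc`.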