Cudf: [BUG] NotImplementedError when using string index of Series

Created on 9 Apr 2020  ·  6 Comments  ·  Source: rapidsai/cudf

Describe the bug

When I have a cudf.core.series.Series instance with a cudf.core.index.StringIndex as an index and try to use __getitem__ with a string input, I get a NotImplementedError.

The analogous behavior works in Pandas.

The reproduction steps will show a minimal pair.

Steps/Code to reproduce bug

>>> import pandas as pd
>>> import cudf
>>> pdf = pd.DataFrame({'a':[1,2,3],'b':[4,5,6]})
>>> pdf.loc[1]
a    2
b    5
Name: 1, dtype: int64
>>> pdf.loc[1]['a']
2
>>> cdf = cudf.DataFrame({'a':[1,2,3],'b':[4,5,6]})
>>> cdf.loc[1]
a    2
b    5
Name: 1, dtype: int64
>>> cdf.loc[1]['a'] # this errors but should return 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/pnguyen/miniconda3/envs/mgc/lib/python3.7/site-packages/cudf/core/series.py", line 455, in __getitem__
    data = self._column[arg]
  File "/home/pnguyen/miniconda3/envs/mgc/lib/python3.7/site-packages/cudf/core/column/column.py", line 483, in __getitem__
    raise NotImplementedError(type(arg))
NotImplementedError: <class 'cudf.core.column.string.StringColumn'>
>>> 

Expected behavior

I would have expected cdf.loc[1]['a'] to return 2 (similar to how Pandas behaves) rather than raising an exception.

Environment overview (please complete the following information)

  • Environment location: Bare-metal
  • Method of cuDF install: conda

Environment details

 ***git***
 commit 0fe705dd70f012a6e2d504c0d29f6f14bc990040 (HEAD -> master, origin/master, origin/HEAD)
 Author: Paul Nguyen <[email protected]>
 Date:   Fri Apr 3 13:31:37 2020 -0700

 Remove Auto* Concrete Types

 This commit removes AutoCuDiGraphType and AutoCuGraphType since it's been decided that it'll be a requirement for graphs to always be wrapped.

 This commit updates the tests to wrap the graphs used in the appropriate wrappers instead of relying on the existence of AutoCuDiGraphType and AutoCuGraphType.

 This commit updates the concrete_algorithm cugraph_pagerank to take a graph wrapped in CuGraph instead of a cugraph.DiGraph instance since we removed AutoCuDiGraphType and AutoCuGraphType.

 The analogous changes were made to cugraph_triangle_count, which was formerly known as auto_cugraph_triangle_count prior to this commit.

 This commit removes translate_graph_cugraph2autocugraph as well since AutoCuGraphType no longer exists.

 The same goes for translate_graph_cudfedge2cugraph.

 This patch also removes the helper functions _determine_dtype_from_cugraph_graph and _determine_weights_from_cugraph_graph from metagraph_cuda/types.py since those helpers were only used to reduce the redundant code shared between AutoCuDiGraphType, AutoCuGraphType, and CuGraph. Since CuGraph is the only user of those helpers, their internals have been inlined in the definition of CuGraph.
 ***git submodules***

 ***OS Information***
 DGX_NAME="DGX Station"
 DGX_PRETTY_NAME="NVIDIA DGX Station"
 DGX_SWBUILD_DATE="2018-03-10"
 DGX_SWBUILD_VERSION="3.1.6"
 DGX_COMMIT_ID="057ab7036fdbf8688920b15b731b9045f772a626"
 DGX_SERIAL_NUMBER="0154117000010"

 DGX_OTA_VERSION="3.1.7"
 DGX_OTA_DATE="Mon Sep 10 13:59:38 PDT 2018"

 DGX_OTA_VERSION="3.1.8"
 DGX_OTA_DATE="Fri Sep  6 07:48:47 PDT 2019"

 DGX_OTA_VERSION="4.3.0"
 DGX_OTA_DATE="Fri Dec 20 14:31:33 PST 2019"

 DGX_R418_REPO_ENABLED=20191220-152734
 DISTRIB_ID=Ubuntu
 DISTRIB_RELEASE=18.04
 DISTRIB_CODENAME=bionic
 DISTRIB_DESCRIPTION="Ubuntu 18.04.3 LTS"
 NAME="Ubuntu"
 VERSION="18.04.3 LTS (Bionic Beaver)"
 ID=ubuntu
 ID_LIKE=debian
 PRETTY_NAME="Ubuntu 18.04.3 LTS"
 VERSION_ID="18.04"
 HOME_URL="https://www.ubuntu.com/"
 SUPPORT_URL="https://help.ubuntu.com/"
 BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
 PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
 VERSION_CODENAME=bionic
 UBUNTU_CODENAME=bionic
 Linux demouser-DGX-Station 4.15.0-72-generic #81-Ubuntu SMP Tue Nov 26 12:20:02 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

 ***GPU Information***
 Thu Apr  9 08:11:15 2020
 +-----------------------------------------------------------------------------+
 | NVIDIA-SMI 418.116.00   Driver Version: 418.116.00   CUDA Version: 10.1     |
 |-------------------------------+----------------------+----------------------+
 | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
 | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
 |===============================+======================+======================|
 |   0  Tesla V100-DGXS...  On   | 00000000:07:00.0 Off |                    0 |
 | N/A   43C    P0   150W / 300W |  13212MiB / 16125MiB |     85%      Default |
 +-------------------------------+----------------------+----------------------+
 |   1  Tesla V100-DGXS...  On   | 00000000:08:00.0 Off |                    0 |
 | N/A   39C    P0    36W / 300W |     11MiB / 16128MiB |      0%      Default |
 +-------------------------------+----------------------+----------------------+
 |   2  Tesla V100-DGXS...  On   | 00000000:0E:00.0 Off |                    0 |
 | N/A   39C    P0    37W / 300W |     11MiB / 16128MiB |      0%      Default |
 +-------------------------------+----------------------+----------------------+
 |   3  Tesla V100-DGXS...  On   | 00000000:0F:00.0 Off |                    0 |
 | N/A   39C    P0    37W / 300W |     11MiB / 16128MiB |      0%      Default |
 +-------------------------------+----------------------+----------------------+

 +-----------------------------------------------------------------------------+
 | Processes:                                                       GPU Memory |
 |  GPU       PID   Type   Process name                             Usage      |
 |=============================================================================|
 |    0     13436      C   python3                                     5497MiB |
 |    0     13564      C   python3                                     6983MiB |
 |    0     17904      C   python3                                      721MiB |
 +-----------------------------------------------------------------------------+

 ***CPU***
 Architecture:        x86_64
 CPU op-mode(s):      32-bit, 64-bit
 Byte Order:          Little Endian
 CPU(s):              40
 On-line CPU(s) list: 0-39
 Thread(s) per core:  2
 Core(s) per socket:  20
 Socket(s):           1
 NUMA node(s):        1
 Vendor ID:           GenuineIntel
 CPU family:          6
 Model:               79
 Model name:          Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz
 Stepping:            1
 CPU MHz:             1599.178
 CPU max MHz:         3600.0000
 CPU min MHz:         1200.0000
 BogoMIPS:            4397.69
 Virtualization:      VT-x
 L1d cache:           32K
 L1i cache:           32K
 L2 cache:            256K
 L3 cache:            51200K
 NUMA node0 CPU(s):   0-39
 Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts md_clear flush_l1d

 ***CMake***
 /usr/bin/cmake
 cmake version 3.10.2

 CMake suite maintained and supported by Kitware (kitware.com/cmake).

 ***g++***
 /usr/bin/g++
 g++ (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
 Copyright (C) 2017 Free Software Foundation, Inc.
 This is free software; see the source for copying conditions.  There is NO
 warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


 ***nvcc***
 /usr/local/cuda/bin/nvcc
 nvcc: NVIDIA (R) Cuda compiler driver
 Copyright (c) 2005-2019 NVIDIA Corporation
 Built on Sun_Jul_28_19:07:16_PDT_2019
 Cuda compilation tools, release 10.1, V10.1.243

 ***Python***
 /home/pnguyen/miniconda3/envs/mgc/bin/python
 Python 3.7.7

 ***Environment Variables***
 PATH                            : /home/pnguyen/miniconda3/envs/mgc/bin:/home/pnguyen/.local/bin:/usr/local/cuda/bin:/bin:/usr/bin:/sbin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/home/pnguyen/scripts:/home/pnguyen/bin
 LD_LIBRARY_PATH                 : /usr/local/cuda/lib64:/usr/local/cuda/lib64
 NUMBAPRO_NVVM                   :
 NUMBAPRO_LIBDEVICE              :
 CONDA_PREFIX                    : /home/pnguyen/miniconda3/envs/mgc
 PYTHON_PATH                     :

 conda not found
 ***pip packages***
 /home/pnguyen/miniconda3/envs/mgc/bin/pip
 Package            Version                 Location
 ------------------ ----------------------- ---------------------------------
 appdirs            1.4.3
 attrs              19.3.0
 black              19.10b0
 certifi            2020.4.5.1
 cfgv               3.1.0
 click              7.1.1
 coverage           5.0
 cudf               0.13.0+0.ga2804c3.dirty
 cugraph            0.13.0+0.gac36e8c.dirty
 cupy               7.3.0
 Cython             0.29.16
 decorator          4.4.2
 editdistance       0.5.3
 fastavro           0.23.0
 fastrlock          0.4
 fsspec             0.6.3
 identify           1.4.13
 importlib-metadata 1.5.0
 llvmlite           0.31.0
 metagraph          0.0.1+12.g15c13c6
 metagraph-cuda     0.0.1+24.g0fe705d.dirty /home/pnguyen/code/metagraph-cuda
 mkl-fft            1.0.15
 mkl-random         1.1.0
 mkl-service        2.3.0
 more-itertools     8.2.0
 mypy-extensions    0.4.3
 networkx           2.4
 nodeenv            1.3.5
 numba              0.48.0
 numpy              1.18.1
 nvstrings-cuda100  0.13.0
 packaging          20.3
 pandas             0.25.3
 pathspec           0.7.0
 pip                20.0.2
 pluggy             0.13.1
 pre-commit         2.2.0
 py                 1.8.1
 pyarrow            0.15.0
 pyparsing          2.4.6
 pytest             5.4.1
 pytest-cov         2.8.1
 python-dateutil    2.8.1
 pytz               2019.3
 PyYAML             5.3.1
 regex              2020.2.20
 rmm                0.13.0
 scipy              1.4.1
 setuptools         46.1.3.post20200330
 six                1.14.0
 toml               0.10.0
 typed-ast          1.4.1
 typing-extensions  3.7.4.1
 virtualenv         16.7.5
 wcwidth            0.1.9
 wheel              0.34.2
 zipp               2.2.0

Additional context

N/A

bug cuDF (Python)

All 6 comments

@paul-tqh-nguyen try using cdf.loc[1].loc['a']

@rgsl888prabhu The discrepancy with Pandas is quite unintuitive, but that suggestion allows me to get the results I intend to get! Thank you for the quick response!

cdf.loc[1] returns a Series, and __getitem__ on a Series currently supports only positional indexing. To look up elements by the Series' actual index labels, you need to use loc. That is why a second loc is required, as I suggested.
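
The workaround described above can be wrapped in a small helper. This is only a minimal sketch: `get_cell` is a hypothetical name, not part of cuDF or pandas, and it assumes nothing beyond the frame and the row Series both exposing a label-based `.loc` accessor, as this thread describes.

```python
def get_cell(frame, row_label, column_label):
    """Return a single value by row label and column label.

    Hypothetical helper: it chains two label-based .loc lookups instead of
    relying on Series.__getitem__, which the thread reports as positional-only
    in the affected cuDF version.
    """
    # First .loc selects the row and yields a Series indexed by column names.
    row = frame.loc[row_label]
    # Second .loc looks the column name up in that Series' index.
    return row.loc[column_label]
```

With the frames from the report, get_cell(pdf, 1, 'a') and get_cell(cdf, 1, 'a') would both be expected to return 2.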

@rgsl888prabhu Let's treat this as a bug: __getitem__ should do index-based gathering instead of row-number-based gathering.

@kkraus14 So, should we assume that __getitem__ always does index-based gathering rather than row-number-based gathering, and that a user who wants row-number-based access should use iloc?

Yes, exactly. This is the Pandas behavior that we should emulate.
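
The semantics agreed on above can be illustrated with a toy class. This is a hedged sketch in plain Python, not cuDF's implementation: `LabeledSeries` and `_PositionalIndexer` are hypothetical names, and the sketch ignores slices, boolean masks, duplicate labels, and any fallback rules that Pandas applies.

```python
class _PositionalIndexer:
    """Hypothetical helper that exposes row-number-based access."""

    def __init__(self, values):
        self._values = values

    def __getitem__(self, position):
        return self._values[position]


class LabeledSeries:
    """Toy stand-in for a Series (hypothetical, for illustration only)."""

    def __init__(self, values, index):
        if len(values) != len(index):
            raise ValueError("values and index must have the same length")
        self._values = list(values)
        self._index = list(index)

    def __getitem__(self, label):
        # Index-based gathering: translate the label into a position first.
        try:
            position = self._index.index(label)
        except ValueError:
            raise KeyError(label) from None
        return self._values[position]

    @property
    def iloc(self):
        # Row-number-based gathering lives behind a separate accessor.
        return _PositionalIndexer(self._values)


# Mirrors the row returned by cdf.loc[1] in the report: a=2, b=5.
row = LabeledSeries([2, 5], index=["a", "b"])
print(row["a"])     # label lookup through __getitem__
print(row.iloc[1])  # positional lookup through iloc
```

Keeping positional access behind iloc means a key passed to __getitem__ is never ambiguous between a label and a row number in this sketch.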
