Pandas: BUG: Shift on a group column when column name is a tuple-of-tuples results in NumPy VisibleDeprecationWarning

Created on 28 Jul 2020 · 5 comments · Source: pandas-dev/pandas

[+] I have checked that this issue has not already been reported.
[+] I have confirmed this bug exists on the latest version of pandas.
[+] (optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

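import warnings

import numpy as np
import pandas as pd

# Escalate NumPy's VisibleDeprecationWarning to an error so the traceback shows its origin.
# (NumPy 1.19 name; newer releases also expose it as np.exceptions.VisibleDeprecationWarning.)
warnings.filterwarnings('error', category=np.VisibleDeprecationWarning)

tuple_column = ('A', ('B', 2))
df = pd.DataFrame({tuple_column: [1]}, index=['q'])
grp = df.groupby(level=0)
df[tuple_column] = grp[[tuple_column]].shift()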

Problem description

Shifting a grouped column whose name is a tuple containing another tuple triggers a NumPy VisibleDeprecationWarning (raised as an error by the filter in the sample above):

  File "C:\Python\lib\site-packages\pandas\core\groupby\groupby.py", line 2562, in shift
    return self._get_cythonized_result(
  File "C:\Python\lib\site-packages\pandas\core\groupby\groupby.py", line 2457, in _get_cythonized_result
    for idx, obj in enumerate(self._iterate_slices()):
  File "C:\Python\lib\site-packages\pandas\core\groupby\generic.py", line 998, in _iterate_slices
    obj = self._selected_obj
  File "pandas\_libs\properties.pyx", line 33, in pandas._libs.properties.CachedProperty.__get__
  File "C:\Python\lib\site-packages\pandas\core\groupby\groupby.py", line 641, in _selected_obj
    return self.obj[self._selection]
  File "C:\Python\lib\site-packages\pandas\core\frame.py", line 2889, in __getitem__
    if com.is_bool_indexer(key):
  File "C:\Python\lib\site-packages\pandas\core\common.py", line 142, in is_bool_indexer
    arr = np.asarray(key)
  File "C:\Python\lib\site-packages\numpy\core\_asarray.py", line 83, in asarray
    return array(a, dtype, copy=False, order=order)

C:\Python\lib\site-packages\numpy\core\_asarray.py:83: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
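The last frame is `np.asarray(key)` inside `is_bool_indexer`, where `key` is the list `[tuple_column]` passed by `grp[[tuple_column]]`. Below is a minimal NumPy-only sketch of why that call objects, assuming nothing beyond NumPy itself. Inside the label, `'A'` is a scalar while `('B', 2)` is a sequence, so the nesting is ragged. NumPy 1.19 through 1.23 warns about this, and NumPy 1.24+ raises `ValueError` instead. Filling a preallocated 1-D object array element by element is one way to keep each tuple as a single label. This is an illustration only, not necessarily the fix pandas adopted.

```python
import warnings

import numpy as np

# Label from the report: a tuple whose second element is itself a tuple.
label = ("A", ("B", 2))
key = [label]  # list-like selection key, as passed on by grp[[tuple_column]]

# Plain np.asarray: 'A' is a scalar but ('B', 2) is a sequence, so the
# nesting is ragged. Escalate warnings so the NumPy 1.19-1.23 warning is
# raised as an exception, just like the ValueError from NumPy >= 1.24.
try:
    with warnings.catch_warnings():
        warnings.simplefilter("error")
        np.asarray(key)
    asarray_accepted = True
except (Warning, ValueError) as exc:
    asarray_accepted = False
    print(f"np.asarray(key) rejected the key: {type(exc).__name__}")

# Preallocating a 1-D object array and filling it element by element
# stores each tuple as one opaque label instead of unpacking it.
labels = np.empty(len(key), dtype=object)
for i, item in enumerate(key):
    labels[i] = item

print(labels.shape, labels[0])  # (1,) ('A', ('B', 2))
```

Passing `dtype=object` to `np.asarray` alone, as the warning suggests, would still let NumPy descend into the tuple and split `'A'` and `('B', 2)` into separate cells. Fixing the 1-D shape up front avoids that.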

Expected Output

No VisibleDeprecationWarning is triggered.

Output of `pd.show_versions()`

commit : 6302f7b98ad24adda2d5a98fef3956f04f28039d
python : 3.8.5.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.18362
machine : AMD64
processor : Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United Kingdom.1252

pandas : 1.1.0rc0+8.g6302f7b98
numpy : 1.19.1
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 49.2.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.5.2
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : 4.9.1
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.0rc1+439.g7e9530338
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : 0.50.1

Labels: Bug, Compat, Error Reporting

Reported by: misantroop

All 5 comments

Thanks @misantroop for the report. This is for NumPy 1.19 onwards?

Could you provide an MRE that would be suitable as a test? https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports

simonjayhawkins on 28 Jul 2020

xref #31201

simonjayhawkins on 28 Jul 2020

> Thanks @misantroop for the report. This is for NumPy 1.19 onwards?
>
> Could you provide an MRE that would be suitable as a test? https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports

Correct, the issue does not appear with NumPy 1.18.5. I have tried to make the example conform more closely to the MRE guidelines.

misantroop on 28 Jul 2020


The issue can be reproduced with indexing alone, so it is not specific to groupby or shift:

>>> import numpy as np
>>> import pandas as pd
>>>
>>> pd.__version__
'1.1.0rc0+7.g04e9e0afd'
>>>
>>> tup = "A", ("B", 2)
>>>
>>> ser = pd.Series([42], index=[tup])
>>> ser
(A, (B, 2))    42
dtype: int64
>>>
>>> ser[[tup]]
C:\Users\simon\Anaconda3\envs\pandas-dev\lib\site-packages\numpy\core\_asarray.py:83: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  return array(a, dtype, copy=False, order=order)
(A, (B, 2))    42
dtype: int64
>>>

simonjayhawkins on 28 Jul 2020

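The indexing-only reproducer lends itself to the kind of test requested earlier in the thread. The following is a minimal sketch that uses plain `assert` statements rather than pandas' own test helpers; the function name `select_tuple_label` is hypothetical. It matches on NumPy's warning text instead of the warning class. This is because newer NumPy releases expose the class as `np.exceptions.VisibleDeprecationWarning` and raise `ValueError` for ragged input rather than warning.

```python
import warnings

import pandas as pd


def select_tuple_label():
    """Hypothetical regression check: select a tuple-of-tuples label with a list key."""
    tup = ("A", ("B", 2))
    ser = pd.Series([42], index=[tup])  # flat Index holding a single tuple label

    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning instead of filtering
        result = ser[[tup]]

    # NumPy's ragged-array message contains this phrase (see the warning above).
    ragged = [w for w in caught if "ragged nested sequences" in str(w.message)]
    return result, ragged


result, ragged = select_tuple_label()
assert result.tolist() == [42]  # the single row is selected
assert not ragged, ragged        # no ragged-array warning leaked out
```

On an affected version the `ragged` list is non-empty, so the second assertion fails.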

xref #24688

simonjayhawkins on 28 Jul 2020
