Pandas: BUG: Regression creating DataFrame from nested dict

Created on 7 Aug 2018  ·  8 Comments  ·  Source: pandas-dev/pandas

Code Sample, a copy-pastable example if possible

import pandas as pd

pop = {'Nevada': {2001: 2.4, 2002: 2.9},
       'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}
pd.DataFrame(pop, index=[2001, 2002, 2003])

Problem description

Raises exception:

In [6]: pd.DataFrame(pop, index=[2001, 2002, 2003])
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-6-84df27ae30f4> in <module>()
----> 1 pd.DataFrame(pop, index=[2001, 2002, 2003])

~/miniconda/envs/arrow-dev/lib/python3.6/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
    346                                  dtype=dtype, copy=copy)
    347         elif isinstance(data, dict):
--> 348             mgr = self._init_dict(data, index, columns, dtype=dtype)
    349         elif isinstance(data, ma.MaskedArray):
    350             import numpy.ma.mrecords as mrecords

~/miniconda/envs/arrow-dev/lib/python3.6/site-packages/pandas/core/frame.py in _init_dict(self, data, index, columns, dtype)
    457             arrays = [data[k] for k in keys]
    458 
--> 459         return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
    460 
    461     def _init_ndarray(self, values, index, columns, dtype=None, copy=False):

~/miniconda/envs/arrow-dev/lib/python3.6/site-packages/pandas/core/frame.py in _arrays_to_mgr(arrays, arr_names, index, columns, dtype)
   7357 
   7358     # don't force copy because getting jammed in an ndarray anyway
-> 7359     arrays = _homogenize(arrays, index, dtype)
   7360 
   7361     # from BlockManager perspective

~/miniconda/envs/arrow-dev/lib/python3.6/site-packages/pandas/core/frame.py in _homogenize(data, index, dtype)
   7659             if isinstance(v, dict):
   7660                 if oindex is None:
-> 7661                     oindex = index.astype('O')
   7662 
   7663                 if isinstance(index, (DatetimeIndex, TimedeltaIndex)):

AttributeError: 'list' object has no attribute 'astype'

This code has worked for about 10 years; is this a deliberate change?
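The traceback points at `index.astype('O')` inside `_homogenize`, which fails because a plain list has no `astype` method. A minimal sketch of a possible workaround, assuming the failure is confined to non-Index inputs as the traceback suggests (not verified here against 0.23.0), is to wrap the list in `pd.Index` before passing it:

```python
import pandas as pd

pop = {'Nevada': {2001: 2.4, 2002: 2.9},
       'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}
years = [2001, 2002, 2003]

# A plain list has no .astype method, which is what the traceback trips over.
assert not hasattr(years, "astype")

# pd.Index does provide .astype, the call made in _homogenize.
idx = pd.Index(years)
assert hasattr(idx, "astype")

# Assumption: with an Index passed in, the failing branch no longer raises.
df = pd.DataFrame(pop, index=idx)
print(df)
```

Each inner dict is aligned to the requested index, so years missing from a dict (2003 here) come back as NaN and years not requested (2000 for Ohio) are dropped.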

Bug Regression

All 8 comments

This change happened between 0.22.0 and 0.23.0

maybe side effect of #19884? cc @topper-123

I just bisected this. The first bad commit is 4efb39f01f5880122fa38d91e12d217ef70fad9e

Pretty sure it's this change: https://github.com/pandas-dev/pandas/commit/4efb39f01f5880122fa38d91e12d217ef70fad9e#diff-1e79abbbdd150d4771b91ea60a4e1cc7L7231

cc @toobaz

While I'm surprised that there were no unit tests to catch this regression, it only occurred in the odd (?) case where the index passed is not a NumPy array.
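A regression test for this case might look like the sketch below. The helper name `check_nested_dict_index` is hypothetical and this is not the test that was added to pandas; it only illustrates checking that a list, a NumPy array, and a `pd.Index` all produce the same frame.

```python
import numpy as np
import pandas as pd
from pandas.testing import assert_frame_equal


def check_nested_dict_index(index):
    """Hypothetical helper: build the frame and compare it to the expected result."""
    pop = {'Nevada': {2001: 2.4, 2002: 2.9},
           'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}
    result = pd.DataFrame(pop, index=index)
    expected = pd.DataFrame(
        {'Nevada': [2.4, 2.9, np.nan], 'Ohio': [1.7, 3.6, np.nan]},
        index=pd.Index([2001, 2002, 2003], dtype="int64"),
    )
    assert_frame_equal(result, expected)
    return result


# Exercise the list case from the report alongside array and Index inputs.
years = [2001, 2002, 2003]
results = [
    check_nested_dict_index(years),
    check_nested_dict_index(np.array(years, dtype="int64")),
    check_nested_dict_index(pd.Index(years, dtype="int64")),
]
```

On a version with the regression, the first call would be expected to raise the `AttributeError` shown in the report, while the other two should pass.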

FYI, the reason I ran into this was that I'm updating _Python for Data Analysis_ to fix errata and the book build broke. Sorry for not reporting the issue sooner.

Thanks @wesm for noticing this, I had completely overlooked one code branch.
