Pandas: DataFrame constructor fails when given dict with None value

Created on 9 Oct 2016 · 7 Comments · Source: pandas-dev/pandas

A small, complete example of the issue

import pandas as pd
pd.DataFrame(dict(a=None), index=[0])
In [3]: pd.DataFrame(dict(a=None),index=[0])
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-3-20b65f605ca3> in <module>()
----> 1 pd.DataFrame(dict(a=None),index=[0])

miniconda2/envs/readout2/lib/python2.7/site-packages/pandas/core/frame.pyc in __init__(self, data, index, columns, dtype, copy)
    264                                  dtype=dtype, copy=copy)
    265         elif isinstance(data, dict):
--> 266             mgr = self._init_dict(data, index, columns, dtype=dtype)
    267         elif isinstance(data, ma.MaskedArray):
    268             import numpy.ma.mrecords as mrecords

miniconda2/envs/readout2/lib/python2.7/site-packages/pandas/core/frame.pyc in _init_dict(self, data, index, columns, dtype)
    400             arrays = [data[k] for k in keys]
    401 
--> 402         return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
    403 
    404     def _init_ndarray(self, values, index, columns, dtype=None, copy=False):

miniconda2/envs/readout2/lib/python2.7/site-packages/pandas/core/frame.pyc in _arrays_to_mgr(arrays, arr_names, index, columns, dtype)
   5382 
   5383     # don't force copy because getting jammed in an ndarray anyway
-> 5384     arrays = _homogenize(arrays, index, dtype)
   5385 
   5386     # from BlockManager perspective

miniconda2/envs/readout2/lib/python2.7/site-packages/pandas/core/frame.pyc in _homogenize(data, index, dtype)
   5693                 v = lib.fast_multiget(v, oindex.values, default=NA)
   5694             v = _sanitize_array(v, index, dtype=dtype, copy=False,
-> 5695                                 raise_cast_failure=False)
   5696 
   5697         homogenized.append(v)

miniconda2/envs/readout2/lib/python2.7/site-packages/pandas/core/series.pyc in _sanitize_array(data, index, dtype, copy, raise_cast_failure)
   2917 
   2918     # scalar like
-> 2919     if subarr.ndim == 0:
   2920         if isinstance(data, list):  # pragma: no cover
   2921             subarr = np.array(data, dtype=object)

AttributeError: 'NoneType' object has no attribute 'ndim'
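
The last frame of the traceback points at the failure mode: when the scalar check in `_sanitize_array` runs, `subarr` is still `None`, so the `.ndim` lookup fails. Below is a minimal sketch of that mechanism in plain NumPy. It is not the pandas internals, and the helper name `ndim_of` is hypothetical.

```python
import numpy as np


def ndim_of(value):
    """Return value.ndim, mirroring the attribute access that fails in the traceback."""
    return value.ndim


# A bare None has no ndim attribute, so the lookup raises AttributeError.
try:
    ndim_of(None)
    failed = False
    message = ""
except AttributeError as exc:
    failed = True
    message = str(exc)

# Wrapping None in a 0-d object array gives something with a usable ndim.
wrapped = np.array(None, dtype=object)

print(failed, message)
print(wrapped.ndim, wrapped.dtype)
```

The sketch only shows why a `None` that never gets converted to an array trips the `ndim` check; how pandas should convert it is the question discussed in the comments.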

Expected Output

This previously worked with a sensible output in 0.18.1:

In [2]: pd.DataFrame(dict(a=None),index=[0])
Out[2]:
      a
0  None

Output of pd.show_versions()


Working version:

INSTALLED VERSIONS

commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 3.2.0-4-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 27.2.0
Cython: 0.24
numpy: 1.11.2
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 4.2.0
sphinx: 1.4.1
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.4
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.5
lxml: 3.6.0
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.39.0
pandas_datareader: None

Broken version:

INSTALLED VERSIONS

commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 3.2.0-4-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.19.0
nose: 1.3.7
pip: 8.1.2
setuptools: 27.2.0
Cython: 0.24
numpy: 1.11.2
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 4.2.0
sphinx: 1.4.1
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.4
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.5
lxml: 3.6.0
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.39.0
pandas_datareader: None

Labels: Bug, Missing-data, Reshaping

All 7 comments

So this works correctly in the following cases.

In [12]: pd.DataFrame(columns=['a'], index=[0])
Out[12]: 
     a
0  NaN

In [13]: pd.DataFrame(dict(a=np.nan), index=[0])
Out[13]: 
    a
0 NaN

The behavior in 0.18.1 is actually wrong; this should coerce to the np.nan case, since dtype is not specified.

Pull requests to fix this are welcome.
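
The two working cases from this comment can be checked programmatically. This is a small sketch that assumes a reasonably recent pandas, where `Series.isna` is available; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Case 1: only columns and index are given, so the single cell is missing.
from_columns = pd.DataFrame(columns=["a"], index=[0])

# Case 2: a scalar NaN is broadcast over the supplied index.
from_nan = pd.DataFrame(dict(a=np.nan), index=[0])

# Both frames hold one missing value in column "a".
print(from_columns["a"].isna().all())
print(from_nan["a"].isna().all())
```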

Hey @brandonmburroughs, I saw that you're working on this too and beat me to the PR. No worries, I wasn't as far along. Just wanted to let you know that the same problem shows up with the Series constructor too, i.e. Series([None]) fails to coerce to NaN.

I looked at fixing it a little further down the stack in series.py, but didn't check with any tests yet. Feel free to see my commit above that referenced this.
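
The Series case can be explored in the same way. In this sketch the dtype is passed explicitly, since what happens without a dtype is exactly what is under discussion and may differ between pandas versions.

```python
import pandas as pd

# With dtype=object the None is stored as-is.
as_object = pd.Series([None], dtype=object)

# With a float dtype the None is converted to NaN.
as_float = pd.Series([None], dtype="float64")

print(as_object.iloc[0] is None)
print(pd.isna(as_float.iloc[0]), as_float.dtype)
```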

I was going to work on a PR but looks like you guys are on top of it. Thanks!

@shawnheide I actually noticed this problem after I created my PR. I created an issue (#14393) about this and there is some discussion going on there as to how to handle this as the cases are different. Depending upon how they want to handle the API design, your fix may be better suited to handle all cases.

@jreback Given your comment in https://github.com/pandas-dev/pandas/issues/14393#issuecomment-252896200, I would personally say that the above case should not coerce to NaN, but keep the None. Thoughts?
(in any case that is the conservative road for now, as that was the behaviour in 0.18.1)

But in that case, @brandonmburroughs, your PR should be updated.

Yeah, I'm open to having it be the pre-0.19.0 behavior (in other words, remain as object); that is fine.

To illustrate, in pandas 0.18:

In [7]: pd.DataFrame(dict(a=[None]), index= [0])
Out[7]: 
      a
0  None

In [8]: pd.DataFrame(dict(a=None), index= [0])
Out[8]: 
      a
0  None

So for 0.19.1, I would choose to go back to 0.18.1 behaviour, so not coercing to NaN (keep as None).
We can discuss if we want to change for later releases.
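
For code that must keep `None` whatever the installed version does with a bare scalar, one option is to avoid the scalar path entirely. Below is a minimal sketch, assuming that storing every value as object dtype is acceptable; the helper `one_row_frame` is hypothetical and not part of pandas.

```python
import pandas as pd


def one_row_frame(values, index_label=0):
    """Build a one-row DataFrame from a dict of scalars, keeping None as None."""
    # Each scalar becomes a length-1 object Series, so the constructor never
    # has to broadcast a bare None and never coerces it to NaN.
    data = {
        key: pd.Series([value], index=[index_label], dtype=object)
        for key, value in values.items()
    }
    return pd.DataFrame(data)


df = one_row_frame({"a": None, "b": 1})
print(df.loc[0, "a"] is None)
print(df.dtypes)
```

The trade-off is that every column comes back as object dtype, including ones that hold plain numbers, so numeric columns may need converting afterwards.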
