Pandas: Dataframe constructor fails when given dict with None value

Created on 9 Oct 2016 · 7Comments · Source: pandas-dev/pandas

A small, complete example of the issue

# Your code here

import pandas as pd
pd.Dataframe(dict(a=None), index= [0])

In [3]: pd.DataFrame(dict(a=None),index=[0])
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-3-20b65f605ca3> in <module>()
----> 1 pd.DataFrame(dict(a=None),index=[0])

miniconda2/envs/readout2/lib/python2.7/site-packages/pandas/core/frame.pyc in __init__(self, data, index, columns, dtype, copy)
    264                                  dtype=dtype, copy=copy)
    265         elif isinstance(data, dict):
--> 266             mgr = self._init_dict(data, index, columns, dtype=dtype)
    267         elif isinstance(data, ma.MaskedArray):
    268             import numpy.ma.mrecords as mrecords

miniconda2/envs/readout2/lib/python2.7/site-packages/pandas/core/frame.pyc in _init_dict(self, data, index, columns, dtype)
    400             arrays = [data[k] for k in keys]
    401 
--> 402         return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
    403 
    404     def _init_ndarray(self, values, index, columns, dtype=None, copy=False):

miniconda2/envs/readout2/lib/python2.7/site-packages/pandas/core/frame.pyc in _arrays_to_mgr(arrays, arr_names, index, columns, dtype)
   5382 
   5383     # don't force copy because getting jammed in an ndarray anyway
-> 5384     arrays = _homogenize(arrays, index, dtype)
   5385 
   5386     # from BlockManager perspective

miniconda2/envs/readout2/lib/python2.7/site-packages/pandas/core/frame.pyc in _homogenize(data, index, dtype)
   5693                 v = lib.fast_multiget(v, oindex.values, default=NA)
   5694             v = _sanitize_array(v, index, dtype=dtype, copy=False,
-> 5695                                 raise_cast_failure=False)
   5696 
   5697         homogenized.append(v)

miniconda2/envs/readout2/lib/python2.7/site-packages/pandas/core/series.pyc in _sanitize_array(data, index, dtype, copy, raise_cast_failure)
   2917 
   2918     # scalar like
-> 2919     if subarr.ndim == 0:
   2920         if isinstance(data, list):  # pragma: no cover
   2921             subarr = np.array(data, dtype=object)

AttributeError: 'NoneType' object has no attribute 'ndim'

Expected Output

This previously worked with a sensible output in 0.18.1:

In [2]: pd.DataFrame(dict(a=None),index=[0])
Out[2]:
a
0 None

Output of `pd.show_versions()`

Working version:

INSTALLED VERSIONS

commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 3.2.0-4-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 27.2.0
Cython: 0.24
numpy: 1.11.2
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 4.2.0
sphinx: 1.4.1
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.4
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.5
lxml: 3.6.0
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.39.0
pandas_datareader: None

Broken version:

INSTALLED VERSIONS

commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 3.2.0-4-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.19.0
nose: 1.3.7
pip: 8.1.2
setuptools: 27.2.0
Cython: 0.24
numpy: 1.11.2
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 4.2.0
sphinx: 1.4.1
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.4
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.5
lxml: 3.6.0
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.39.0
pandas_datareader: None

Bug Missing-data Reshaping

Source

gitj

Most helpful comment

So this works correctly in the following cases.

In [12]: pd.DataFrame(columns=['a'], index=[0])
Out[12]: 
     a
0  NaN

In [13]: pd.DataFrame(dict(a=np.nan), index=[0])
Out[13]: 
    a
0 NaN

The behavior in 0.18.1 is actually wrong, this should coerce to the np.nan case, as dtype is not specified.

pull-requests to fix are welcome.

jreback on 9 Oct 2016

❤1 👍1

All 7 comments

So this works correctly in the following cases.

In [12]: pd.DataFrame(columns=['a'], index=[0])
Out[12]: 
     a
0  NaN

In [13]: pd.DataFrame(dict(a=np.nan), index=[0])
Out[13]: 
    a
0 NaN

The behavior in 0.18.1 is actually wrong, this should coerce to the np.nan case, as dtype is not specified.

pull-requests to fix are welcome.

jreback on 9 Oct 2016

❤1 👍1

Hey @brandonmburroughs, I saw that you're working on this too and beat me to the PR. No worries, I wasn't as far along. Just wanted to let you know that the same problem shows up with the Series constructor too, i.e. Series([None]) fails to coerce to NaN.

I looked at fixing it a little further down the stack in series.py, but didn't check with any tests yet. Feel free to see my commit above that referenced this.

shawnheide on 11 Oct 2016

I was going to work on a PR but looks like you guys are on top of it. Thanks!

gitj on 11 Oct 2016

@shawnheide I actually noticed this problem after I created my PR. I created an issue (#14393) about this and there is some discussion going on there as to how to handle this as the cases are different. Depending upon how they want to handle the API design, your fix may be better suited to handle all cases.

brandonmburroughs on 11 Oct 2016

@jreback Given your comment in https://github.com/pandas-dev/pandas/issues/14393#issuecomment-252896200, I would personally say that the above case should not coerce to NaN, but keep the None. Thoughts?
(in any case that is the conservative road for now, as that was the behaviour in 0.18.1)

But in that case, @brandonmburroughs, your PR should be updated.

jorisvandenbossche on 26 Oct 2016

yeah open to having it be pre-0.19.0 behavior (IOW, remain as object) is fine.

jreback on 26 Oct 2016

To illustrate, in pandas 0.18:

In [7]: pd.DataFrame(dict(a=[None]), index= [0])
Out[7]: 
      a
0  None

In [8]: pd.DataFrame(dict(a=None), index= [0])
Out[8]: 
      a
0  None

So for 0.19.1, I would choose to go back to 0.18.1 behaviour, so not coercing to NaN (keep as None).
We can discuss if we want to change for later releases.

jorisvandenbossche on 26 Oct 2016

Was this page helpful?

0 / 5 - 0 ratings