# Your code here
import pandas as pd
pd.Dataframe(dict(a=None), index= [0])
In [3]: pd.DataFrame(dict(a=None),index=[0])
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-3-20b65f605ca3> in <module>()
----> 1 pd.DataFrame(dict(a=None),index=[0])
miniconda2/envs/readout2/lib/python2.7/site-packages/pandas/core/frame.pyc in __init__(self, data, index, columns, dtype, copy)
264 dtype=dtype, copy=copy)
265 elif isinstance(data, dict):
--> 266 mgr = self._init_dict(data, index, columns, dtype=dtype)
267 elif isinstance(data, ma.MaskedArray):
268 import numpy.ma.mrecords as mrecords
miniconda2/envs/readout2/lib/python2.7/site-packages/pandas/core/frame.pyc in _init_dict(self, data, index, columns, dtype)
400 arrays = [data[k] for k in keys]
401
--> 402 return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
403
404 def _init_ndarray(self, values, index, columns, dtype=None, copy=False):
miniconda2/envs/readout2/lib/python2.7/site-packages/pandas/core/frame.pyc in _arrays_to_mgr(arrays, arr_names, index, columns, dtype)
5382
5383 # don't force copy because getting jammed in an ndarray anyway
-> 5384 arrays = _homogenize(arrays, index, dtype)
5385
5386 # from BlockManager perspective
miniconda2/envs/readout2/lib/python2.7/site-packages/pandas/core/frame.pyc in _homogenize(data, index, dtype)
5693 v = lib.fast_multiget(v, oindex.values, default=NA)
5694 v = _sanitize_array(v, index, dtype=dtype, copy=False,
-> 5695 raise_cast_failure=False)
5696
5697 homogenized.append(v)
miniconda2/envs/readout2/lib/python2.7/site-packages/pandas/core/series.pyc in _sanitize_array(data, index, dtype, copy, raise_cast_failure)
2917
2918 # scalar like
-> 2919 if subarr.ndim == 0:
2920 if isinstance(data, list): # pragma: no cover
2921 subarr = np.array(data, dtype=object)
AttributeError: 'NoneType' object has no attribute 'ndim'
This previously worked with a sensible output in 0.18.1:
In [2]: pd.DataFrame(dict(a=None),index=[0])
Out[2]:
a
0 None
pd.show_versions()
Working version:
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 3.2.0-4-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 27.2.0
Cython: 0.24
numpy: 1.11.2
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 4.2.0
sphinx: 1.4.1
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.4
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.5
lxml: 3.6.0
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.39.0
pandas_datareader: None
Broken version:
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 3.2.0-4-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None
pandas: 0.19.0
nose: 1.3.7
pip: 8.1.2
setuptools: 27.2.0
Cython: 0.24
numpy: 1.11.2
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 4.2.0
sphinx: 1.4.1
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.4
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.5
lxml: 3.6.0
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.39.0
pandas_datareader: None
So this works correctly in the following cases.
In [12]: pd.DataFrame(columns=['a'], index=[0])
Out[12]:
a
0 NaN
In [13]: pd.DataFrame(dict(a=np.nan), index=[0])
Out[13]:
a
0 NaN
The behavior in 0.18.1 is actually wrong, this should coerce to the np.nan case, as dtype is not specified.
pull-requests to fix are welcome.
Hey @brandonmburroughs, I saw that you're working on this too and beat me to the PR. No worries, I wasn't as far along. Just wanted to let you know that the same problem shows up with the Series constructor too, i.e. Series([None]) fails to coerce to NaN.
I looked at fixing it a little further down the stack in series.py, but didn't check with any tests yet. Feel free to see my commit above that referenced this.
I was going to work on a PR but looks like you guys are on top of it. Thanks!
@shawnheide I actually noticed this problem after I created my PR. I created an issue (#14393) about this and there is some discussion going on there as to how to handle this as the cases are different. Depending upon how they want to handle the API design, your fix may be better suited to handle all cases.
@jreback Given your comment in https://github.com/pandas-dev/pandas/issues/14393#issuecomment-252896200, I would personally say that the above case should not coerce to NaN, but keep the None. Thoughts?
(in any case that is the conservative road for now, as that was the behaviour in 0.18.1)
But in that case, @brandonmburroughs, your PR should be updated.
yeah open to having it be pre-0.19.0 behavior (IOW, remain as object) is fine.
To illustrate, in pandas 0.18:
In [7]: pd.DataFrame(dict(a=[None]), index= [0])
Out[7]:
a
0 None
In [8]: pd.DataFrame(dict(a=None), index= [0])
Out[8]:
a
0 None
So for 0.19.1, I would choose to go back to 0.18.1 behaviour, so not coercing to NaN (keep as None).
We can discuss if we want to change for later releases.
Most helpful comment
So this works correctly in the following cases.
The behavior in 0.18.1 is actually wrong, this should coerce to the
np.nancase, as dtype is not specified.pull-requests to fix are welcome.