Pandas: BUG: DataFrame from_dict constructor ignores Ordered dict when orient='index'

Created on 30 Sep 2014 · 8 Comments · Source: pandas-dev/pandas

Hello,
I have been experimenting with OrderedDicts lately and found a bug in the DataFrame from_dict constructor. Here is some sample code.

import collections
import pandas as pd

firstrow={}
firstrow['foo'] = 'bar'
firstrow['baz'] = 'buzz'

row1 = pd.Series(firstrow)

secondrow={}
secondrow['foo'] = 'bar2'
secondrow['baz'] = 'buzz2'

row2 = pd.Series(secondrow)

roworder = collections.OrderedDict()

roworder['zShould be first'] = row1
roworder['Should be second'] = row2

# Insertion order is respected when orient='columns'
df = pd.DataFrame.from_dict(data=roworder, orient='columns')

# But not when orient='index': the row labels come back sorted
incorrectdf = pd.DataFrame.from_dict(data=roworder, orient='index')
correctdf = df.transpose()
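
To make the difference visible, the row labels of both frames can be printed next to the insertion order. This is only a minimal sketch that rebuilds the same data more compactly; the variable names are illustrative, and what the orient='index' line prints depends on the pandas version installed.

```python
from collections import OrderedDict

import pandas as pd

# Same data as in the report, built compactly.
rows = OrderedDict()
rows['zShould be first'] = pd.Series({'foo': 'bar', 'baz': 'buzz'})
rows['Should be second'] = pd.Series({'foo': 'bar2', 'baz': 'buzz2'})

expected = list(rows)  # insertion order of the OrderedDict

by_index = pd.DataFrame.from_dict(rows, orient='index')
by_transpose = pd.DataFrame.from_dict(rows, orient='columns').transpose()

# Compare the row labels of each frame with the insertion order.
print('expected    :', expected)
print('orient=index:', list(by_index.index))
print('transposed  :', list(by_transpose.index))
```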

INSTALLED VERSIONS

commit: None
python: 3.3.5.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 26 Stepping 5, GenuineIntel
byteorder: little
LC_ALL: None
LANG: fr_CH

pandas: 0.14.1
nose: 1.3.4
Cython: 0.20.1
numpy: 1.9.0
scipy: 0.13.3
statsmodels: 0.5.0
IPython: 2.2.0
sphinx: 1.2.3
patsy: 0.3.0
scikits.timeseries: None
dateutil: 2.2
pytz: 2013.9
bottleneck: None
tables: 3.1.1
numexpr: 2.4
matplotlib: 1.4.0
openpyxl: None
xlrd: 0.9.3
xlwt: None
xlsxwriter: 0.5.7
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.4
pymysql: None
psycopg2: None

Bug Reshaping good first issue

All 8 comments

Can you make your code runnable (so I can simply copy/paste it)? You have some undefined variables.

Sorry about that! Should be fine now. If not, will check when back in the office tomorrow.

EDIT: the code now reproduces the above-mentioned bug.

@Gimli510 that does look buggy.

A pull request to fix this is welcome.

You can use your test example above: just step through the code, see where it's breaking, and try a fix.

@jreback I think I found where the bug comes from.
The function _union_indexes calls lib.fast_unique_multiple_list(indexes), which sorts the keys before returning them. Should we carry a flag telling this Cython function not to sort the keys when the indexes list was created from an ordered dict? I guess there is a cleaner way to do this, but I don't really have any idea how to go about it.

# Up to this point, the future index is ordered as it should be.
indexes = [['zShould be first', 'Should be second'], ['zShould be first', 'Should be second']]
# When indexes is a list with more than one item, we hit this path:
# return Index(lib.fast_unique_multiple_list(indexes))

# However, this call:
lib.fast_unique_multiple_list(indexes)

returns

['Should be second', 'zShould be first']

I think this should be handled in _extract_index in pandas/core/frame.py. We need to differentiate between a dict and an OrderedDict.

Maybe add a have_ordered flag in addition to setting have_dict. Then you can pass it to _union_indexes(indexes, ordered=have_ordered).

Then, if ordered=True is passed (the default is False), you can do an order-preserving unique (so pass the flag into fast_unique_multiple; in other words, don't sort).
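
The idea of an optional sort can be sketched in plain Python. This is not the Cython routine in lib.pyx: the function below is a hypothetical stand-in, and its sort parameter is an assumption made for illustration, not an actual pandas signature.

```python
def unique_multiple_list(lists, sort=True):
    """Hypothetical pure-Python stand-in for a unique-across-lists helper.

    Collects each value once, in first-seen order, and only sorts the
    result when sort=True.
    """
    seen = set()
    uniques = []
    for values in lists:
        for value in values:
            if value not in seen:
                seen.add(value)
                uniques.append(value)
    if sort:
        uniques.sort()
    return uniques


indexes = [['zShould be first', 'Should be second'],
           ['zShould be first', 'Should be second']]

sorted_result = unique_multiple_list(indexes)               # labels sorted
ordered_result = unique_multiple_list(indexes, sort=False)  # first-seen order kept
```

With sort=False the labels stay in the order in which they were first seen, which is what an OrderedDict input would need.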

@jreback
I have implemented what you suggested, but I am stuck on the last part: how can I pass the flag to fast_unique_multiple? It calls fast_unique_multiple_list(*args, **kwargs), and when I look at lib.pyx, it always sorts the list at the end (uniques.sort()).
Any idea?

@jreback is this still an issue in the current version of pandas? I'm seeing the problem on an older version (v0.16.2) and I'm not sure if it's been addressed in the current one.

df = pd.DataFrame.from_dict(ordered_dict_data, orient='index') 

sorts the index alphabetically. I've been using the following hack to address it:

df = pd.DataFrame.from_dict(ordered_dict_data, orient='columns').T

My hack, however, sorts the columns alphabetically.

For the data that I have, it's easier for me to re-order these columns so the latter solution works better. To be precise, my data is an OrderedDict of OrderedDicts so I expect the sort order of both the index and columns to be respected. It looks something like this:

data = OrderedDict([
    ('a', OrderedDict([('aa', 5), ('bb', 10)])),
    ('b', OrderedDict([('aa', 7), ('bb', 14)])),
    # ...
])

If it's not fixed, I can take a stab at it.
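
A workaround that does not depend on how from_dict orders its output is to reindex the frame explicitly after building it. The following is a minimal sketch, not something proposed in the thread: the sample data is made up (with keys deliberately out of alphabetical order), and it assumes the desired column order is the first-seen order of the inner keys.

```python
from collections import OrderedDict

import pandas as pd

# Illustrative data: keys are intentionally not in alphabetical order.
data = OrderedDict([
    ('b', OrderedDict([('bb', 14), ('aa', 7)])),
    ('a', OrderedDict([('bb', 10), ('aa', 5)])),
])

# Row order: insertion order of the outer OrderedDict.
row_order = list(data.keys())

# Column order: first-seen order of the inner keys (an assumption).
col_order = []
for inner in data.values():
    for key in inner:
        if key not in col_order:
            col_order.append(key)

df = pd.DataFrame.from_dict(data, orient='index')
# Force both axes into the intended order, whatever from_dict returned.
df = df.reindex(index=row_order, columns=col_order)
```

Because reindex only rearranges existing labels here, the values are unchanged; only the row and column order is fixed.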

Still an open issue.
