Pandas: BUG: DataFrame from_dict constructor ignores Ordered dict when orient='index'

Created on 30 Sep 2014 · 8 Comments · Source: pandas-dev/pandas

Hello,
I have been experimenting with OrderedDicts lately, and found a bug in the DataFrame from_dict constructor. Here is some sample code.

import collections
import pandas as pd

firstrow={}
firstrow['foo'] = 'bar'
firstrow['baz'] = 'buzz'

row1 = pd.Series(firstrow)

secondrow={}
secondrow['foo'] = 'bar2'
secondrow['baz'] = 'buzz2'

row2 = pd.Series(secondrow)

roworder = collections.OrderedDict()

roworder['zShould be first'] = row1
roworder['Should be second'] = row2

# The OrderedDict ordering is respected with orient='columns'
df = pd.DataFrame.from_dict(data=roworder, orient='columns')

# But not with orient='index': the row labels do not keep the insertion order
incorrectdf = pd.DataFrame.from_dict(data=roworder, orient='index')
correctdf = df.transpose()

INSTALLED VERSIONS

commit: None
python: 3.3.5.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 26 Stepping 5, GenuineIntel
byteorder: little
LC_ALL: None
LANG: fr_CH

pandas: 0.14.1
nose: 1.3.4
Cython: 0.20.1
numpy: 1.9.0
scipy: 0.13.3
statsmodels: 0.5.0
IPython: 2.2.0
sphinx: 1.2.3
patsy: 0.3.0
scikits.timeseries: None
dateutil: 2.2
pytz: 2013.9
bottleneck: None
tables: 3.1.1
numexpr: 2.4
matplotlib: 1.4.0
openpyxl: None
xlrd: 0.9.3
xlwt: None
xlsxwriter: 0.5.7
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.4
pymysql: None
psycopg2: None

Bug Reshaping good first issue

aimboden

👍1

Most helpful comment

@jreback is this still an issue in the current version of pandas? I'm seeing the problem on an older version (v0.16.2) and I'm not sure if it's been addressed in the current one.

df = pd.DataFrame.from_dict(ordered_dict_data, orient='index')

sorts the index alphabetically. I've been using the following hack to address it:

df = pd.DataFrame.from_dict(ordered_dict_data, orient='columns').T

My hack, however, sorts the columns alphabetically.

For the data that I have, it's easier for me to re-order these columns so the latter solution works better. To be precise, my data is an OrderedDict of OrderedDicts so I expect the sort order of both the index and columns to be respected. It looks something like this:

data = OrderedDict([
    ('a', OrderedDict([('aa', 5), ('bb', 10)])),
    ('b', OrderedDict([('aa', 7), ('bb', 14)])),
    # ...
])

If it's not fixed, I can take a stab at it.

alichaudry on 17 Feb 2017

👍2
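
One way to sidestep the ordering question for both rows and columns is to reindex explicitly after construction. The following is a minimal sketch, not a fix for the underlying bug: it assumes every inner OrderedDict has the same keys in the same order, and row_order and col_order are names introduced here for illustration.

```python
from collections import OrderedDict

import pandas as pd

# Keys are deliberately not in alphabetical order
data = OrderedDict([
    ("b", OrderedDict([("bb", 14), ("aa", 7)])),
    ("a", OrderedDict([("bb", 10), ("aa", 5)])),
])

# Capture the intended order from the dicts themselves
row_order = list(data.keys())
col_order = list(next(iter(data.values())).keys())  # assumes all inner dicts agree

# Build the frame, then force rows and columns back into the intended order
df = pd.DataFrame.from_dict(data, orient="index").reindex(
    index=row_order, columns=col_order
)
print(df)
```

Because the order is imposed by reindex, the result should not depend on whether from_dict sorts the labels or keeps them.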

All 8 comments

Can you make your code runnable (so that it can simply be copied and pasted)? You have some undefined variables.

jreback on 30 Sep 2014

Sorry about that! Should be fine now. If not, will check when back in the office tomorrow.

EDIT: the code now reproduces the above mentioned bug

aimboden on 30 Sep 2014

@Gimli510 that does look buggy.

A pull request to fix it would be welcome.

You can use your test example above: just step through the code, see where it's breaking, and try a fix.

jreback on 1 Oct 2014

@jreback I think I found where the bug comes from.
The function _union_indexes calls lib.fast_unique_multiple_list(indexes), which sorts the keys before returning them. Should we carry a flag telling this Cython function not to sort the keys when the indexes list was created from an ordered dict? I guess there is a cleaner way to do this, but I don't really have any idea how to go about it.

# Up to this point, the future index is ordered as it should be.
indexes = [['zShould be first', 'Should be second'], ['zShould be first', 'Should be second']]
# When indexes is a list with more than one item, we hit this path:
# return Index(lib.fast_unique_multiple_list(indexes))

# However,
lib.fast_unique_multiple_list(indexes)

returns

['Should be second', 'zShould be first']

aimboden on 2 Oct 2014
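
The behaviour reported in the comment above can be imitated in plain Python. This is only a sketch: unique_multiple_list_sorted is a hypothetical stand-in written for illustration, not the Cython code in pandas, and it assumes (as the comment reports) that the unique labels are collected and then sorted.

```python
def unique_multiple_list_sorted(lists):
    """Hypothetical stand-in: collect unique labels across lists, then sort them."""
    seen = set()
    uniques = []
    for labels in lists:
        for label in labels:
            if label not in seen:
                seen.add(label)
                uniques.append(label)
    # Here uniques is still in first-seen order; the sort discards that order
    uniques.sort()
    return uniques


indexes = [
    ["zShould be first", "Should be second"],
    ["zShould be first", "Should be second"],
]
result = unique_multiple_list_sorted(indexes)
print(result)  # ['Should be second', 'zShould be first']
```

The uppercase "S" sorts before the lowercase "z", which is why the second label ends up first.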

I think this should be handled in extract_index in pandas/core/frame.py. We need to differentiate between a dict and an OrderedDict.

Maybe add a have_ordered flag in addition to setting have_dict. Then you can pass it along as _union_indexes(indexes, ordered=have_ordered).

Then, if ordered=True is passed (the default being False), you can do an order-preserving unique (so pass the flag into fast_unique_multiple; in other words, don't sort).

jreback on 2 Oct 2014
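
The suggestion above could look roughly like the following in plain Python. It is a sketch under stated assumptions, not the pandas implementation: union_labels and extract_labels are hypothetical helpers that only mirror the idea of detecting OrderedDict inputs (have_ordered) and passing an ordered flag down to the routine that computes the union.

```python
from collections import OrderedDict


def union_labels(label_lists, ordered=False):
    """Hypothetical helper: union of labels, optionally keeping first-seen order."""
    seen = set()
    uniques = []
    for labels in label_lists:
        for label in labels:
            if label not in seen:
                seen.add(label)
                uniques.append(label)
    if not ordered:
        uniques.sort()  # default behaviour: sorted labels
    return uniques


def extract_labels(mappings):
    """Hypothetical analogue of the have_ordered check described above."""
    # Only trust the input order when every mapping is an OrderedDict
    have_ordered = bool(mappings) and all(
        isinstance(m, OrderedDict) for m in mappings
    )
    return union_labels([list(m.keys()) for m in mappings], ordered=have_ordered)


ordered_rows = [
    OrderedDict([("zShould be first", 1), ("Should be second", 2)]),
    OrderedDict([("zShould be first", 3), ("Should be second", 4)]),
]
plain_rows = [dict(m) for m in ordered_rows]

kept = extract_labels(ordered_rows)         # insertion order preserved
sorted_labels = extract_labels(plain_rows)  # falls back to sorting
print(kept)
print(sorted_labels)
```

Keeping ordered=False as the default leaves the behaviour unchanged for plain dict inputs, which matches the default proposed in the comment.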

@jreback
I have done what you suggested, but for the last part, how can I pass the flag to fast_unique_multiple? It calls fast_unique_multiple_list(*args, **kwargs), and when I look at lib.pyx it always sorts the list at the end (uniques.sort()).
Any idea?

hamedhsn on 29 Sep 2015

Still an open issue.

TomAugspurger on 6 Jul 2018
