Pandas: from_dict / to_dict should accept same orient parameter

Created on 26 Dec 2015 · 7 Comments · Source: pandas-dev/pandas

Hello,

pd.DataFrame.to_dict accepts an orient parameter with one of the values
['dict', 'list', 'series', 'split', 'records', 'index'] (the default is 'dict').

http://pandas.pydata.org/pandas-docs/version/0.17.1/generated/pandas.DataFrame.to_dict.html

from_dict only accepts an orient parameter with one of the values ['columns', 'index'] (the default is 'columns').
I think from_dict should accept the same orient values as to_dict, for API consistency.

http://pandas.pydata.org/pandas-docs/version/0.17.1/generated/pandas.DataFrame.from_dict.html
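To make the asymmetry concrete, here is a minimal sketch. The frame and its values are made up for illustration, and it only shows which layouts from_dict inverts directly; it is not a proposal for how the API should change.

```python
import pandas as pd

# Hypothetical example frame; the values are made up for illustration.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])

# to_dict supports several layouts.
as_dict = df.to_dict()                     # {column: {index: value}}
as_list = df.to_dict(orient="list")        # {column: [values]}, index is dropped
as_records = df.to_dict(orient="records")  # [{column: value}, ...], index is dropped
as_index = df.to_dict(orient="index")      # {index: {column: value}}
as_split = df.to_dict(orient="split")      # keys: "index", "columns", "data"

# from_dict inverts the 'dict' and 'index' layouts directly.
from_columns = pd.DataFrame.from_dict(as_dict, orient="columns")
from_index = pd.DataFrame.from_dict(as_index, orient="index")
```

The 'split' and 'records' layouts have no matching orient value in from_dict, which is the gap this issue is about.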

Same for Panel
http://pandas.pydata.org/pandas-docs/version/0.17.1/generated/pandas.Panel.from_dict.html
to_dict: missing method for Panel

Same for Series
http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.Series.to_dict.html
from_dict: missing method for Series
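For the Series case, a possible stopgap is the plain constructor, which accepts a dict. The sketch below uses made-up data and only covers the simple {index: value} layout; the Series name is not part of the dict, so it would have to be carried separately.

```python
import pandas as pd

# Hypothetical example data.
s = pd.Series([1.5, 2.5], index=["x", "y"])

d = s.to_dict()          # plain dict keyed by index label
restored = pd.Series(d)  # the constructor accepts a dict directly
```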

Kind regards

All 7 comments

you would have to show a specific example here of why this API change would actually be useful. Don't just list lots of code references.

Use case is to be able to save to MongoDB Series, DataFrame, Panel and also retrieve them.

Another feature to add to to_dict would be to output a dict of NumPy arrays instead of a dict of Python lists.
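Until something like that exists, one way to get a dict of NumPy arrays is a dict comprehension over the columns. This is only a sketch with made-up data; it assumes a pandas version that has Series.to_numpy (0.24 or later), and on older versions the .values attribute plays the same role.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame.
df = pd.DataFrame({"a": [1, 2], "b": [3.0, 4.0]})

# One NumPy array per column, instead of the Python lists
# that to_dict(orient="list") returns.
arrays = {col: df[col].to_numpy() for col in df.columns}
```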

pls show a specific example
and a use case - you are describing a very general case

still not sure WHY this would be a good thing to add to the API nor what you are actually asking

"what you are actually asking"

I guess he is referring to the problem that to_dict has more modes than from_dict, which means that some to_dict conversions have no inverse operation.

It's an API inconsistency, and a consistent API is a "good thing". Here is an example of a problem that arises from this inconsistency:

When converting a DataFrame to JSON we use the orient="split" option, because it seems to be the only option that preserves a DataFrame entirely (including the order of columns and index). For the inverse operation we would need the equivalent. We cannot use to_json/read_json directly (both of which support orient="split"), because the DataFrame is only a nested element in a bigger JSON structure, which requires operating on dicts. But pandas lacks the inverse operation for dicts. We tried temporarily converting the dict to a string so that we could call read_json on it with orient="split", but the double conversion makes it prohibitively slow.
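One possible workaround for the nested case is to rebuild the frame from the three keys of the 'split' dict with the plain DataFrame constructor, which skips the detour through a string and read_json. The sketch below is an assumption about how such a setup might look: frame_from_split, the payload layout, and the data are all hypothetical. Dtypes are not stored in the dict, so they are inferred again on reconstruction.

```python
import json

import pandas as pd


def frame_from_split(d):
    """Hypothetical helper: rebuild a DataFrame from a to_dict(orient="split") dict."""
    return pd.DataFrame(d["data"], index=d["index"], columns=d["columns"])


# Hypothetical frame with a deliberately non-sorted column and index order.
df = pd.DataFrame({"b": [3, 4], "a": [1, 2]}, index=["y", "x"])

# The frame is one nested element of a larger JSON-serializable structure.
payload = {"meta": {"source": "example"}, "frame": df.to_dict(orient="split")}
text = json.dumps(payload)

# Parse the whole document once, then rebuild the frame from the nested dict.
parsed = json.loads(text)
restored = frame_from_split(parsed["frame"])
```

Column and index order come back as they were written, because the 'split' layout stores both as lists.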

I have the same problem as bluenote10: the double conversion is too slow for a frequently called API. Does anyone have a workaround?
