I am trying to call the to_dict function on the following DataFrame:
import pandas as pd
data = {"a": [1,2,3,4,5], "b": [90,80,40,60,30]}
df = pd.DataFrame(data)
a b
0 1 90
1 2 80
2 3 40
3 4 60
4 5 30
df.reset_index().to_dict("r")
[{'a': 1, 'b': 90, 'index': 0},
{'a': 2, 'b': 80, 'index': 1},
{'a': 3, 'b': 40, 'index': 2},
{'a': 4, 'b': 60, 'index': 3},
{'a': 5, 'b': 30, 'index': 4}]
However, my problem occurs if I perform a float operation on the DataFrame, which turns the index into a float:
(df*1.0).reset_index().to_dict("r")
[{'a': 1.0, 'b': 90.0, 'index': 0.0},
{'a': 2.0, 'b': 80.0, 'index': 1.0},
{'a': 3.0, 'b': 40.0, 'index': 2.0},
{'a': 4.0, 'b': 60.0, 'index': 3.0},
{'a': 5.0, 'b': 30.0, 'index': 4.0}]
Can anyone explain the above behaviour or recommend a workaround, or verify whether or not this could be a pandas bug? None of the other outtypes in the to_dict method mutates the index as shown above.
I've replicated this on both pandas 0.14 and 0.18 (latest)
Many thanks!
link to stackoverflow: http://stackoverflow.com/questions/36548151/pandas-to-dict-changes-index-type-with-outtype-records
Nothing to do with the index, just the fact that you have any float dtypes in the data
data = {"a": [1.0,2,3,4,5], "b": [90,80,40,60,30]}
df = pd.DataFrame(data)
In [19]: df.to_dict("records")
Out[19]:
[{'a': 1.0, 'b': 90.0},
{'a': 2.0, 'b': 80.0},
{'a': 3.0, 'b': 40.0},
{'a': 4.0, 'b': 60.0},
{'a': 5.0, 'b': 30.0}]
If you look at the code, we use DataFrame.values, which returns a NumPy array, which must have a single dtype (float64 in this case).
We probably don't need to use .values here.
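A minimal sketch of the upcast described above, assuming only the example frame from this thread: a frame with one float column and one int column keeps separate column dtypes, but `.values` has to pick a single dtype for the whole array.

```python
import pandas as pd

# Same data as above: column "a" is float, column "b" is int.
df = pd.DataFrame({"a": [1.0, 2, 3, 4, 5], "b": [90, 80, 40, 60, 30]})

# The columns keep their own dtypes inside the DataFrame.
print(df.dtypes)

# .values returns one 2-D NumPy array, so the ints are upcast to float.
values = df.values
print(values.dtype)
```

Anything built row by row from that array, as the records output is here, therefore sees floats in every column.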
Thanks for your response. Is there a possible workaround that I can use in the meantime?
Something like
In [28]: [x._asdict() for x in df.itertuples()]
Out[28]:
[OrderedDict([('Index', 0), ('a', 1.0), ('b', 90)]),
OrderedDict([('Index', 1), ('a', 2.0), ('b', 80)]),
OrderedDict([('Index', 2), ('a', 3.0), ('b', 40)]),
OrderedDict([('Index', 3), ('a', 4.0), ('b', 60)]),
OrderedDict([('Index', 4), ('a', 5.0), ('b', 30)])]
That's an OrderedDict from namedtuple._asdict; you can write a dict comprehension if you want a regular one.
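One way the regular-dict variant might look, as a sketch rather than a recommended API. The variable name `rows` is just for illustration, and whether the values come back as Python or NumPy scalars can depend on the pandas version.

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2, 3, 4, 5], "b": [90, 80, 40, 60, 30]})

# itertuples() yields one namedtuple per row, built column by column,
# so the int column is not upcast to float.
# dict(...) turns the mapping from _asdict() into a plain dict.
rows = [dict(row._asdict()) for row in df.itertuples()]

print(rows[0])
```

Passing `index=False` to `itertuples()` drops the `Index` key if it is not wanted.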
Thanks :)
Though one could argue that this result is correct as we don't support mixed types in int-float when doing a cross-section, IOW:
In [10]: (df*1.0).reset_index().iloc[1]
Out[10]:
index 1.0
a 2.0
b 80.0
Name: 1, dtype: float64
This is somewhat related to #12532, meaning that we should be iterating directly over (which already does the proper coercion), rather than doing a specific coercion in .to_dict().
Going to mark this as an API issue that needs discussion. Changing this correctly would actually be a fairly large change (though to be honest I think the current behavior is fine).
Note for future seekers - I'm trying to combine multiple pandas objects into one nested json structure.
Since to_json doesn't work in this case (manipulating json strings is hard), you might try to do to_dict(orient="records"), and combine the results of the to_dict()s into a bigger object, and do json.dumps on that. But because of this bug, you can't do that without screwing with the types of everything.
So then you might try doing @TomAugspurger's solution but you might find that for some reason it won't convert numpy types to python types, like to_dict() does, which makes json.dumps() fail.
My workaround solution is to do to_json() which gives you a correct json string with correct types, then do json.loads() on that to get python objects corresponding to that string, which you then put together whichever way you want (e.g. big_obj = {"a": df_a_json, "b": df_b_json}) and then run json.dumps on the whole thing. It's roundabout but it's the closest general solution I found without having to muck about with type conversions myself!
def to_records(df):
    """Replacement for pandas' to_dict(orient="records") which has trouble with
    upcasting ints to floats in the case of other floats being there.
    https://github.com/pandas-dev/pandas/issues/12859
    """
    import json
    return json.loads(df.to_json(orient="records"))
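A short sketch of the nested-JSON use case described above, using the same to_json / json.loads round trip inline. The frames `df_a` and `df_b` and their contents are made up for illustration.

```python
import json

import pandas as pd

# Two hypothetical frames to combine into one nested structure.
df_a = pd.DataFrame({"a": [1.0, 2.0], "b": [90, 80]})
df_b = pd.DataFrame({"name": ["x", "y"], "count": [3, 4]})

# Round-trip each frame through to_json so ints stay ints and
# every value is a plain Python object.
big_obj = {
    "a": json.loads(df_a.to_json(orient="records")),
    "b": json.loads(df_b.to_json(orient="records")),
}

# The combined object now serializes without type errors.
payload = json.dumps(big_obj)
print(payload)
```

Note that orient="records" does not include the index, and NaN values come back as None after the round trip.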
👍 There is an issue somewhere about .to_dict using Python types.