Hi,
I updated to version 0.23.0 and all of a sudden the following code breaks:
import pandas as pd
df = pd.DataFrame(data={'date': list(pd.date_range('5.1.2018', '5.10.2018')),
                        'vals': list(range(10))})
df.groupby([df.date.dt.month, df.date.dt.day])['vals'].sum()
ValueError: Duplicated level name: "date", assigned to level 1, is already used for level 0.
Using version 0.22.0 the same code yields the following:
date  date
5     1       0
      2       1
      3       2
      4       3
      5       4
      6       5
      7       6
      8       7
      9       8
      10      9
Name: vals, dtype: int64
It obviously contains duplicated level names. I get why this might be a problem, but as of version 0.23.0 it's not possible to specify the resulting level names.
Thanks.
cc @WillAyd @toobaz if you have ideas.
FYI @guenteru we publish release candidates, if you want to try things out and report problems beforehand. They're announced on the low-traffic pandas-dev mailing list: https://mail.python.org/mailman/listinfo/pandas-dev
I suppose this is due to the change that MultiIndex level names now need to be unique: https://github.com/pandas-dev/pandas/issues/18872 and https://github.com/pandas-dev/pandas/pull/18882
This is a rather big break. As a temporary workaround, you can rename the Series that are passed to groupby:
df.groupby([df.date.dt.month.rename('month'), df.date.dt.day.rename('day')])['vals'].sum()
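For anyone who wants to check the rename workaround end to end, here is a minimal self-contained sketch. It reuses the data from the original report; the variable names `month`, `day`, and `result` are only illustrative, and the dates are written in ISO format to avoid any ambiguity in parsing.

```python
import pandas as pd

# Same data as in the report: ten consecutive days in May 2018.
df = pd.DataFrame({
    "date": pd.date_range("2018-05-01", "2018-05-10"),
    "vals": list(range(10)),
})

# Give each grouping Series its own name so the resulting
# MultiIndex levels are not both called "date".
month = df["date"].dt.month.rename("month")
day = df["date"].dt.day.rename("day")

result = df.groupby([month, day])["vals"].sum()
print(result)
```

Because the level names are set on the grouping keys themselves, the result should come back indexed by `month` and `day` without hitting the duplicate-name check.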
Duplicate of #19029 I think?
Closing in favor of #19029