Pandas: groupby breaks when using duplicated level names

Created on 16 May 2018 · 4 Comments · Source: pandas-dev/pandas

Hi,
I updated to version 0.23.0 and all of a sudden the following code breaks:

import pandas as pd
df = pd.DataFrame(data={'date': list(pd.date_range('5.1.2018', '5.10.2018')),
                        'vals': list(range(10))})
df.groupby([df.date.dt.month, df.date.dt.day])['vals'].sum()

ValueError: Duplicated level name: "date", assigned to level 1, is already used for level 0.

Expected output:

Using version 0.22.0, the same code yields the following:

date  date
5     1       0
      2       1
      3       2
      4       3
      5       4
      6       5
      7       6
      8       7
      9       8
      10      9
Name: vals, dtype: int64

It obviously contains duplicated level names. I get why this might be a problem, but as of version 0.23.0 it's not possible to specify the resulting level names.
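To make the "this might be a problem" part concrete, here is a small sketch. It assumes a pandas version that, like 0.22.0, accepts duplicate level names; on 0.23.0 the groupby line itself raises the ValueError quoted above. Both levels inherit the name `date` from their source Series, so asking for a level by that name is ambiguous and pandas refuses to guess, while asking by position still works.

```python
import pandas as pd

# Same frame as in the report above.
df = pd.DataFrame(data={'date': list(pd.date_range('5.1.2018', '5.10.2018')),
                        'vals': list(range(10))})

# Assumes a pandas version that accepts duplicate level names (e.g. 0.22.0);
# on 0.23.0 this line raises the "Duplicated level name" ValueError.
result = df.groupby([df.date.dt.month, df.date.dt.day])['vals'].sum()

# Both levels take their name from the source Series, which is 'date'.
print(list(result.index.names))  # ['date', 'date']

# Looking a level up by name is ambiguous because two levels share it.
try:
    result.index.get_level_values('date')
    by_name_ok = True
except ValueError as exc:
    # pandas refuses to pick one of the two 'date' levels.
    by_name_ok = False
    print(exc)

# Looking a level up by position is unambiguous: level 1 holds the days.
days = result.index.get_level_values(1)
print(list(days))
```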

Labels: Groupby, MultiIndex, Regression

All 4 comments

Thanks.

cc @WillAyd @toobaz if you have ideas.

FYI @guenteru, we publish release candidates if you want to try things out and report problems beforehand. They're announced on the low-traffic pandas-dev mailing list: https://mail.python.org/mailman/listinfo/pandas-dev

I suppose this is due to the change that MultiIndex level names now need to be unique: https://github.com/pandas-dev/pandas/issues/18872 and https://github.com/pandas-dev/pandas/pull/18882

This is a rather big break. As a temporary workaround, you can rename the Series that are passed to groupby:

df.groupby([df.date.dt.month.rename('month'), df.date.dt.day.rename('day')])['vals'].sum()
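A minimal, self-contained sketch of the workaround above, using the frame from the original report. Renaming each grouping Series gives the MultiIndex distinct level names (`month` and `day`), so no duplicate-name check can trigger. The second variant, which stores the keys as columns with `DataFrame.assign` and groups by column name, is an alternative added here for comparison, not something proposed in the thread; it should produce the same result.

```python
import pandas as pd

df = pd.DataFrame(data={'date': list(pd.date_range('5.1.2018', '5.10.2018')),
                        'vals': list(range(10))})

# Workaround from the comment: give each grouping Series its own name,
# so the result's levels are 'month' and 'day' instead of 'date' twice.
by_renamed = df.groupby([df.date.dt.month.rename('month'),
                         df.date.dt.day.rename('day')])['vals'].sum()

# Alternative (not from the thread): store the keys as columns and group
# by column name; the level names are then taken from the column names.
by_columns = (df.assign(month=df.date.dt.month, day=df.date.dt.day)
                .groupby(['month', 'day'])['vals'].sum())

print(list(by_renamed.index.names))  # ['month', 'day']
print(by_renamed.equals(by_columns))  # True
```

The rename variant is the smaller change to existing code; the column variant can be handier when the same keys are reused across several aggregations.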

Duplicate of #19029, I think?

Closing in favor of #19029
