Pandas: groupby breaks when using duplicated level names

Created on 16 May 2018 · 4 Comments · Source: pandas-dev/pandas

Hi,
I updated to version 0.23.0 and all of a sudden the following code breaks:

import pandas as pd
df = pd.DataFrame(data={'date': list(pd.date_range('5.1.2018', '5.10.2018')),
                        'vals': list(range(10))})
df.groupby([df.date.dt.month, df.date.dt.day])['vals'].sum()

ValueError: Duplicated level name: "date", assigned to level 1, is already used for level 0.

Expected output:

Using version 0.22.0 the same code yields the following:

date  date
5     1       0
      2       1
      3       2
      4       3
      5       4
      6       5
      7       6
      8       7
      9       8
      10      9
Name: vals, dtype: int64

It obviously contains duplicated level names. I get why this might be a problem, but as of version 0.23.0 it's not possible to specify the resulting level names.

Groupby MultiIndex Regression

guenteru

👍3

All 4 comments

Thanks.

cc @WillAyd @toobaz if you have ideas.

FYI @guenteru, we publish release candidates if you want to try things out and report problems beforehand. They're announced on the low-traffic pandas-dev mailing list: https://mail.python.org/mailman/listinfo/pandas-dev

TomAugspurger on 16 May 2018

I suppose this is due to the change that MultiIndex level names now need to be unique: https://github.com/pandas-dev/pandas/issues/18872 and https://github.com/pandas-dev/pandas/pull/18882

This is a rather big break. A temporary workaround is to rename the Series that are passed to groupby:

df.groupby([df.date.dt.month.rename('month'), df.date.dt.day.rename('day')])['vals'].sum()
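
For anyone who wants to try the rename workaround end to end, here is a minimal self-contained sketch. It rebuilds the frame from the original report (with the dates written in ISO form, which is an assumption about the intended range, 1 to 10 May 2018) and gives each grouping key its own name, so the resulting MultiIndex levels no longer collide. The variable names `month`, `day`, and `result` are only illustrative.

```python
import pandas as pd

# Same data as in the report; ISO dates are used here to avoid
# any ambiguity in how '5.1.2018' is parsed (assumed to mean 1 May 2018).
df = pd.DataFrame({
    "date": pd.date_range("2018-05-01", "2018-05-10"),
    "vals": range(10),
})

# Both accessors return a Series named "date"; renaming them gives
# each grouping key a distinct name before it reaches groupby.
month = df["date"].dt.month.rename("month")
day = df["date"].dt.day.rename("day")

result = df.groupby([month, day])["vals"].sum()

# The index levels are now named after the renamed keys.
print(result.index.names)
```

The values are the same as in the 0.22.0 output quoted in the report; only the level names differ.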

jorisvandenbossche on 16 May 2018

👍8 ❤2

Duplicate of #19029 I think?

toobaz on 16 May 2018

Closing in favor of #19029

TomAugspurger on 17 May 2018
