Pandas: ENH/BUG: Rename of MultiIndex DataFrames does not work

Created on 8 Jul 2013  Â·  24Comments  Â·  Source: pandas-dev/pandas

xref #14139 for empty MI

Hi everybody,

in the current version renaming of MultiIndex DataFrames does not work. Lets take the following example:

import datetime as DT
import pandas as pd
df = pd.DataFrame({
'Branch' : 'A A A A A B'.split(),
'Buyer': 'Carl Mark Carl Joe Mark Carl'.split(),
'Quantity': [1,3,5,8,9,3],
'Date' : [
    DT.datetime(2013,9,1,13,0),
    DT.datetime(2013,9,1,13,5),
    DT.datetime(2013,10,1,20,0),
    DT.datetime(2013,10,3,10,0),
    DT.datetime(2013,12,2,12,0),                                      
    DT.datetime(2013,12,2,14,0),
    ]})

and the following query:

test_df = df[df['Buyer'].isin(['Carl', 'Mark'])].set_index('Buyer', append=True)[['Date']].unstack(['Buyer'])

Now, the following renaming does not work

test_df.rename(columns={('Date', 'Carl'): 'Carl'}, inplace=True)

Thanks in advance

Andy

API Design Bug MultiIndex

All 24 comments

what are you trying to accomplish, that doesn't make sense, renaming a 2-level to a single-level

You can select out the level however; is this what you are after?

In [8]: test_df['Date']
Out[8]: 
Buyer                Carl                Mark
0     2013-09-01 13:00:00                 NaT
1                     NaT 2013-09-01 13:05:00
2     2013-10-01 20:00:00                 NaT
4                     NaT 2013-12-02 12:00:00
5     2013-12-02 14:00:00                 NaT

Hi everybody,

Sorry, the example ist not the best. Actually, I did some trend generation today and after updating to pandas current branch a lot of my code didn't work anymore because that rename function used to work as described above.
There are basically two scenarios where I used that hierarchical renaming:

  1. When generating stock trends (a short variable name is much easier to use than those tuples)
  2. When doing some complex operations as in the SO article, where the original variable (Date) does not matter any more but only that there is a matching date between Carl and Mark

I favour xs here, bit more explicit:

In [11]: test_df.xs('Date', axis=1)
Out[11]:
Buyer                Carl                Mark
0     2013-09-01 13:00:00                 NaT
1                     NaT 2013-09-01 13:05:00
2     2013-10-01 20:00:00                 NaT
4                     NaT 2013-12-02 12:00:00
5     2013-12-02 14:00:00                 NaT

# or maybe
test_df.columns = test_df.columns.droplevel(0)
# or 
test_df.columns =  test_df.columns.get_level_values('Buyer')

The fact that replace was working before smells like a bug, as @jreback says, it doesn't really make any sense to rename like that...

renaming doesn't work for multiindexes period, whether it makes sense or not:

In [7]: df
Out[7]:
  Branch Buyer                Date  Quantity
0      A  Carl 2013-09-01 13:00:00         1
1      A  Mark 2013-09-01 13:05:00         3
2      A  Carl 2013-10-01 20:00:00         5
3      A   Joe 2013-10-03 10:00:00         8
4      A  Mark 2013-12-02 12:00:00         9
5      B  Carl 2013-12-02 14:00:00         3

In [8]: test_df = df[df['Buyer'].isin(['Carl', 'Mark'])].set_index('Buyer', append=True)[['Date']].unstack(['Buyer'])

In [9]: test_df
Out[9]:
                     Date
Buyer                Carl                Mark
0     2013-09-01 13:00:00                 NaT
1                     NaT 2013-09-01 13:05:00
2     2013-10-01 20:00:00                 NaT
4                     NaT 2013-12-02 12:00:00
5     2013-12-02 14:00:00                 NaT

In [10]: test_df.rename(columns={('Date', 'Carl'): ('Care')})
Out[10]:
                     Date
Buyer                Carl                Mark
0     2013-09-01 13:00:00                 NaT
1                     NaT 2013-09-01 13:05:00
2     2013-10-01 20:00:00                 NaT
4                     NaT 2013-12-02 12:00:00
5     2013-12-02 14:00:00                 NaT

In [11]: test_df.rename(columns={('Date', 'Carl'): ('Care', "sdf")})
Out[11]:
                     Date
Buyer                Carl                Mark
0     2013-09-01 13:00:00                 NaT
1                     NaT 2013-09-01 13:05:00
2     2013-10-01 20:00:00                 NaT
4                     NaT 2013-12-02 12:00:00
5     2013-12-02 14:00:00                 NaT

In [12]: test_df.rename(columns={('Date', 'Carl'): ('Care', "sdf")})

Now that _is_ a bug/feature request :)

Maybe you ought to be able to replace on each level for a MultiIndex, say using

test_df.rename(columns={'Buyer': {'Carl' : 'sdf'}})

not sure...

need to add level arg to rename maybe?

not sure level argument works/makes sense since rename allows you to change both index _and_ columns at the same time:s

Although perhaps my suggestion doesn't either (if want to replace same things as the level name/number)...

I think this is still an open issue. It would be great to be able to treat the column labels as tuples and just use rename in the "natural" (at least natural to me) way. For example:

df.rename(columns={c: (str(c[0]) + 'foo', str(c[1]) + 'bar') for c in df.columns})

@8one6 this is an open issue currently.

this still waiting for an API to deal with the multi-level API.

I am not sure I like the stringifying idea. But haven't thought too much about this.

Maybe an actual example would help

import numpy as np
import pandas as pd

rows = pd.Index(list('abcde'), names=['letter'])
columns = pd.MultiIndex.from_tuples([('px', c) for c in ['red', 'green', 'blue']], 
                                    names=['datum', 'color'])
df = pd.DataFrame(np.random.randn(len(rows), len(columns)), index=rows, columns=columns)

gives

datum        px                    
item        red     green      blue
a     -0.616822 -0.922983  0.148247
b     -0.383122 -0.451940  1.138330
c     -0.744860  2.299611  0.895295
d     -0.159886 -0.832159 -0.205430
e     -0.458384 -1.410207 -0.965780

So now I want to do this:

absdf = df.abs()
absdf.rename(columns={c: ('abspx', c[1]) for c in df.columns}, inplace=True)

but that doesn't do what I expect, it just gives back the unmodified frame. To accomplish what I want here, I would do:

newabsdf = df.abs()
newabsdf.columns = pd.MultiIndex.from_tuples([('abspx', c[1]) for c in df.columns], 
                                             names=df.columns.names)

which gives the desired result:

datum     abspx                    
item        red     green      blue
a      0.616822  0.922983  0.148247
b      0.383122  0.451940  1.138330
c      0.744860  2.299611  0.895295
d      0.159886  0.832159  0.205430
e      0.458384  1.410207  0.965780

Basically, in the multi-index context, I was expecting rename to "be happy" if the passed function/mapper/dictionary returned tuples with the correct number of elements. Am I doing something wrong above? Or would this be a new feature request? Or does this seem ambiguous in some way.

a multi-index renam at the moment does not work at all. The issue is how do you rename only part of a level

e.g.
red-> orange, how should I do this?
or
abspx -> foo

df.rename(columns={'red' : 'orange' }, level=1)
df.rename(columns={'abspx' : 'foo'},level=0)

but no way to do this (well it doesn't work), but does make sense

df.rename(columns={('abspx','red) : ('foo','orange')})

Ah, ok. So that last code block in your comment, is that a reasonable thing to hope will work at some point? I.e. is there a reason that would be a bad API for doing fully general multilevel renaming? (I think that's what I had tried to achieve with my dict comprehension in my absdf.rename... line above)

And the other two lines...

df.rename(columns={'red' : 'orange' }, level=1)
df.rename(columns={'abspx' : 'foo'}, level=0)

is that the current working proposal? Or already implemented? Or up for debate?

I think the prior dont work (but prob don't need much). The last is a proposed API.

I think their is a pull-requests somewhere which does most of this but wasn't finished IIRC.

+1 on this feature, I found very obscure renaming/replacing multi-index labels. What is the current state @jreback ?

@denfromufa its open.

In the meanwhile, could someone point to a recommended work-around? I'm dealing with some SurveyMonkey data that exports with unnecessary white space in the resulting MultiIndex and there doesn't seem to be an easy way to clean up the DataFrame.

@eronlloyd: Looks like passing level to df.rename() will soon work according to this pull request, but I think the accepted answer in the meantime is something similar to the code below, courtesy of unutbu on StackOverflow:

def map_level(df, dct, level=0):
    index = df.index
    index.set_levels([[dct.get(item, item) for item in names] if i==level else names
                      for i, names in enumerate(index.levels)], inplace=True)

When I need to do something similar, I just drop the index and replace the values using apply. Not sure if that's more expensive than unutbu's solution above though.

This was closed automatically by github, but that was not the intention (#15931 is not related to this)

hahah had a reference with the word fix in it!

Is there any news on this? This bug is a constant annoyance when trying to do complex aggregations. Since the 'agg' method doesn't have a way to assign new names to the aggregate columns the recommended method (AFAIK) is to first aggregate and then rename. But something like

df.groupby('A').agg({'B': ['mean', 'median'], 'C': ['min', 'max']}

returns a multi-index which I then cannot rename. I actually had to write my own wrapper function for aggregations as a workaround. (Happy to share the code if there's interest.)

Still open if you're interested in working on it.

On Wed, Feb 20, 2019 at 5:02 AM mwiebusch78 notifications@github.com
wrote:

Is there any news on this? This bug is a constant annoyance when trying to
do complex aggregations. Since the 'agg' method doesn't have a way to
assign new names to the aggregate columns the recommended method (AFAIK) is
to first aggregate and then rename. But something like

df.groupby('A').agg({'B': ['mean', 'median'], 'C': ['mean', 'median']}

returns a multi-index which I then cannot rename. I actually had to write
my own wrapper function for aggregations as a workaround. (Happy to share
the code if there's interest.)

—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
https://github.com/pandas-dev/pandas/issues/4160#issuecomment-465527993,
or mute the thread
https://github.com/notifications/unsubscribe-auth/ABQHIjgSL_9hodVQ7D7O9I4INHvpXtbRks5vPSsrgaJpZM4Ay_pV
.

Just sharing a workaround: renaming of tuples works for flattened indices.

df = pd.DataFrame([[1,2,3],[3,4,5],[5,6,7], [7,8,9]])
df.columns = pd.MultiIndex.from_tuples([('i','a'),('i','b'),('ii','a')])

# Alternative 1
df.columns = df.columns.to_flat_index()
df = df.rename(columns={('i','b'):('i','c')})
df.columns = pd.MultiIndex.from_tuples(df.columns)

# Alternative 2
i = df.columns.get_loc(('i','b'))
cols = df.columns.to_flat_index()
cols[i] = ('i','c')
df.columns = pd.MultiIndex.from_tuples(cols)
Was this page helpful?
0 / 5 - 0 ratings