xref #14139 for empty MI
Hi everybody,
in the current version renaming of MultiIndex DataFrames does not work. Lets take the following example:
import datetime as DT
import pandas as pd
df = pd.DataFrame({
'Branch' : 'A A A A A B'.split(),
'Buyer': 'Carl Mark Carl Joe Mark Carl'.split(),
'Quantity': [1,3,5,8,9,3],
'Date' : [
DT.datetime(2013,9,1,13,0),
DT.datetime(2013,9,1,13,5),
DT.datetime(2013,10,1,20,0),
DT.datetime(2013,10,3,10,0),
DT.datetime(2013,12,2,12,0),
DT.datetime(2013,12,2,14,0),
]})
and the following query:
test_df = df[df['Buyer'].isin(['Carl', 'Mark'])].set_index('Buyer', append=True)[['Date']].unstack(['Buyer'])
Now, the following renaming does not work
test_df.rename(columns={('Date', 'Carl'): 'Carl'}, inplace=True)
Thanks in advance
Andy
what are you trying to accomplish, that doesn't make sense, renaming a 2-level to a single-level
You can select out the level however; is this what you are after?
In [8]: test_df['Date']
Out[8]:
Buyer Carl Mark
0 2013-09-01 13:00:00 NaT
1 NaT 2013-09-01 13:05:00
2 2013-10-01 20:00:00 NaT
4 NaT 2013-12-02 12:00:00
5 2013-12-02 14:00:00 NaT
Hi everybody,
Sorry, the example ist not the best. Actually, I did some trend generation today and after updating to pandas current branch a lot of my code didn't work anymore because that rename function used to work as described above.
There are basically two scenarios where I used that hierarchical renaming:
I favour xs here, bit more explicit:
In [11]: test_df.xs('Date', axis=1)
Out[11]:
Buyer Carl Mark
0 2013-09-01 13:00:00 NaT
1 NaT 2013-09-01 13:05:00
2 2013-10-01 20:00:00 NaT
4 NaT 2013-12-02 12:00:00
5 2013-12-02 14:00:00 NaT
# or maybe
test_df.columns = test_df.columns.droplevel(0)
# or
test_df.columns = test_df.columns.get_level_values('Buyer')
The fact that replace was working before smells like a bug, as @jreback says, it doesn't really make any sense to rename like that...
renaming doesn't work for multiindexes period, whether it makes sense or not:
In [7]: df
Out[7]:
Branch Buyer Date Quantity
0 A Carl 2013-09-01 13:00:00 1
1 A Mark 2013-09-01 13:05:00 3
2 A Carl 2013-10-01 20:00:00 5
3 A Joe 2013-10-03 10:00:00 8
4 A Mark 2013-12-02 12:00:00 9
5 B Carl 2013-12-02 14:00:00 3
In [8]: test_df = df[df['Buyer'].isin(['Carl', 'Mark'])].set_index('Buyer', append=True)[['Date']].unstack(['Buyer'])
In [9]: test_df
Out[9]:
Date
Buyer Carl Mark
0 2013-09-01 13:00:00 NaT
1 NaT 2013-09-01 13:05:00
2 2013-10-01 20:00:00 NaT
4 NaT 2013-12-02 12:00:00
5 2013-12-02 14:00:00 NaT
In [10]: test_df.rename(columns={('Date', 'Carl'): ('Care')})
Out[10]:
Date
Buyer Carl Mark
0 2013-09-01 13:00:00 NaT
1 NaT 2013-09-01 13:05:00
2 2013-10-01 20:00:00 NaT
4 NaT 2013-12-02 12:00:00
5 2013-12-02 14:00:00 NaT
In [11]: test_df.rename(columns={('Date', 'Carl'): ('Care', "sdf")})
Out[11]:
Date
Buyer Carl Mark
0 2013-09-01 13:00:00 NaT
1 NaT 2013-09-01 13:05:00
2 2013-10-01 20:00:00 NaT
4 NaT 2013-12-02 12:00:00
5 2013-12-02 14:00:00 NaT
In [12]: test_df.rename(columns={('Date', 'Carl'): ('Care', "sdf")})
Now that _is_ a bug/feature request :)
Maybe you ought to be able to replace on each level for a MultiIndex, say using
test_df.rename(columns={'Buyer': {'Carl' : 'sdf'}})
not sure...
need to add level arg to rename maybe?
not sure level argument works/makes sense since rename allows you to change both index _and_ columns at the same time:s
Although perhaps my suggestion doesn't either (if want to replace same things as the level name/number)...
I think this is still an open issue. It would be great to be able to treat the column labels as tuples and just use rename in the "natural" (at least natural to me) way. For example:
df.rename(columns={c: (str(c[0]) + 'foo', str(c[1]) + 'bar') for c in df.columns})
@8one6 this is an open issue currently.
this still waiting for an API to deal with the multi-level API.
I am not sure I like the stringifying idea. But haven't thought too much about this.
Maybe an actual example would help
import numpy as np
import pandas as pd
rows = pd.Index(list('abcde'), names=['letter'])
columns = pd.MultiIndex.from_tuples([('px', c) for c in ['red', 'green', 'blue']],
names=['datum', 'color'])
df = pd.DataFrame(np.random.randn(len(rows), len(columns)), index=rows, columns=columns)
gives
datum px
item red green blue
a -0.616822 -0.922983 0.148247
b -0.383122 -0.451940 1.138330
c -0.744860 2.299611 0.895295
d -0.159886 -0.832159 -0.205430
e -0.458384 -1.410207 -0.965780
So now I want to do this:
absdf = df.abs()
absdf.rename(columns={c: ('abspx', c[1]) for c in df.columns}, inplace=True)
but that doesn't do what I expect, it just gives back the unmodified frame. To accomplish what I want here, I would do:
newabsdf = df.abs()
newabsdf.columns = pd.MultiIndex.from_tuples([('abspx', c[1]) for c in df.columns],
names=df.columns.names)
which gives the desired result:
datum abspx
item red green blue
a 0.616822 0.922983 0.148247
b 0.383122 0.451940 1.138330
c 0.744860 2.299611 0.895295
d 0.159886 0.832159 0.205430
e 0.458384 1.410207 0.965780
Basically, in the multi-index context, I was expecting rename to "be happy" if the passed function/mapper/dictionary returned tuples with the correct number of elements. Am I doing something wrong above? Or would this be a new feature request? Or does this seem ambiguous in some way.
a multi-index renam at the moment does not work at all. The issue is how do you rename only part of a level
e.g.
red-> orange, how should I do this?
or
abspx -> foo
df.rename(columns={'red' : 'orange' }, level=1)
df.rename(columns={'abspx' : 'foo'},level=0)
but no way to do this (well it doesn't work), but does make sense
df.rename(columns={('abspx','red) : ('foo','orange')})
Ah, ok. So that last code block in your comment, is that a reasonable thing to hope will work at some point? I.e. is there a reason that would be a bad API for doing fully general multilevel renaming? (I think that's what I had tried to achieve with my dict comprehension in my absdf.rename... line above)
And the other two lines...
df.rename(columns={'red' : 'orange' }, level=1)
df.rename(columns={'abspx' : 'foo'}, level=0)
is that the current working proposal? Or already implemented? Or up for debate?
I think the prior dont work (but prob don't need much). The last is a proposed API.
I think their is a pull-requests somewhere which does most of this but wasn't finished IIRC.
+1 on this feature, I found very obscure renaming/replacing multi-index labels. What is the current state @jreback ?
@denfromufa its open.
In the meanwhile, could someone point to a recommended work-around? I'm dealing with some SurveyMonkey data that exports with unnecessary white space in the resulting MultiIndex and there doesn't seem to be an easy way to clean up the DataFrame.
@eronlloyd: Looks like passing level to df.rename() will soon work according to this pull request, but I think the accepted answer in the meantime is something similar to the code below, courtesy of unutbu on StackOverflow:
def map_level(df, dct, level=0):
index = df.index
index.set_levels([[dct.get(item, item) for item in names] if i==level else names
for i, names in enumerate(index.levels)], inplace=True)
When I need to do something similar, I just drop the index and replace the values using apply. Not sure if that's more expensive than unutbu's solution above though.
This was closed automatically by github, but that was not the intention (#15931 is not related to this)
hahah had a reference with the word fix in it!
Is there any news on this? This bug is a constant annoyance when trying to do complex aggregations. Since the 'agg' method doesn't have a way to assign new names to the aggregate columns the recommended method (AFAIK) is to first aggregate and then rename. But something like
df.groupby('A').agg({'B': ['mean', 'median'], 'C': ['min', 'max']}
returns a multi-index which I then cannot rename. I actually had to write my own wrapper function for aggregations as a workaround. (Happy to share the code if there's interest.)
Still open if you're interested in working on it.
On Wed, Feb 20, 2019 at 5:02 AM mwiebusch78 notifications@github.com
wrote:
Is there any news on this? This bug is a constant annoyance when trying to
do complex aggregations. Since the 'agg' method doesn't have a way to
assign new names to the aggregate columns the recommended method (AFAIK) is
to first aggregate and then rename. But something likedf.groupby('A').agg({'B': ['mean', 'median'], 'C': ['mean', 'median']}
returns a multi-index which I then cannot rename. I actually had to write
my own wrapper function for aggregations as a workaround. (Happy to share
the code if there's interest.)—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
https://github.com/pandas-dev/pandas/issues/4160#issuecomment-465527993,
or mute the thread
https://github.com/notifications/unsubscribe-auth/ABQHIjgSL_9hodVQ7D7O9I4INHvpXtbRks5vPSsrgaJpZM4Ay_pV
.
Just sharing a workaround: renaming of tuples works for flattened indices.
df = pd.DataFrame([[1,2,3],[3,4,5],[5,6,7], [7,8,9]])
df.columns = pd.MultiIndex.from_tuples([('i','a'),('i','b'),('ii','a')])
# Alternative 1
df.columns = df.columns.to_flat_index()
df = df.rename(columns={('i','b'):('i','c')})
df.columns = pd.MultiIndex.from_tuples(df.columns)
# Alternative 2
i = df.columns.get_loc(('i','b'))
cols = df.columns.to_flat_index()
cols[i] = ('i','c')
df.columns = pd.MultiIndex.from_tuples(cols)