Pandas: BUG: MultiIndex set_levels is not symmetrical with get_level_values

Created on 20 Sep 2016 · 6 Comments · Source: pandas-dev/pandas

Code Sample

import pandas as pd
import io

text = """
A B C
1 1 1
1 2 2
2 1 3
2 2 4
"""
df = pd.read_csv(io.StringIO(text), delimiter=' ')
# Build the MultiIndex first; get_level_values('A') needs a level named 'A'
df.set_index(['A', 'B'], inplace=True)
# Note the output of get_level_values is the list of all values [1, 1, 2, 2]
print(df.index.get_level_values('A').tolist())
# Pass the doubled per-row values as the new levels for 'A'
df.index.set_levels(df.index.get_level_values('A').map(lambda x: x * 2), level='A', inplace=True)
print(df.index.get_level_values('A').tolist())
# outputs [2, 2, 2, 2]

Expected Output

[2, 2, 4, 4]

# If I re-order the input as:
text = """
A B C
1 1 1
2 1 2
1 2 3
2 2 4
"""
# it works as I expected giving [2, 4, 2, 4]

Is this a bug or expected behaviour?

output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.1
setuptools: 20.3
Cython: 0.23.4
numpy: 1.10.4
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 4.1.2
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.5.1
pytz: 2016.2
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.5
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.6.0
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: 0.7.5.None
psycopg2: None
jinja2: 2.8
boto: 2.39.0
pandas_datareader: None

Bug Duplicate MultiIndex

Source

danio

Most helpful comment

@danio Yes, I agree that exposing set_levels and set_labels but not set_level_values (like what get_level_values does) makes it harder for users to see the functionality they want. I think that should be raised as a separate issue. I'll spend a bit of time to make sure exactly what functionality is needed and then file an issue.

bkandel-picwell on 22 Sep 2016

👍2

All 6 comments

duplicate of #13754 and PR in #14236

jreback on 21 Sep 2016

@jreback I don't think from the description of #13754 and the code changes in #14236 that it is an exact duplicate of those. I will try testing it with the suggested change and see if that changes the behaviour.

danio on 21 Sep 2016

@jreback I have confirmed it is not a duplicate. #13754 is about set_levels still changing the index even when verification fails. This issue is not about that.

I have realised that set_levels and get_level_values should not necessarily be symmetrical. What I think is missing from pandas is a get_levels function that would be symmetrical to set_levels (the levels property is not the same, as you cannot use labels).
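
To make the asymmetry concrete, here is a minimal sketch using the same data as the report. The first part uses only `MultiIndex.levels` and `get_level_values`. The second part is a plain-Python illustration, not pandas internals: it assumes each row stores a position into the level's unique values, which is consistent with the outputs reported above.

```python
import pandas as pd

# Same index as the report: level 'A' holds 1, 1, 2, 2.
index = pd.MultiIndex.from_arrays([[1, 1, 2, 2], [1, 2, 1, 2]], names=["A", "B"])

# One entry per distinct value of level 'A'.
unique_levels = index.levels[0].tolist()
# One entry per row.
per_row_values = index.get_level_values("A").tolist()

print(unique_levels)   # [1, 2]
print(per_row_values)  # [1, 1, 2, 2]

# Illustration only: each row points at a position in the level values.
row_positions = [0, 0, 1, 1]
passed_as_levels = [2, 2, 4, 4]  # doubled per-row values, as in the report
result_reported = [passed_as_levels[p] for p in row_positions]
print(result_reported)  # [2, 2, 2, 2]

# With the re-ordered input the positions and values happen to line up.
row_positions_reordered = [0, 1, 0, 1]
passed_reordered = [2, 4, 2, 4]
result_reordered = [passed_reordered[p] for p in row_positions_reordered]
print(result_reordered)  # [2, 4, 2, 4]
```

Only the first two positions of the passed list are ever looked up, which is why the first ordering gives [2, 2, 2, 2] and the second appears to work.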

Failing that, I have found that the workaround for this is to replace

df.index.get_level_values(level)

with

df.index.levels[df.index.names.index(level)]

in my code, i.e.

level_index = df.index.names.index('A')
df.index.set_levels(df.index.levels[level_index].map(lambda x: x * 2), level='A', inplace=True)
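
For reference, the same workaround can be sketched without `inplace=True` by assigning the returned index back to the frame. This is an assumption about how one might write it on newer pandas releases, where `set_levels` no longer accepts `inplace`. The DataFrame below rebuilds the data from the report directly rather than via `read_csv`.

```python
import pandas as pd

# Same data as the report, indexed by ('A', 'B').
df = pd.DataFrame(
    {"A": [1, 1, 2, 2], "B": [1, 2, 1, 2], "C": [1, 2, 3, 4]}
).set_index(["A", "B"])

# Look up the position of level 'A', then double its unique values.
level_position = df.index.names.index("A")
doubled_levels = df.index.levels[level_position].map(lambda x: x * 2)

# set_levels returns a new MultiIndex; assign it back to the frame.
df.index = df.index.set_levels(doubled_levels, level="A")

print(df.index.get_level_values("A").tolist())  # [2, 2, 4, 4]
```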

danio on 21 Sep 2016

@danio I agree that this issue is not a duplicate of https://github.com/pydata/pandas/issues/13754, but I think that it's quite close to this description: https://github.com/pydata/pandas/issues/13741#issuecomment-248009216. Basically, the issue is that there's no way to directly set the equivalent of the column names in a MultiIndex. I.e. where you would be used to saying df.columns = ['A', 'B', 'C'] or df.index = ['A', 'B', 'C'], you can't say df.set_columns(['A', 'B', 'C'], level=0).

bkandel on 21 Sep 2016

@bkandel, yes, the example there is quite convoluted but I think this is caused by the same underlying issue as #13741. I can't see any options to change the duplicate setting of this issue, probably I don't have the permissions?

It feels to me that MultiIndex.set_levels is exposing too much of the underlying representation of the index, and it should be replaced by a new function with an interface more like get_level_values. That's probably not a discussion for the issue tracker though.
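
As a rough sketch of what such an interface could look like, the helper below rebuilds the index from its per-row values. The name `set_level_values` and its signature are hypothetical and not part of pandas; only `get_level_values` and `MultiIndex.from_arrays` are existing calls.

```python
import pandas as pd


def set_level_values(index, values, level):
    """Hypothetical helper (not a pandas API): return a new MultiIndex
    with the per-row values of one named level replaced."""
    position = index.names.index(level)
    # Collect the full-length values of every level, one array per level.
    arrays = [index.get_level_values(i) for i in range(index.nlevels)]
    arrays[position] = pd.Index(values)
    return pd.MultiIndex.from_arrays(arrays, names=index.names)


index = pd.MultiIndex.from_arrays([[1, 1, 2, 2], [1, 2, 1, 2]], names=["A", "B"])
doubled = index.get_level_values("A").map(lambda x: x * 2)
new_index = set_level_values(index, doubled, "A")

print(new_index.get_level_values("A").tolist())  # [2, 2, 4, 4]
```

Rebuilding the whole index is simpler than editing levels in place, at the cost of recomputing the internal representation. That may matter for very large indexes.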

danio on 22 Sep 2016