df = pd.DataFrame([
['DIM', 'A', 1, 2, 3, 4],
['DIM', 'B', 11, 22, 33, 44],
])
df.columns = ["dim1", "dim2", 'c', 'b', 'a', 'd']
df = df.set_index(["dim1", "dim2"])
df.columns.name = 'metrics'
print(df)
metrics c b a d
dim1 dim2
DIM A 1 2 3 4
B 11 22 33 44
print(df.unstack('dim2').stack('metrics'))
dim2 A B
dim1 metrics
DIM a 1 11
b 2 22
c 3 33
d 4 44
After stacking data by one level of MultiIndex columns, the output is unexpected. 'metrics' labels are sorted alphabetically, but data is show as is - without being matched with labels.
Either option shown below would be fine and expected:
dim2 A B
dim1 metrics
DIM a 3 33
b 2 22
c 1 11
d 4 44
dim2 A B
dim1 metrics
DIM c 1 11
b 2 22
a 3 33
d 4 44
pd.show_versions()commit: None
python: 3.6.2.final.0
python-bits: 64
OS: Darwin
OS-release: 17.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.21.0
pytest: 3.2.3
pip: 9.0.1
setuptools: 38.2.3
Cython: None
numpy: 1.13.3
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: 0.5.0
dateutil: 2.7.2
pytz: 2018.4
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.1.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0b10
sqlalchemy: None
pymysql: 0.7.11.None
psycopg2: 2.7.3.2 (dt dec pq3 ext lo64)
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
I just wanted to add that there is a workaround for this issue, but it looks rather ugly:
df = df.stack('metrics').to_frame().unstack('dim2')
df.columns = df.columns.droplevel()
print(df)
dim2 A B
dim1 metrics
DIM c 1 11
b 2 22
a 3 33
d 4 44
As you can see it produces one of the expected outputs mentioned above.
Hmm interesting. Strangely enough if you change the order in which you apply the methods it gets you to the right answer
In [7]: df.stack('metrics').unstack('dim2')
Out[7]:
dim2 A B
dim1 metrics
DIM c 1 11
b 2 22
a 3 33
d 4 44
This is indeed a little weird! PR to patch is welcome!
I believe the problem stems from the block below:
This appears related to #8809 and essentially is causing a disconnect as it sorts the data but the subsequent constructor does not use the sorted labels for the level that gets pushed to the row index.
Not terribly familiar with this code so investigating further but lmk if anyone has insights there