Pandas: multiindex column in to_excel

Created on 15 Jan 2013  路  18Comments  路  Source: pandas-dev/pandas

link to #1651

Saving a multiindex column to_excel saves a sparse index.

Migrated from this StackOverflow question, with a smaller DataFrame.

In [1]: m = MultiIndex.from_tuples([(1,1),(1,2)], names=['a','b'])

In [2]: df = DataFrame([[1,2],[3,4]], columns=m)

In [3]: df
Out[3]: 
a  1   
b  1  2
0  1  2
1  3  4

In [4]: df.to_excel('test.xls')

Saves the xls:

    1.1 .2
0   1   2
1   3   4

This differs from how to_csv (which is not sparse):

,"(1L, 1L)","(1L, 2L)"
0,1,2
1,3,4
Enhancement IO Data IO Excel Output-Formatting

Most helpful comment

@jreback Can this be reopened? I also get this problem.

I've commented this on #6618, which seems related. From what I can see, the problem is that pandas is reserving the first column of the header rows to the MultiIndex, and creates a new line for the row index name even if it is unnamed:

>>> df = pd.DataFrame([[1,2,3],[4,5,6]], columns=pd.MultiIndex.from_tuples([('A'
,''),('B','C'),('B','D')]))
>>> df.to_excel("out.xlsx", index_label="Foo")

pandas_to_excel_bug2

This bug was fixed some time ago for simple columns, but it was probably not a very good solution, since it is still buggy for MultiIndex columns:

>>> df = pd.DataFrame([[1,2,3],[4,5,6]], columns=['A','B','C'])
>>> df.to_excel("out.xlsx", index_label="Foo")

pandas_to_excel_bug3

Version information:

>>> pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 32
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 42 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.20.1
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

All 18 comments

Both to_excel and to_csv should allow a multiindex on the columns to print on multiple rows

I guess then read_csv would have to take a list for the header argument to reconstruct the mi

2478, this is already hiding in the codebase, just needs some TLC.

@hayd @jtratner IIRC this should still be open, right?

yes, definitely.

I think that has been fixed by #5423:

import pandas as pd

m = pd.MultiIndex.from_tuples([(1,1),(1,2)], names=['a','b'])
df = pd.DataFrame([[1,2],[3,4]], columns=m)

df.to_excel('test.xls')

Output:

screenshot

Can we close this?

Hi all,

I'm trying to write a multiindex dataframe to Excel using the example above, and not getting the same results.

Code:

import pandas as pd

m = pd.MultiIndex.from_tuples([(1,1),(1,2)], names=['a','b'])
df = pd.DataFrame([[1,2],[3,4]], columns=m)

df.to_excel('test.xls')

Output:
capture

There's an extra line being added-- different than the output generated above.

I'm using pandas version 0.19.2 on Ubuntu 14.04 and have also experienced this on Windows 10. I've tried this using pandas version 0.15.0 and it works properly, replicating the output above:

image

Is this a bug? And if so, can we re-open this issue?

Thanks,

Charlie

@mappingvermont I'm getting the same result with a real example. I think it is a bug. Using latest version pandas 0.19.2

@jreback Can this be reopened? I also get this problem.

I've commented this on #6618, which seems related. From what I can see, the problem is that pandas is reserving the first column of the header rows to the MultiIndex, and creates a new line for the row index name even if it is unnamed:

>>> df = pd.DataFrame([[1,2,3],[4,5,6]], columns=pd.MultiIndex.from_tuples([('A'
,''),('B','C'),('B','D')]))
>>> df.to_excel("out.xlsx", index_label="Foo")

pandas_to_excel_bug2

This bug was fixed some time ago for simple columns, but it was probably not a very good solution, since it is still buggy for MultiIndex columns:

>>> df = pd.DataFrame([[1,2,3],[4,5,6]], columns=['A','B','C'])
>>> df.to_excel("out.xlsx", index_label="Foo")

pandas_to_excel_bug3

Version information:

>>> pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 32
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 42 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.20.1
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

@jreback still got the same problem for the multi-index, wish this issue can be reopened.

Same here, I will open a new issue and try to find the problem

@etiennecaldo Did you open the issue?
I'm having the same problem with the latest version of pandas.

@Ronkiro no I workarounded it on my side...!

This still appears to be an issue for multi-index columns (pandas 0.24.2) - is there any way this can be re-opened?

data = [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]]
multi_index = pd.MultiIndex.from_product([['a', 'b'], ['one', 'two', 'three']])
df = pd.DataFrame(data, columns=multi_index_one, index=[1, 2])

df.to_excel('test_files/test.xlsx')

Screenshot 2019-07-23 at 10 32 53

pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Darwin
OS-release: 18.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: en_IE.UTF-8
pandas: 0.24.2
pytest: 4.1.1
pip: 18.1
setuptools: 40.6.3
Cython: 0.28.2
numpy: 1.15.4
scipy: 1.1.0
pyarrow: 0.11.1
xarray: None
IPython: 7.2.0
sphinx: None
patsy: 0.5.1
dateutil: 2.7.5
pytz: 2018.7
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.0.2
openpyxl: None
xlrd: 1.2.0
xlwt: None
xlsxwriter: 1.1.8
lxml.etree: 4.3.0
bs4: 4.7.1
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: 0.2.0
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

I got the same issue with a dataset with multiple column and row indizes and would really appreciate a solution.

pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.7.final.0
python-bits: 64
OS: Linux
OS-release: 4.18.0-25-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.23.4
pytest: 4.5.0
pip: 19.0.3
setuptools: 41.0.0
Cython: None
numpy: 1.15.2
scipy: None
pyarrow: None
xarray: None
IPython: 7.5.0
sphinx: None
patsy: None
dateutil: 2.8.0
pytz: 2019.1
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: 1.2.0
xlwt: None
xlsxwriter: 1.1.2
lxml: None
bs4: None
html5lib: None
sqlalchemy: 1.3.3
pymysql: 0.9.3
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Same issue with pandas 0.25.0

pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.24.2
pytest: None
pip: 19.0.3
setuptools: 40.8.0
Cython: None
numpy: 1.16.4
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.8.0
pytz: 2019.1
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: 2.6.2
xlrd: 1.2.0
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10.1
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: 0.2.3

Rather than replying to a feature request issue that was implemented and closed 6 years ago it would probably be best to open a new issue with an example(s) that demonstrate the issue.

For reference, this issue has been created and can be found at #27772 .

For any more comments, please post there.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

Ashutosh-Srivastav picture Ashutosh-Srivastav  路  3Comments

MatzeB picture MatzeB  路  3Comments

mfmain picture mfmain  路  3Comments

BDannowitz picture BDannowitz  路  3Comments

scls19fr picture scls19fr  路  3Comments