Pandas: multiindex column in to_excel

Created on 15 Jan 2013 · 18Comments · Source: pandas-dev/pandas

link to #1651

Saving a multiindex column to_excel saves a sparse index.

Migrated from this StackOverflow question, with a smaller DataFrame.

In [1]: m = MultiIndex.from_tuples([(1,1),(1,2)], names=['a','b'])

In [2]: df = DataFrame([[1,2],[3,4]], columns=m)

In [3]: df
Out[3]: 
a  1   
b  1  2
0  1  2
1  3  4

In [4]: df.to_excel('test.xls')

Saves the xls:

    1.1 .2
0   1   2
1   3   4

This differs from how to_csv (which is not sparse):

,"(1L, 1L)","(1L, 2L)"
0,1,2
1,3,4

Enhancement IO Data IO Excel Output-Formatting

Source

hayd

Most helpful comment

@jreback Can this be reopened? I also get this problem.

I've commented this on #6618, which seems related. From what I can see, the problem is that pandas is reserving the first column of the header rows to the MultiIndex, and creates a new line for the row index name even if it is unnamed:

>>> df = pd.DataFrame([[1,2,3],[4,5,6]], columns=pd.MultiIndex.from_tuples([('A'
,''),('B','C'),('B','D')]))
>>> df.to_excel("out.xlsx", index_label="Foo")

pandas_to_excel_bug2

This bug was fixed some time ago for simple columns, but it was probably not a very good solution, since it is still buggy for MultiIndex columns:

>>> df = pd.DataFrame([[1,2,3],[4,5,6]], columns=['A','B','C'])
>>> df.to_excel("out.xlsx", index_label="Foo")

pandas_to_excel_bug3

Version information:

>>> pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 32
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 42 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.20.1
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

ronanpaixao on 25 Oct 2017

👍5

All 18 comments

Both to_excel and to_csv should allow a multiindex on the columns to print on multiple rows

joeb1415 on 22 Mar 2013

I guess then read_csv would have to take a list for the header argument to reconstruct the mi

jreback on 22 Mar 2013

2478, this is already hiding in the codebase, just needs some TLC.

ghost on 22 Mar 2013

@hayd @jtratner IIRC this should still be open, right?

jreback on 21 Sep 2013

yes, definitely.

jtratner on 21 Sep 2013

I think that has been fixed by #5423:

import pandas as pd

m = pd.MultiIndex.from_tuples([(1,1),(1,2)], names=['a','b'])
df = pd.DataFrame([[1,2],[3,4]], columns=m)

df.to_excel('test.xls')

Output:

screenshot

Can we close this?

jmcnamara on 10 Nov 2013

Hi all,

I'm trying to write a multiindex dataframe to Excel using the example above, and not getting the same results.

Code:

import pandas as pd

m = pd.MultiIndex.from_tuples([(1,1),(1,2)], names=['a','b'])
df = pd.DataFrame([[1,2],[3,4]], columns=m)

df.to_excel('test.xls')

Output:
capture

There's an extra line being added-- different than the output generated above.

I'm using pandas version 0.19.2 on Ubuntu 14.04 and have also experienced this on Windows 10. I've tried this using pandas version 0.15.0 and it works properly, replicating the output above:

Is this a bug? And if so, can we re-open this issue?

Thanks,

Charlie

mappingvermont on 27 Apr 2017

@mappingvermont I'm getting the same result with a real example. I think it is a bug. Using latest version pandas 0.19.2

jd901215 on 18 Jul 2017

👍3

@jreback Can this be reopened? I also get this problem.

>>> df = pd.DataFrame([[1,2,3],[4,5,6]], columns=pd.MultiIndex.from_tuples([('A'
,''),('B','C'),('B','D')]))
>>> df.to_excel("out.xlsx", index_label="Foo")

pandas_to_excel_bug2

This bug was fixed some time ago for simple columns, but it was probably not a very good solution, since it is still buggy for MultiIndex columns:

>>> df = pd.DataFrame([[1,2,3],[4,5,6]], columns=['A','B','C'])
>>> df.to_excel("out.xlsx", index_label="Foo")

pandas_to_excel_bug3

Version information:

>>> pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 32
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 42 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.20.1
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

ronanpaixao on 25 Oct 2017

👍5

@jreback still got the same problem for the multi-index, wish this issue can be reopened.

Tangjiahui26 on 14 Sep 2018

👍2

Same here, I will open a new issue and try to find the problem

etiennecaldo on 9 May 2019

@etiennecaldo Did you open the issue?
I'm having the same problem with the latest version of pandas.

Ronkiro on 6 Jun 2019

@Ronkiro no I workarounded it on my side...!

etiennecaldo on 11 Jun 2019

This still appears to be an issue for multi-index columns (pandas 0.24.2) - is there any way this can be re-opened?

data = [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]]
multi_index = pd.MultiIndex.from_product([['a', 'b'], ['one', 'two', 'three']])
df = pd.DataFrame(data, columns=multi_index_one, index=[1, 2])

df.to_excel('test_files/test.xlsx')

Screenshot 2019-07-23 at 10 32 53

pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Darwin
OS-release: 18.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: en_IE.UTF-8
pandas: 0.24.2
pytest: 4.1.1
pip: 18.1
setuptools: 40.6.3
Cython: 0.28.2
numpy: 1.15.4
scipy: 1.1.0
pyarrow: 0.11.1
xarray: None
IPython: 7.2.0
sphinx: None
patsy: 0.5.1
dateutil: 2.7.5
pytz: 2018.7
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.0.2
openpyxl: None
xlrd: 1.2.0
xlwt: None
xlsxwriter: 1.1.8
lxml.etree: 4.3.0
bs4: 4.7.1
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: 0.2.0
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

ferdiaoh on 23 Jul 2019

I got the same issue with a dataset with multiple column and row indizes and would really appreciate a solution.

pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.7.final.0
python-bits: 64
OS: Linux
OS-release: 4.18.0-25-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.23.4
pytest: 4.5.0
pip: 19.0.3
setuptools: 41.0.0
Cython: None
numpy: 1.15.2
scipy: None
pyarrow: None
xarray: None
IPython: 7.5.0
sphinx: None
patsy: None
dateutil: 2.8.0
pytz: 2019.1
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: 1.2.0
xlwt: None
xlsxwriter: 1.1.2
lxml: None
bs4: None
html5lib: None
sqlalchemy: 1.3.3
pymysql: 0.9.3
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Cattes on 23 Jul 2019

Same issue with pandas 0.25.0

pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.24.2
pytest: None
pip: 19.0.3
setuptools: 40.8.0
Cython: None
numpy: 1.16.4
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.8.0
pytz: 2019.1
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: 2.6.2
xlrd: 1.2.0
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10.1
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: 0.2.3

stefanopassador-bip on 5 Aug 2019

Rather than replying to a feature request issue that was implemented and closed 6 years ago it would probably be best to open a new issue with an example(s) that demonstrate the issue.

jmcnamara on 6 Aug 2019

For reference, this issue has been created and can be found at #27772 .

For any more comments, please post there.

Ronkiro on 6 Aug 2019

Was this page helpful?

0 / 5 - 0 ratings