Pandas: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbb with read_excel function

Created on 2 Jan 2017  ·  5 comments  ·  Source: pandas-dev/pandas

import pandas as pd

data = pd.read_excel(open("2016-07-12_Zuwendungsbericht_2015_OpenData.xlsx"), sheetname="Zuwendungsbericht", encoding="utf-8")

I'm getting this error when I try to read this Excel file: http://transparenz.bremen.de/sixcms/media.php/13/2016-07-12_Zuwendungsbericht_2015_OpenData.xlsx

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbb in position 14: invalid start byte

INSTALLED VERSIONS

commit: None
python: 3.6.0.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-57-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: de_DE.UTF-8

pandas: 0.19.2
nose: None
pip: 9.0.1
setuptools: 28.8.0
Cython: None
numpy: 1.11.3
scipy: None
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None

IO Excel Usage Question

All 5 comments

This works fine for me, even without specifying the encoding:

In [11]: pd.read_excel('http://transparenz.bremen.de/sixcms/media.php/13/2016-07-12_Zuwendungsbericht_2015_OpenData.xlsx', sheetname="Zuwendungsbericht").head()
Out[11]: 
              Ressort                                Zuwendungsempfänger  \
0                 NaN                                                NaN   
1  03 - Senatskanzlei  acompa. – Begleitungsgruppe für Flüchtlinge un...   
2  03 - Senatskanzlei  Aktion Kultur und Freizeit Huchting und Grolla...   
3  03 - Senatskanzlei  Aktion Kultur und Freizeit Huchting und Grolla...   
4  03 - Senatskanzlei  Aktive Menschen Bremen eingetragener Verein (A...   

  Haushalts-stelle(n)                                    Zuwendungszweck  \
0                 NaN                                                NaN   
1        3020.68400-2                      Begleitung von Behördengängen   
2        3020.68400-2      Sofortprogramm Flüchtlinge in den Stadtteilen   
3        3020.68400-2  Einkaufsfahrten, Begleitung, Dolmetscherdienst...   
4        3020.68400-2      Sofortprogramm Flüchtlinge in den Stadtteilen   

   Institutionelle Zuwendungen Bremens  Unnamed: 5 Unnamed: 6  \
0                               2014.0      2015.0  Veränd. %   
1                                  0.0         0.0        NaN   
2                                  0.0         0.0        NaN   
3                                  0.0         0.0        NaN   
4                                  0.0         0.0        NaN   

   Projekt-förderungen Bremens  Unnamed: 8 Unnamed: 9  \
0                       2014.0      2015.0  Veränd. %   
1                        300.0         0.0       -100   
2                          0.0       800.0        NaN   
3                        750.0         0.0       -100   
4                          0.0       500.0        NaN   

   institutionelle Förderung / Projektförderung Dritter  Unnamed: 11  \
0                                             2014.0          2015.0   
1                                                0.0             0.0   
2                                                0.0             0.0   
3                                                0.0             0.0   
4                                                0.0             0.0   

  Unnamed: 12 Finan\nzie-\nrungs\nart       Stadtteil  
0   Veränd. %                     NaN             NaN  
1           0                      FB        Neustadt  
2           0                      FB        Huchting  
3           0                      FB        Huchting  
4           0                       V  Woltmershausen  

So you can leave out the open() call, and then it should work fine by default.

@GiantCrocodile To clarify a bit: an xlsx file is a binary file, but open() uses text mode by default, so reading from the handle decodes the bytes as text (with the locale encoding, UTF-8 here). That text-mode handle is what gets passed on to read_excel, hence reading fails. If you want to use open (which is not needed in this case, as pandas automatically opens the file for you), you can do open(path, mode='rb').
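The text-mode versus binary-mode difference can be reproduced without pandas or a real spreadsheet. The sketch below assumes only that an .xlsx file is a ZIP archive (the container used by the Office Open XML format). It writes a small, hypothetical stand-in archive that contains a 0xbb byte, then shows that reading it in text mode raises a UnicodeDecodeError while binary mode returns the raw bytes. The file name demo.xlsx and the helper names are made up for illustration.

```python
import os
import tempfile
import zipfile


def make_demo_archive(path):
    """Write a hypothetical stand-in for an .xlsx file (both are ZIP archives)."""
    with zipfile.ZipFile(path, "w", compression=zipfile.ZIP_STORED) as zf:
        # Stored (uncompressed), so these raw bytes end up verbatim in the file.
        # 0xbb and 0xff can never start a valid UTF-8 sequence.
        zf.writestr("xl/demo.bin", b"\x00\xbb\xff binary payload")


def read_text_mode(path):
    # Text mode, as with a plain open(path): bytes are decoded to str.
    # UTF-8 is set explicitly to mirror the reporter's de_DE.UTF-8 locale.
    with open(path, encoding="utf-8") as fh:
        return fh.read()


def read_binary_mode(path):
    # Binary mode, as with open(path, mode='rb'): raw bytes, no decoding.
    with open(path, mode="rb") as fh:
        return fh.read()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")
    make_demo_archive(path)
    try:
        read_text_mode(path)
    except UnicodeDecodeError as exc:
        print("text mode failed:", exc)
    print("binary mode starts with:", read_binary_mode(path)[:4])
```

For the call in the question, that means passing the path string straight to read_excel, or wrapping it as open(path, mode='rb') if a file handle is really needed.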

Thanks @jorisvandenbossche for helping me! What I had done was posted on StackOverflow as a solution; obviously it was wrong. Now it works with what you suggested!

A quick question: if I want to specify an encoding, should I do it the way I did in my example? If so, I'm curious why the encoding parameter isn't mentioned in the docs at http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html

I am not familiar enough with Excel to know how its encoding works, but remember that an Excel file is not a 'normal' text file, so in most cases you won't need to specify it. Specifically for read_excel, the encoding parameter is not passed through to the actual reading of the Excel file; it is only used for parsing afterwards (the kwds here: https://github.com/pandas-dev/pandas/blob/9947a99d04b87e7441f062d6b5eca281ef15deb7/pandas/io/excel.py#L513).
But see e.g. http://xlrd.readthedocs.io/en/latest/unicode.html. xlrd is the library used to read the Excel files. It seems to have an encoding_override keyword, but I think read_excel does not support it at the moment.
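A rough illustration of why an encoding argument is rarely needed for .xlsx: the format keeps its text in XML parts inside the ZIP archive, and those parts carry their own XML declaration (typically UTF-8), so an XML parser can decode them without being told. The sketch below builds a hypothetical, minimal archive containing only an xl/sharedStrings.xml part (not a complete workbook) and reads the strings back with zipfile and xml.etree.ElementTree. It is an assumption-based illustration of the file format, not how xlrd or pandas implement reading internally.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML namespace used by the XML parts of an .xlsx file.
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def build_demo_xlsx_bytes():
    """Hypothetical minimal archive with just a shared-strings part (not a full workbook)."""
    xml = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        '<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
        "<si><t>Zuwendungsempfänger</t></si>"
        "<si><t>Veränd. %</t></si>"
        "</sst>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("xl/sharedStrings.xml", xml.encode("utf-8"))
    return buf.getvalue()


def read_shared_strings(xlsx_bytes):
    """Return the shared strings; no encoding argument is passed anywhere."""
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as zf:
        raw = zf.read("xl/sharedStrings.xml")  # raw bytes from the archive
    # ElementTree takes the encoding from the XML declaration in the bytes.
    root = ET.fromstring(raw)
    return [t.text for t in root.iter(NS + "t")]


print(read_shared_strings(build_demo_xlsx_bytes()))
# → ['Zuwendungsempfänger', 'Veränd. %']
```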

Do you have an example where you need to specify the encoding?

I don't have an example where I need to specify the encoding. I've just started using pandas and I'm used to specifying the encoding, so I tried it here, but it doesn't matter to me as long as it works without specifying it.
