Pandas: Better Message for xlrd Dependencies

Created on 20 Sep 2019 · 23 Comments · Source: pandas-dev/pandas

Right now, if you don't have xlrd installed and call read_excel without specifying the engine keyword, you get the following message:

>>> pd.read_excel("test.xlsx")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/williamayd/.pyenv/versions/3.7.2/lib/python3.7/site-packages/pandas/util/_decorators.py", line 208, in wrapper
    return func(*args, **kwargs)
  File "/Users/williamayd/.pyenv/versions/3.7.2/lib/python3.7/site-packages/pandas/io/excel/_base.py", line 310, in read_excel
    io = ExcelFile(io, engine=engine)
  File "/Users/williamayd/.pyenv/versions/3.7.2/lib/python3.7/site-packages/pandas/io/excel/_base.py", line 819, in __init__
    self._reader = self._engines[engine](self._io)
  File "/Users/williamayd/.pyenv/versions/3.7.2/lib/python3.7/site-packages/pandas/io/excel/_xlrd.py", line 20, in __init__
    import_optional_dependency("xlrd", extra=err_msg)
  File "/Users/williamayd/.pyenv/versions/3.7.2/lib/python3.7/site-packages/pandas/compat/_optional.py", line 93, in import_optional_dependency
    raise ImportError(message.format(name=name, extra=extra)) from None
ImportError: Missing optional dependency 'xlrd'. Install xlrd >= 1.0.0 for Excel support Use pip or conda to install xlrd.

This happens even though the user may have openpyxl installed and could call pd.read_excel(..., engine="openpyxl") to get the same code to work.

Two issues need to be addressed here:

  • The default read_excel call with no engine argument should fall back to openpyxl, if installed
  • The default error message should direct the user to install openpyxl first and foremost, as xlrd is unmaintained

Labels: Enhancement, Error Reporting, IO Excel, good first issue
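
The fallback described in the first bullet could look roughly like the sketch below. It is not the pandas implementation: `pick_excel_engine` and its candidate ordering are hypothetical, chosen only for illustration, and the check relies solely on the standard-library `importlib.util.find_spec`.

```python
import importlib.util


def pick_excel_engine(candidates=("xlrd", "openpyxl")):
    """Return the first importable engine name from `candidates`.

    Hypothetical helper: the name and the candidate order are assumptions,
    not part of pandas.
    """
    for name in candidates:
        # find_spec returns None when a top-level module cannot be found.
        if importlib.util.find_spec(name) is not None:
            return name
    raise ImportError(
        "No Excel engine found. Tried: " + ", ".join(candidates)
        + ". Use pip or conda to install one of them."
    )
```

Listing every candidate in the error message means a user with neither package installed learns about both options at once.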

All 23 comments

Can I work on this?

Sure!

Can I work on this? I am new to contributing to issues on GitHub and this seems like a good start.

I removed xlrd from VERSIONS and changed the default message to include only the install hint, as follows:

VERSIONS = {
    "bs4": "4.6.0",
    "bottleneck": "1.2.1",
    "fastparquet": "0.2.1",
    "gcsfs": "0.2.2",
    "lxml.etree": "3.8.0",
    "matplotlib": "2.2.2",
    "numexpr": "2.6.2",
    "odfpy": "1.3.0",
    "openpyxl": "2.4.8",
    "pandas_gbq": "0.8.0",
    "pyarrow": "0.9.0",
    "pytables": "3.4.2",
    "s3fs": "0.0.8",
    "scipy": "0.19.0",
    "sqlalchemy": "1.1.4",
    "tables": "3.4.2",
    "xarray": "0.8.2",
    "xlwt": "1.2.0",
    "xlsxwriter": "0.9.8",
}
message = (
    "Use pip or conda to install {name}."
)

Is what I am doing correct?

I changed the default engine to openpyxl in pandas/io/excel/base so that the "default read_excel call with no engine argument" falls back to openpyxl.

I also added an error message to pandas/io/excel/_openpyxl to direct "the user to install openpyxl first and foremost".

Am I heading in the right direction? Is this OK?

> I changed the default engine to openpyxl in pandas/io/excel/base so that the "default read_excel call with no engine argument" falls back to openpyxl

We don't want to make that change just yet. For now, simply warn the user that the default will change in the future; they can pass engine="openpyxl" explicitly to suppress the warning, or wait for the swap to be made in a future release.
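
A minimal sketch of that suggestion, assuming a hypothetical helper named `resolve_engine` (the name, the warning text, and the idea of a separate helper are illustrative assumptions, not code from pandas or from the PR): the default stays xlrd, but omitting the engine triggers a FutureWarning.

```python
import warnings


def resolve_engine(engine=None):
    """Hypothetical helper: keep xlrd as the default but warn about the change."""
    if engine is None:
        warnings.warn(
            'The default engine will change from "xlrd" to "openpyxl" in a '
            'future release. Pass engine="openpyxl" to opt in now and '
            "silence this warning.",
            FutureWarning,
            # stacklevel=2 attributes the warning to the caller's line.
            stacklevel=2,
        )
        return "xlrd"
    # An explicit choice is returned unchanged and never warns.
    return engine
```

Because the warning fires only when `engine` is omitted, callers who already pass an engine explicitly see no change in behavior.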

On it! Thanks for the feedback.

take

I'd like to work on this, but this is my first time contributing to pandas as well as my first time working with such a huge codebase. What general direction should I work in?

@tab1tha / @SuvigyaJain1 are you still working on this? @WillAyd is this just a matter of modifying the 'read_excel' function and changing the default engine to openpyxl? Are there any checks that need to be done to see if the module is already installed?

No, I'm no longer working on this. Feel free to take the issue.

I think this can be handled by #29375, which @cruzzoe was working on but which I think has stalled; if that's the case, we would certainly welcome you taking over.

@WillAyd I took a look at the PR. I thought this would be a relatively easy fix: just add a FutureWarning to ExcelFile when using xlrd and/or default to openpyxl. Looking at the reviews, though, it seems to be more complicated than I thought.

In my own fork, I just added that warning, which works as expected when the engine is not specified. This would be my first contribution, so I could use some guidance.

I don't think the existing PR is too far off; it just needs to be pushed over the finish line. If you pull that branch locally, fix the merge conflicts, and add a filter for the FutureWarning that is now raised at the module level of test_readers.py and test_xlrd.py, I think that should get you most of the way there:

https://docs.pytest.org/en/latest/warnings.html#pytest-mark-filterwarnings
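
The filter suggested above has a standard-library counterpart that shows what is being suppressed. In the sketch below, `read_with_default_engine`, `call_ignoring_engine_warning`, and the warning text are hypothetical stand-ins for the real test code; in a pytest module, the linked `pytest.mark.filterwarnings` mark accepts the same kind of filter written as an `action:message:category` string.

```python
import warnings


def read_with_default_engine():
    """Hypothetical stand-in for a call that emits the new FutureWarning."""
    warnings.warn("The default engine will change", FutureWarning)
    return "data"


def call_ignoring_engine_warning(func):
    """Run `func` with FutureWarnings about the default engine suppressed."""
    with warnings.catch_warnings():
        # `message` is a regex matched against the start of the warning text.
        warnings.filterwarnings(
            "ignore",
            message="The default engine",
            category=FutureWarning,
        )
        return func()
```

Matching on the message prefix as well as the category keeps unrelated FutureWarnings visible in the test output.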

So... I do have this error, and I have xlrd installed. I tried to update xlrd and force-reinstall it with pip, and it doesn't work. I get ImportError: Install xlrd >= 1.0.0 for Excel support.
I have a virtualenv with python3.7.5, pandas==1.03 and xlrd==1.2.0.
Any ideas where the error might come from?

Having the exact same problem. It wasn't happening on a different version of Jupyter yesterday, but now it is, and I can't seem to use pip install.

I think that, for some reason, the version of Python called from that command is not the version in use. My system has Python 2.7 as standard, and the virtual environment I'm using is 3.7. The errors were calling python2.7. So maybe someone hardcoded something like #! /bin/python somewhere instead of #! /usr/bin/env python?
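
One way to test the interpreter-mismatch theory is to ask the running interpreter directly. The sketch below is only a diagnostic aid; `describe_module` is a hypothetical helper name, and it uses nothing beyond the standard-library `sys` and `importlib.util` modules.

```python
import importlib.util
import sys


def describe_module(module_name):
    """Report which interpreter is running and whether it can see a module."""
    spec = importlib.util.find_spec(module_name)
    return {
        "interpreter": sys.executable,
        "python_version": "{}.{}.{}".format(*sys.version_info[:3]),
        "found": spec is not None,
        # `origin` is the file the module would be loaded from, if known.
        "location": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    print(describe_module("xlrd"))
```

If `found` is false here while `pip show xlrd` reports an installation, pip and the interpreter are probably pointing at different environments. Running `python -m pip install xlrd` with the interpreter printed above installs into that same environment.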

Yep. That's what it was. Using pip3 to install and also installing openpyxl worked for me.

I had the same problem. I removed xlrd, deleted every xlrd folder in C:\Users\anaconda3\Lib\site-packages, and reinstalled with conda install -c anaconda xlrd.
That worked for me.

> Yep. That's what it was. Using pip3 to install and also installing openpyxl worked for me.

I am new to Python. What did you install using pip3? Could you please explain more thoroughly? Thanks.

Inside a Python 3 virtual environment, pip is the same as pip3.
The error probably comes from how xlrd is handled during installation. The comments below show the output of each command:

source path/to/your/env/bin/activate

command -v pip
#  path/to/your/env/bin/pip
command -v pip3
#  path/to/your/env/bin/pip3

ls -l path/to/your/env/bin/pip*
# -rwxrwxr-x 1 user group 242 Feb  7 08:50 /home/user/path/to/your/env/bin/pip*
# -rwxrwxr-x 1 user group 242 Feb  7 08:50 /home/user/path/to/your/env/bin/pip3*
# -rwxrwxr-x 1 user group 242 Feb  7 08:50 /home/user/path/to/your/env/bin/pip3.6*

md5sum /home/user/path/to/your/env/bin/pip*
# e85ad2c43787183884634c694a4f9c15  /home/user/path/to/your/env/bin/pip
# e85ad2c43787183884634c694a4f9c15  /home/user/path/to/your/env/bin/pip3
# e85ad2c43787183884634c694a4f9c15  /home/user/path/to/your/env/bin/pip3.6

I'm still getting the same issue as above. Running an anaconda environment with pandas=1.3.0 and xlrd=1.2.0

Using pd.read_excel returns "ImportError: Missing optional dependency 'xlrd'. Install xlrd >= 1.0.0 for Excel support Use pip or conda to install xlrd."

If you are using conda and you installed the requirements but are still getting the same error message:

ImportError: Missing optional dependency 'xlrd'. Install xlrd >= 1.0.0 for Excel support Use pip or conda to install xlrd

It might be the case that you are getting this error only in jupyter/jupyter-lab. If that is the case, the following solved the issue for me:

conda install -c anaconda ipykernel
python -m ipykernel install --user --name=[NAME_OF_YOUR_ENV]