[dev]rbuhr:~% python
Python 3.7.2 (default, Jul 24 2019, 19:27:42)
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/admin/.pyenv/versions/3.7.2/lib/python3.7/site-packages/pandas/__init__.py", line 55, in <module>
    from pandas.core.api import (
  File "/home/admin/.pyenv/versions/3.7.2/lib/python3.7/site-packages/pandas/core/api.py", line 24, in <module>
    from pandas.core.groupby import Grouper, NamedAgg
  File "/home/admin/.pyenv/versions/3.7.2/lib/python3.7/site-packages/pandas/core/groupby/__init__.py", line 1, in <module>
    from pandas.core.groupby.generic import ( # noqa: F401
  File "/home/admin/.pyenv/versions/3.7.2/lib/python3.7/site-packages/pandas/core/groupby/generic.py", line 44, in <module>
    from pandas.core.frame import DataFrame
  File "/home/admin/.pyenv/versions/3.7.2/lib/python3.7/site-packages/pandas/core/frame.py", line 88, in <module>
    from pandas.core.generic import NDFrame, _shared_docs
  File "/home/admin/.pyenv/versions/3.7.2/lib/python3.7/site-packages/pandas/core/generic.py", line 71, in <module>
    from pandas.io.formats.format import DataFrameFormatter, format_percentiles
  File "/home/admin/.pyenv/versions/3.7.2/lib/python3.7/site-packages/pandas/io/formats/format.py", line 47, in <module>
    from pandas.io.common import _expand_user, _stringify_path
  File "/home/admin/.pyenv/versions/3.7.2/lib/python3.7/site-packages/pandas/io/common.py", line 9, in <module>
    import lzma
  File "/home/admin/.pyenv/versions/3.7.2/lib/python3.7/lzma.py", line 27, in <module>
    from _lzma import *
ModuleNotFoundError: No module named '_lzma'
>>>
After installing pandas 0.25.0, I can't import the library because of missing compression libraries. First it returned the error message ModuleNotFoundError: No module named '_bz2'. I installed the missing library with sudo apt-get install libbz2-dev and tried again, only to get the error message from the code sample above, ModuleNotFoundError: No module named '_lzma'.
This was not an issue with the previous version of pandas and I tested by downgrading to pandas 0.24.0 and was able to import without the error messages. I feel like pandas should not prevent usage just because some _optional_ compression programs are not installed, like the default behavior of the last version.
>>> import pandas
>>>
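For anyone hitting the same error, a quick way to see which of the compression extension modules a given interpreter was built without is to ask importlib whether it can locate them. This is only a sketch and not part of pandas: the helper name missing_compression_modules is made up for illustration, and the list of extension names to check is an assumption.

```python
import importlib.util

# C extension modules behind the stdlib wrappers (bz2 -> _bz2, lzma -> _lzma);
# zlib is itself the extension module. Which names to check is an assumption.
EXTENSIONS = ("zlib", "_bz2", "_lzma")


def missing_compression_modules(names=EXTENSIONS):
    """Return the module names that this interpreter cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_compression_modules()
    if missing:
        print("Python was built without:", ", ".join(missing))
    else:
        print("All compression extension modules are present.")
```

Running this with the affected pyenv interpreter, before and after rebuilding Python, should show whether the rebuild actually picked up the system libraries.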
pd.show_versions()
Unable to run because can't import pandas.
Please see the discussions in https://github.com/pandas-dev/pandas/issues/27532 and https://github.com/pandas-dev/pandas/issues/27543. You may have to reconfigure your environment.
I am surprised this issue keeps popping up for something in the stdlib though - conda and official distributions do always come with lzma right?
My stab in the dark guess is that this is a pipenv issue (based on the 3 issues) and somehow '_lzmamodule.c' isn't getting built?
https://github.com/python/cpython/blob/b9a0376b0dedf16a2f82fa43d851119d1f7a2707/setup.py#L1558
Yea so here's a related discussion on bpo:
https://bugs.python.org/issue34895
So not a definitive response but I guess still implied that lzma is expected to be available as part of a standard Python distribution
I feel like the closing of this issue was not appropriate. The other two issues linked also have the same problem -- that pandas 0.25 assumes you have things installed that may not actually have come with python by default. This should be made explicitly clear up front _before installation completes_, not as an import error after installation.
I second @raybuhr's comment. Pyenv is a project with 16k stars. It's very widely used.
I guess still implied that lzma is expected to be available as part of a standard Python distribution
I feel this is an incorrect assumption then. I've been using Pyenv successfully and had never run into an issue with _lzma until this release. It's not a nice experience that people have to read through Stack Overflow and 3 closed (!) Pandas issue threads to figure out that brew install xz is the solution.
This should be made explicitly clear up front before installation completes, not as an import error after installation.
That's not possible, at least not with binary distributions (wheels / conda packages). Unless you're saying that there should be an error when you're compiling Python, in which case I agree (right now it's just a warning).
On the larger issue, I'm not sure what's best. Clearly this is affecting people. But at some point we need to be able to rely on importing modules from the standard library, right?
And just to be clear, this isn't a pyenv issue. It's a problem on the user's machine not having the proper dependencies when Python is compiled.
Yea this is certainly unfortunate but quoting what I think is the most definitive response from the Python mailing list:
I agree that modules that are necessarily optional should be documented
as such, and as I mentioned on https://bugs.python.org/issue34895, many
are so documented.
In the absence of such documentation, I would consider it to be not
optional except as some distributor decides to omit it. But then it is
the responsibility of the distributor to document the omission.
https://mail.python.org/pipermail/python-ideas/2018-October/054089.html
So since Python doesn't document this library as optional, it should be available, and if it is not, it is the responsibility of the distributor to handle that expectation.
FWIW pyenv also documents this as the first step in their "Common Build Problems" page:
https://github.com/pyenv/pyenv/wiki/Common-build-problems
So perhaps we could help them improve that aspect of the documentation if it isn't immediately obvious.
I see the points made above about this probably being an issue with system level dependencies. I am in fact using pyenv to install, and fixing it for our team isn't particularly difficult.
Since Python expects the compression libraries to be installed, given that the modules are part of the standard library, this probably doesn't have to be an issue for the pandas team. That said, I still feel like making the compression libraries prerequisites for using pandas is unnecessary overhead. I think a more sympathetic response would be to try importing the compression modules and return a message that they aren't installed while still allowing pandas to be imported and used, just without support for compression.
Pandas 0.25.0 is not usable with tools like kubeless, as the Debian base images for Docker don't appear to contain the proper libs for _lzma any more. You'd need to build out custom images.
Pandas 0.24.2 works fine.
I suspect we would accept a PR that did the lzma import in a try / except ImportError block.
When the module is not present, we would emit a UserWarning that their Python was not compiled properly and that lzma compression is not available. And if they use lzma compression we would raise at runtime.
Is anyone interested in submitting a PR?
FYI, we'll probably want to do the 0.25.1 release in 1-2 weeks. It'd be good to include this.
Any takers to work on this? No obligation of course. If no one else is able to, I'll put something together later in the week.
cc @islander, @selvathiruarul, @Salompas, @tvanyo who reported this in other issues.
If I remember correctly I was able to solve this issue by brew install xz, and then reinstalling Python with pyenv. It makes sense that this should not be a pandas issue, but it would be nice to have a warning message or something else to fall back on (while alerting the user that it needs to be solved).
I'll give it a try (but I am a noob, so let me know if there are issues with the code).
Thanks @Salompas. Feel free to start something if you think you have a handle on what needs to be done.
We're deciding a release date for 0.25.1 at our dev meeting on Wednesday. If necessary, one of us will step in and finish things off before we need to release.
@TomAugspurger I am following your suggestions and will submit a PR soon (just need a bit of time to go through the "Contributing to pandas" page).
@TomAugspurger I have been trying to modify the code, but ran into a problem. One of the first files to complain about a missing lzma module is pandas/_libs/parsers.pyx. Unfortunately, my knowledge of Cython is zero. I have tried a couple of things but without much success (cythonize fails on parsers.pyx when doing pip install -e .). If you have any ideas, please let me know!
The issue with cythonize might be due to something else. I tried adding a blank line to the pyx file and reinstalling with pip, and I get the same error.
Fortunately in this case, you can just use regular Python in parsers.pyx. I'd recommend writing a compat function like the following, and replacing import lzma with that.
diff --git a/pandas/_libs/parsers.pyx b/pandas/_libs/parsers.pyx
index cafc31dad..385349629 100644
--- a/pandas/_libs/parsers.pyx
+++ b/pandas/_libs/parsers.pyx
@@ -2,7 +2,6 @@
# See LICENSE for the license
import bz2
import gzip
-import lzma
import os
import sys
import time
@@ -59,9 +58,12 @@ from pandas.core.arrays import Categorical
from pandas.core.dtypes.concat import union_categoricals
import pandas.io.common as icom
+from pandas.compat import import_lzma
from pandas.errors import (ParserError, DtypeWarning,
EmptyDataError, ParserWarning)
+lzma = import_lzma()
+
# Import CParserError as alias of ParserError for backwards compatibility.
# Ultimately, we want to remove this import. See gh-12665 and gh-14479.
CParserError = ParserError
diff --git a/pandas/compat/__init__.py b/pandas/compat/__init__.py
index 5ecd641fc..04e8d44a3 100644
--- a/pandas/compat/__init__.py
+++ b/pandas/compat/__init__.py
@@ -65,3 +65,17 @@ def is_platform_mac():
 def is_platform_32bit():
     return struct.calcsize("P") * 8 < 64
+
+
+def import_lzma():
+    import warnings
+
+    try:
+        import lzma
+        return lzma
+    except ImportError:
+        msg = (
+            "Could not import the lzma module. Your installed Python is incomplete. "
+            "Attempting to use `lzma` compression will result in a RuntimeError."
+        )
+        warnings.warn(msg)
diff --git a/pandas/io/common.py b/pandas/io/common.py
index e01e47304..0a66c58b8 100644
--- a/pandas/io/common.py
+++ b/pandas/io/common.py
@@ -6,7 +6,6 @@ import csv
import gzip
from http.client import HTTPException # noqa
from io import BytesIO
-import lzma
import mmap
import os
import pathlib
@@ -31,10 +30,12 @@ from pandas.errors import ( # noqa
ParserWarning,
)
+from pandas.compat import import_lzma
from pandas.core.dtypes.common import is_file_like
from pandas._typing import FilePathOrBuffer
+lzma = import_lzma()
# gh-12665: Alias for now and remove later.
CParserError = ParserError
Then going through and fixing up uses of lzma to check for lzma being None.
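That follow-up step could look roughly like the sketch below: a small helper that call sites use instead of touching lzma directly, so that a missing module only fails when lzma compression is actually requested. The names get_lzma_file and open_xz and the error wording are assumptions for illustration, not necessarily what ends up in pandas; import_lzma is repeated from the diff above so the snippet runs on its own.

```python
import warnings


def import_lzma():
    """Return the lzma module, or None (after warning) if it is unavailable."""
    try:
        import lzma
        return lzma
    except ImportError:
        warnings.warn(
            "Could not import the lzma module. Your installed Python is incomplete. "
            "Attempting to use `lzma` compression will result in a RuntimeError."
        )
        return None


def get_lzma_file(lzma):
    """Hypothetical helper: return lzma.LZMAFile, or raise at use time."""
    if lzma is None:
        raise RuntimeError(
            "lzma module not available. A Python re-install with the proper "
            "dependencies might be required to solve this issue."
        )
    return lzma.LZMAFile


lzma = import_lzma()


def open_xz(path, mode="rb"):
    """Hypothetical call site: only fails if xz compression is requested."""
    return get_lzma_file(lzma)(path, mode)
```

With this shape, importing the package succeeds on an incomplete Python, and only code paths that actually request xz compression raise.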
LMK if you want me to take over. We can always find other issues for you to work on 😄 This is getting to be a bit tricky (which is why it'd be nice to rely on lzma just being present!)
@TomAugspurger cool idea! I am trying that right now, thanks for the hint!
ModuleNotFoundError: No module named '_lzma':
Oh, shit! This problem killed my whole day!
0.25.0 has this error, but 0.24.2 is OK, so I rolled back to 0.24.2.
However, the underlying problem is that a file like _lzma.cpython-36m-darwin.so is missing from the lib-dynload directory.
Maybe I need to recompile Python.