Pandas: Issue on Pandas installation on Alpine:3.10 docker container

Created on 9 Dec 2019 · 16 Comments · Source: pandas-dev/pandas

Code Sample, a copy-pastable example if possible

Collecting pandas
  Downloading https://files.pythonhosted.org/packages/b7/93/b544dd08092b457d88e10fc1e0989d9397fd32ca936fdfcbb2584178dd2b/pandas-0.25.3.tar.gz (12.6MB)

 ERROR: Command errored out with exit status 1:
     command: /usr/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-u_c9l9_o/pandas/setup.py'"'"'; __file__='"'"'/tmp/pip-install-u_c9l9_o/pandas/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-install-u_c9l9_o/pandas/pip-egg-info
         cwd: /tmp/pip-install-u_c9l9_o/pandas/
    Complete output (15 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-u_c9l9_o/pandas/setup.py", line 815, in <module>
        ext_modules=maybe_cythonize(extensions, compiler_directives=directives),
      File "/tmp/pip-install-u_c9l9_o/pandas/setup.py", line 537, in maybe_cythonize
        numpy_incl = pkg_resources.resource_filename("numpy", "core/include")
      File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1144, in resource_filename
        return get_provider(package_or_requirement).get_resource_filename(
      File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 364, in get_provider
        return _find_adapter(_provider_factories, loader)(module)
      File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1392, in __init__
        self.module_path = os.path.dirname(getattr(module, '__file__', ''))
      File "/usr/lib/python3.7/posixpath.py", line 156, in dirname
        p = os.fspath(p)
    TypeError: expected str, bytes or os.PathLike object, not NoneType
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.


Problem description

Versions in use:
Pandas: 0.25.3
NumPy: 1.17.4
matplotlib: 3.1.2
Cython: 0.29.14
Docker Alpine: 3.10
OS: macOS Catalina 10.15.1

Hey folks,
I am having a problem installing the pandas library in an Alpine Docker container.
As the error in the code section shows, the docker build fails during the pandas installation.

Any help would be appreciated.

All 16 comments

Do you have the necessary compilers? PyPI doesn't have a manylinux tag for alpine, so we can't upload wheels for it.
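
As background for the point about wheels: the manylinux tags target glibc-based distributions, while Alpine uses musl, so pip falls back to the source tarball there. A rough way to see which case an interpreter is in is `platform.libc_ver()` from the standard library. The helper name `reports_glibc` below is hypothetical, and the check is only a heuristic, since it reports what can be detected in the interpreter binary.

```python
import platform


def reports_glibc() -> bool:
    """Return True if the standard library identifies the C library as glibc.

    Heuristic only: platform.libc_ver() inspects the interpreter binary and
    returns an empty name when it cannot identify the C library.
    """
    libc_name, _libc_version = platform.libc_ver()
    return libc_name == "glibc"


if __name__ == "__main__":
    if reports_glibc():
        print("glibc detected: manylinux wheels may be usable")
    else:
        print("glibc not detected: expect pip to build from source")
```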

Seeing a similar problem installing pandas on Alpine 3.10; here's a reproducing one-liner:

$ docker run --rm -it python:3.7.4-alpine3.10 sh -c "apk add gcc linux-headers musl-dev && pip3 install pandas==0.25.3"

Results in: ModuleNotFoundError: No module named 'Cython'

Seems that pandas dependencies are broken.

If you install Cython first:

$ docker run --rm -it python:3.7.4-alpine3.10 sh -c "apk add gcc linux-headers musl-dev && pip3 install Cython && pip3 install pandas==0.25.3"

It errors out that Numpy isn't around... ModuleNotFoundError: No module named 'numpy'...

What version of pip are you using? Can you try installing from GitHub?

On master, we have a pyproject.toml, which will get the build-dependencies (NumPy, Cython) before building pandas.

Using python:3.7.4-alpine3.10

docker run --rm -it python:3.7.4-alpine3.10 sh -c "pip3 --version"
pip 19.3 from /usr/local/lib/python3.7/site-packages/pip (python 3.7)

Using python:3.7.5-alpine3.10 (same issues)

$ docker run --rm -it python:3.7.5-alpine3.10 sh -c "pip3 --version"
pip 19.3.1 from /usr/local/lib/python3.7/site-packages/pip (python 3.7)

FWIW: this all used to work a few days ago.

Perhaps something changed on the NumPy side of things then? Pandas hasn't had a release recently, and the traceback indicates the exception is in getting NumPy's resources.

        numpy_incl = pkg_resources.resource_filename("numpy", "core/include")
      File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1144, in resource_filename
        return get_provider(package_or_requirement).get_resource_filename(
      File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 364, in get_provider
        return _find_adapter(_provider_factories, loader)(module)
      File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1392, in __init__
        self.module_path = os.path.dirname(getattr(module, '__file__', ''))
      File "/usr/lib/python3.7/posixpath.py", line 156, in dirname
        p = os.fspath(p)
    TypeError: expected str, bytes or os.PathLike object, not NoneType
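
The last frames of the traceback can be reproduced in isolation. The sketch below assumes only that the imported `numpy` module object had `__file__` set to `None`, which is what the final error message suggests. The stand-in module built with `types.ModuleType` is hypothetical and is not the real NumPy. Because the attribute exists, the `''` default passed to `getattr` is never used, and `os.path.dirname` receives `None`.

```python
import os
import types

# Hypothetical stand-in for a module whose __file__ is None.
fake_numpy = types.ModuleType("numpy")
fake_numpy.__file__ = None

# The default '' only applies when the attribute is missing, not when it is None.
module_file = getattr(fake_numpy, "__file__", "")


def module_dir(path):
    """Mirror the pkg_resources line from the traceback: os.path.dirname(path)."""
    return os.path.dirname(path)


try:
    module_dir(module_file)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

Why `__file__` ended up as `None` in the container is not established in this thread; the sketch only shows how that value leads to the `TypeError`.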

@TomAugspurger Pip version 19.3.1

Closing, since the issue is with the Alpine libraries. @TomAugspurger @bgehman Thanks, I appreciate your help.

@jitendrs Is it working for you? I'm still seeing the broken dependencies when installing pandas as mentioned at https://github.com/pandas-dev/pandas/issues/30154#issuecomment-563269458 . If you have a solution, mind sharing it?

@bgehman Try installing the libraries below manually:

RUN apk add \
    lapack-dev \
    libgcc \
    libquadmath \
    musl \
    libgfortran

@jitendrs My one-liner, after adding your list of libraries:

$ docker run --rm -it python:3.7.4-alpine3.10 sh -c "apk add gcc linux-headers musl-dev lapack-dev libgcc libquadmath musl libgfortran && pip3 install pandas==0.25.3"

Still results in ModuleNotFoundError: No module named 'Cython'.

@TomAugspurger let me know if you want me to create a new issue. The dependency problem still remains.

Oops, sorry, I forgot to give you the rest.

Here are my Dockerfile contents:

RUN echo '***** Install dependencies'
RUN apk upgrade --update && \
    apk add build-base \
        --virtual build-dependencies \
        bash \
        git \
        wget \
        emacs \
        vim \
        curl \
        make \
        lapack-dev \
        libgcc \
        libquadmath \
        musl \
        libgfortran \
        cmake \
        gcc \
        libxml2-dev \
        libxslt-dev \
        g++ \
        libxml2

# Install python3
RUN echo "***** install python and python dependencies"
RUN apk add python3-dev \
        python3-tkinter \
    &&  python3 -m pip install --upgrade pip \
    &&  python3 -m pip install cython numpy scipy  pandas

Make sure you follow the sequence.

Try installing pandas master: pip3 install git+https://github.com/pandas-dev/pandas. If there are still issues on master,
we can reopen.

For 0.25 and earlier you have to have your build env sorted out before
installing pandas if you're building from source.
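
For source builds of 0.25 and earlier, a quick preflight check can confirm that the build dependencies named in this thread (Cython and NumPy) are importable before running the pandas install. This is a minimal sketch using `importlib.util.find_spec`; the function name `missing_build_deps` is hypothetical.

```python
import importlib.util


def missing_build_deps(module_names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_build_deps(["Cython", "numpy"])
    if missing:
        print("Install these first:", ", ".join(missing))
    else:
        print("Cython and NumPy are importable.")
```

The check only tells you the modules can be located; it does not verify that compilers or headers are present.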

@jitendrs I see that you have:

&&  python3 -m pip install cython numpy scipy  pandas

If the dependency resolution were working, you should only have to install pandas (without explicitly installing its dependencies: cython, numpy, etc.).

@TomAugspurger installing from master works:

$ docker run --rm -it python:3.7.4-alpine3.10 sh -c "apk add gcc g++ linux-headers musl-dev git && pip3 install --upgrade pip && pip3 install git+https://github.com/pandas-dev/pandas"

What I can glean does not work:

  • Installing pandas from PyPI by itself. It seems that pip does figure out that numpy is a dependency, but the numpy install fails with: ModuleNotFoundError: No module named 'Cython'
  • Installing cython first and then pandas: cython installs fine, and then the pandas install fails with: ModuleNotFoundError: No module named 'numpy'. I cannot figure out why pip resolves dependencies differently depending on whether cython is already installed, but there is a lot of complexity in this area in pandas' setup.py.

What does work is installing each of these independently and in order, each in its own pip install (combining them into one pip install also fails):

  1. cython
  2. numpy
  3. pandas

As in: $ docker run --rm -it python:3.7.4-alpine3.10 sh -c "apk add gcc g++ linux-headers musl-dev && pip3 install cython && pip3 install numpy && pip3 install pandas"
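
The working order above can also be scripted. The sketch below builds one `pip install` command per package, so each install finishes before the next starts. `install_commands` and `install_in_order` are hypothetical helper names, and calling `install_in_order` performs real installs (on Alpine, long source builds).

```python
import subprocess
import sys


def install_commands(packages):
    """Build one separate 'pip install' command per package, preserving order."""
    return [[sys.executable, "-m", "pip", "install", package] for package in packages]


def install_in_order(packages):
    """Run each install on its own and stop at the first failure."""
    for command in install_commands(packages):
        subprocess.run(command, check=True)


# Example call (performs real installs), using the order reported to work here:
# install_in_order(["cython", "numpy", "pandas"])
```

Using `sys.executable -m pip` keeps the installs tied to the interpreter running the script, which avoids ambiguity when several Pythons are present in the image.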

The mystery is that this all used to work fine last week (and for the prior year). I don't know what changed to break it; I guess it could be pip, numpy, or pandas. It isn't the python/alpine Docker image: according to DockerHub, that tag hasn't changed in months.

Anyhow, the problem should be easily reproducible by anybody. Compiling numpy/pandas from source takes far too long to make Alpine worthwhile (there is no whl for numpy/pandas on Alpine). It is preferable to use python-slim and avoid the massive amount of time spent compiling from source and dealing with the broken dependency resolution (docker run --rm -it python:3.7.4-slim sh -c "pip3 install pandas").

Regarding "installing pandas from pypi by itself":

Just to clarify, this is fixed on pandas master because we use a pyproject.toml. Pandas 1.0 (which should be released in January) will have this, and you'll be able to pip install pandas from PyPI and it'll handle the build dependencies.

@TomAugspurger Cheers..!!
