Pip: Pip 20.2 causes UnicodeDecodeError

Created on 30 Jul 2020  ·  15 Comments  ·  Source: pypa/pip

Environment

  • pip version: 20.2
  • Python version: 2.7.5
  • OS: CentOS 7.8.2003

Description

Installing some(?) packages fails with a UnicodeDecodeError exception.

Expected behavior

I expect the packages to install successfully, as they do with pip 20.1.

How to Reproduce

Build the following Dockerfile:

FROM centos:7

RUN yum -y install python-virtualenv
RUN virtualenv --python=python2 env
RUN ./env/bin/pip install -U pip==20.2 setuptools==44.1.1 wheel==0.34.2
RUN ./env/bin/pip install ansible==2.7.9

$ docker build -t pip-bug .

Output

$ docker build -t pip-bug .
Sending build context to Docker daemon  2.048kB
Step 1/5 : FROM centos:7
 ---> b5b4d78bc90c
Step 2/5 : RUN yum -y install python-virtualenv
 ---> Using cache
 ---> 809b2152d1a1
Step 3/5 : RUN virtualenv --python=python2 env
 ---> Using cache
 ---> 69dd1f9b3061
Step 4/5 : RUN ./env/bin/pip install -U pip==20.2 setuptools==44.1.1 wheel==0.34.2
 ---> Using cache
 ---> 4cfb8184c74b
Step 5/5 : RUN ./env/bin/pip install ansible==2.7.9
 ---> Running in bc33d8f35d95
DEPRECATION: Python 2.7 reached the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 is no longer maintained. pip 21.0 will drop support for Python 2.7 in January 2021. More details about Python 2 support in pip can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support
Collecting ansible==2.7.9
  Downloading ansible-2.7.9.tar.gz (11.8 MB)
Collecting jinja2
  Downloading Jinja2-2.11.2-py2.py3-none-any.whl (125 kB)
Collecting PyYAML
  Downloading PyYAML-5.3.1.tar.gz (269 kB)
Collecting paramiko
  Downloading paramiko-2.7.1-py2.py3-none-any.whl (206 kB)
Collecting cryptography
  Downloading cryptography-3.0-cp27-cp27mu-manylinux2010_x86_64.whl (2.7 MB)
Requirement already satisfied: setuptools in /env/lib/python2.7/site-packages (from ansible==2.7.9) (44.1.1)
Collecting MarkupSafe>=0.23
  Downloading MarkupSafe-1.1.1-cp27-cp27mu-manylinux1_x86_64.whl (24 kB)
Collecting pynacl>=1.0.1
  Downloading PyNaCl-1.4.0-cp27-cp27mu-manylinux1_x86_64.whl (964 kB)
Collecting bcrypt>=3.1.3
  Downloading bcrypt-3.1.7-cp27-cp27mu-manylinux1_x86_64.whl (59 kB)
Collecting six>=1.4.1
  Downloading six-1.15.0-py2.py3-none-any.whl (10 kB)
Collecting cffi!=1.11.3,>=1.8
  Downloading cffi-1.14.1-cp27-cp27mu-manylinux1_x86_64.whl (388 kB)
Collecting ipaddress; python_version < "3"
  Downloading ipaddress-1.0.23-py2.py3-none-any.whl (18 kB)
Collecting enum34; python_version < "3"
  Downloading enum34-1.1.10-py2-none-any.whl (11 kB)
Collecting pycparser
  Downloading pycparser-2.20-py2.py3-none-any.whl (112 kB)
Building wheels for collected packages: ansible, PyYAML
  Building wheel for ansible (setup.py): started
  Building wheel for ansible (setup.py): finished with status 'done'
  Created wheel for ansible: filename=ansible-2.7.9-py2-none-any.whl size=9428759 sha256=9eb07c04d0a932220c25ea55fa20f5d52c10f8fd96914493bc0df59676c21891
  Stored in directory: /root/.cache/pip/wheels/bf/17/e3/c01fddfaa1e22530097c1b1933eecd8d1ddf5477f9e588006a
  Building wheel for PyYAML (setup.py): started
  Building wheel for PyYAML (setup.py): finished with status 'done'
  Created wheel for PyYAML: filename=PyYAML-5.3.1-cp27-cp27mu-linux_x86_64.whl size=45644 sha256=c50f35b17810fd226067a47388cdf0ec2decc00389c7cf39f5bbef02755857da
  Stored in directory: /root/.cache/pip/wheels/d1/d5/a0/3c27cdc8b0209c5fc1385afeee936cf8a71e13d885388b4be2
Successfully built ansible PyYAML
Installing collected packages: MarkupSafe, jinja2, PyYAML, six, pycparser, cffi, ipaddress, enum34, cryptography, pynacl, bcrypt, paramiko, ansible
Successfully installed MarkupSafe-1.1.1 PyYAML-5.3.1 ansible-2.7.9 bcrypt-3.1.7 cffi-1.14.1 cryptography-3.0 enum34-1.1.10 ipaddress-1.0.23 jinja2-2.11.2 paramiko-2.7.1 pycparser-2.20 pynacl-1.4.0 six-1.15.0
Traceback (most recent call last):
  File "./env/bin/pip", line 11, in <module>
    sys.exit(main())
  File "/env/lib/python2.7/site-packages/pip/_internal/cli/main.py", line 75, in main
    return command.main(cmd_args)
  File "/env/lib/python2.7/site-packages/pip/_internal/cli/base_command.py", line 121, in main
    return self._main(args)
  File "/usr/lib64/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/env/lib/python2.7/site-packages/pip/_internal/cli/command_context.py", line 28, in main_context
    yield
  File "/env/lib/python2.7/site-packages/pip/_vendor/contextlib2.py", line 479, in __exit__
    _reraise_with_existing_context(exc_details)
  File "/env/lib/python2.7/site-packages/pip/_vendor/contextlib2.py", line 353, in _reraise_with_existing_context
    exec("raise exc_type, exc_value, exc_tb")
  File "/env/lib/python2.7/site-packages/pip/_vendor/contextlib2.py", line 468, in __exit__
    if cb(*exc_details):
  File "/env/lib/python2.7/site-packages/pip/_vendor/contextlib2.py", line 396, in _exit_wrapper
    return cm_exit(cm, *exc_details)
  File "/usr/lib64/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/env/lib/python2.7/site-packages/pip/_internal/utils/temp_dir.py", line 46, in global_tempdir_manager
    _tempdir_manager = old_tempdir_manager
  File "/env/lib/python2.7/site-packages/pip/_vendor/contextlib2.py", line 479, in __exit__
    _reraise_with_existing_context(exc_details)
  File "/env/lib/python2.7/site-packages/pip/_vendor/contextlib2.py", line 353, in _reraise_with_existing_context
    exec("raise exc_type, exc_value, exc_tb")
  File "/env/lib/python2.7/site-packages/pip/_vendor/contextlib2.py", line 468, in __exit__
    if cb(*exc_details):
  File "/env/lib/python2.7/site-packages/pip/_vendor/contextlib2.py", line 396, in _exit_wrapper
    return cm_exit(cm, *exc_details)
  File "/env/lib/python2.7/site-packages/pip/_internal/utils/temp_dir.py", line 175, in __exit__
    self.cleanup()
  File "/env/lib/python2.7/site-packages/pip/_internal/utils/temp_dir.py", line 199, in cleanup
    rmtree(ensure_text(self._path))
  File "/env/lib/python2.7/site-packages/pip/_vendor/retrying.py", line 49, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)
  File "/env/lib/python2.7/site-packages/pip/_vendor/retrying.py", line 212, in call
    raise attempt.get()
  File "/env/lib/python2.7/site-packages/pip/_vendor/retrying.py", line 247, in get
    six.reraise(self.value[0], self.value[1], self.value[2])
  File "/env/lib/python2.7/site-packages/pip/_vendor/retrying.py", line 200, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  File "/env/lib/python2.7/site-packages/pip/_internal/utils/misc.py", line 139, in rmtree
    onerror=rmtree_errorhandler)
  File "/usr/lib64/python2.7/shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib64/python2.7/shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib64/python2.7/shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib64/python2.7/shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib64/python2.7/shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib64/python2.7/shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib64/python2.7/shutil.py", line 241, in rmtree
    fullname = os.path.join(path, name)
  File "/env/lib64/python2.7/posixpath.py", line 80, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)
The command '/bin/sh -c ./env/bin/pip install ansible==2.7.9' returned a non-zero code: 1

All 15 comments

What is the LANG environment variable in the container? I believe this is related to a change we made to always use Unicode for paths for Windows compatibility.

> What is the LANG environment variable in the container? I believe this is related to a change we made to always use Unicode for paths for Windows compatibility.

$ locale
LANG=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=

After making the following changes to the Dockerfile, it works.

 FROM centos:7

+ENV LANG en_US.utf8
+ENV LC_ALL en_US.utf8

 RUN yum -y install python-virtualenv
 RUN virtualenv --python=python2 env
 RUN ./env/bin/pip install -U pip==20.2 setuptools==44.1.1 wheel==0.34.2
 RUN ./env/bin/pip install ansible==2.7.9
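
The locale output and the `ENV` workaround above both come down to which encoding Python derives from the environment. A small diagnostic along the following lines could be run inside the container to check. It is only a sketch: `encoding_handles_non_ascii` is a hypothetical helper name and the sample file name is arbitrary.

```python
import locale
import sys


def encoding_handles_non_ascii(encoding_name):
    """Return True if the named encoding can represent a non-ASCII file name."""
    try:
        u"caf\u00e9.j2".encode(encoding_name)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    fs_encoding = sys.getfilesystemencoding()
    print("filesystem encoding:", fs_encoding)
    print("preferred encoding:", locale.getpreferredencoding())
    print("handles non-ASCII names:", encoding_handles_non_ascii(fs_encoding))
```

If the last line reports `False`, the interpreter cannot turn non-ASCII file names into text under the current locale settings.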

If I understand correctly, this only affects Python 2. If this is not the case, please do provide a reproduction case for a newer Python version. :)

> If I understand correctly, this only affects Python 2.

That's correct. I couldn't reproduce the issue using Python 3.6:

 FROM centos:7

-RUN yum -y install python-virtualenv
+RUN yum -y install python-virtualenv python3
-RUN virtualenv --python=python2 env
+RUN virtualenv --python=python3 env
 RUN ./env/bin/pip install -U pip==20.2 setuptools==44.1.1 wheel==0.34.2
 RUN ./env/bin/pip install ansible==2.7.9

Python 3 always uses the text type for paths and handles the encoding correctly. I think the correct fix would be to only call ensure_text on Windows.
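
As an illustration of the "only on Windows" idea, the conversion could be gated on the platform roughly as follows. This is a sketch under assumptions, not the change that went into pip: `path_for_rmtree` is a hypothetical helper, and the `platform` parameter exists only so that the behaviour can be exercised on any system.

```python
import sys


def path_for_rmtree(path, platform=sys.platform):
    """Hypothetical helper: convert a byte path to text only on Windows.

    On other platforms the path is returned unchanged, so byte-string paths
    stay byte strings and no implicit decoding of file names is triggered.
    """
    if platform == "win32" and isinstance(path, bytes):
        return path.decode(sys.getfilesystemencoding())
    return path
```

With this shape, a POSIX caller passing a byte-string temp directory gets the same byte string back, while a Windows caller gets text.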

For completeness, the change was introduced in #8223, which fixes encoding issues when the environment encoding is configured correctly.

I just realised #8655 would avoid this exception as well (by ignoring the cleanup error).

> I just realised #8655 would avoid this exception as well (by ignoring the cleanup error).

Not a huge deal, but I guess that would generate a "Failed to clean up" log warning in this case?

Yes, hence I said “avoid” instead of “fix” 😉
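
The difference between "avoid" and "fix" can be pictured with a cleanup routine that logs a warning instead of letting the exception propagate. The following is a generic sketch, not the code from #8655: `cleanup_or_warn` and its message format are assumptions made for illustration.

```python
import logging
import os
import shutil
import tempfile

logger = logging.getLogger(__name__)


def cleanup_or_warn(path):
    """Remove a directory tree; log a warning instead of raising on failure."""
    try:
        shutil.rmtree(path)
    except Exception as exc:
        logger.warning("Failed to clean up %s: %s", path, exc)
        return False
    return True


if __name__ == "__main__":
    logging.basicConfig()
    scratch = tempfile.mkdtemp()
    print(cleanup_or_warn(scratch), os.path.exists(scratch))
```

The command no longer crashes, but the directory that could not be removed is left behind, which is why this only sidesteps the underlying decoding problem.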

FWIW, our documented Python 2 support policy is "bugs reported with pip which only occur on Python 2.7 would likely not be addressed directly by pip’s maintainers" (see https://pip.pypa.io/en/stable/development/release-process/#python-2-support).

This is a Python 2 only bug with a relatively straightforward workaround so, well, I'm personally not going to be looking into this.

Personally, I'm happy with the accidental(?) solution caused by #8655.

> This is a Python 2 only bug with a relatively straightforward workaround so, well, I'm personally not going to be looking into this.

I can also buy closing this by referring to the workaround as an acceptable solution to this problem.

Another straightforward workaround is to pin pip to something like pip<20.2 if you're experiencing this (which is what we ended up doing as a quick fix), especially since pip will drop Python 2 support in the not too distant future anyway.

I view this a bit differently. Since this previously worked, we should try to keep it working until we formally drop Python 2 support. This is more about correcting our own mistakes than fixing bugs for the user.

What if I add an if PY2: condition to the fix so that it does not apply to Python 3? The currently released code already works on Python 3 and does not work on Python 2, so the change would have no chance of making things worse if put behind that condition.

The packages that fail are those that contain non-ASCII filenames. ansible 2.7.9 has three such filenames:

$ cat Dockerfile 
FROM centos:7

RUN yum -y install python-virtualenv
RUN virtualenv --python=python2 env
RUN ./env/bin/pip install -U pip==20.2 setuptools==44.1.1 wheel==0.34.2
RUN ./env/bin/pip install ansible==2.7.9 || find / | egrep -v '^[a-zA-Z0-9,@+~: (){}\[_/\.-]+$'
$ sudo docker build -t pip-bug .
…
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)
/tmp/pip-install-8JLJFl/ansible/test/integration/targets/template/templates/café.j2
/tmp/pip-install-8JLJFl/ansible/test/integration/targets/unarchive/files/test-unarchive-nonascii-くらとみ.tar.gz
/tmp/pip-install-8JLJFl/ansible/test/integration/targets/ansible/ansible-testé.cfg
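
The "position 4" in the error message is consistent with the first of these names. In the traceback, `posixpath.join` runs `path += '/' + b`, and on Python 2 appending a byte string to a unicode path decodes the bytes with the ASCII codec first. The sketch below emulates that step on Python 3 by writing the decode out explicitly; `emulate_py2_join` is a hypothetical name and the directory is a placeholder.

```python
def emulate_py2_join(directory_text, name_bytes):
    """Emulate Python 2's implicit ASCII decode in ``path += '/' + b``.

    Python 2 decodes the byte string with the ASCII codec before appending it
    to a unicode path; Python 3 refuses to mix the two types, so the decode is
    written out explicitly here.
    """
    return directory_text + (b"/" + name_bytes).decode("ascii")


if __name__ == "__main__":
    name = u"caf\u00e9.j2".encode("utf-8")  # UTF-8 bytes: b'caf\xc3\xa9.j2'
    try:
        emulate_py2_join(u"/tmp/templates", name)
    except UnicodeDecodeError as exc:
        # Byte 0xc3 sits at index 4 of b'/caf\xc3\xa9.j2'.
        print(exc)
```

A plain ASCII name such as `b"plain.j2"` passes through the same function without error, which matches the observation that only packages with non-ASCII file names fail.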

It is not very reasonable to have such names in a package, but that is a separate issue.

I think the LANG variable workaround would fail if the package contained filenames in an encoding other than UTF-8.

I think pip should not assume that files on a filesystem have sensible names or UTF-8 names: on Unix systems, any sequence of bytes that contains neither NUL nor '/' is a valid filename.
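
For comparison, Python 3 copes with arbitrary byte names through the `surrogateescape` error handler, which maps undecodable bytes to placeholder code points and restores them on encoding. Python 2 has no equivalent. A minimal sketch of that round trip, with `roundtrip_name` as a hypothetical helper and UTF-8 assumed as the nominal encoding:

```python
def roundtrip_name(raw_name):
    """Decode a raw file name to text and encode it back without loss."""
    text = raw_name.decode("utf-8", "surrogateescape")
    return text.encode("utf-8", "surrogateescape")


if __name__ == "__main__":
    # Latin-1 bytes b'caf\xe9.j2' are not valid UTF-8.
    latin1_name = u"caf\u00e9.j2".encode("latin-1")
    try:
        latin1_name.decode("utf-8")
    except UnicodeDecodeError:
        print("strict UTF-8 decoding rejects the name")
    print(roundtrip_name(latin1_name) == latin1_name)
```

The name survives unchanged even though it is not valid UTF-8, so code that handles such paths never needs to know the original encoding.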

Hi all, same issue

FROM ubuntu:20.04
RUN apt-get update -qqy \
  && apt-get -qqy --no-install-recommends install \
        python-dev python
RUN curl https://bootstrap.pypa.io/get-pip.py --output get-pip.py \
    && python get-pip.py

We recommend you use --use-feature=2020-resolver to test your packages with the new resolver before it becomes the default.

ipython 5.7.0 requires prompt-toolkit<2.0.0,>=1.0.4, but you'll have prompt-toolkit 2.0.3 which is incompatible.
Traceback (most recent call last):
  File "/usr/local/bin/pip", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/dist-packages/pip/_internal/cli/main.py", line 75, in main
    return command.main(cmd_args)
  File "/usr/local/lib/python2.7/dist-packages/pip/_internal/cli/base_command.py", line 121, in main
    return self._main(args)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/usr/local/lib/python2.7/dist-packages/pip/_internal/cli/command_context.py", line 28, in main_context
    yield
  File "/usr/local/lib/python2.7/dist-packages/pip/_vendor/contextlib2.py", line 479, in __exit__
    _reraise_with_existing_context(exc_details)
  File "/usr/local/lib/python2.7/dist-packages/pip/_vendor/contextlib2.py", line 353, in _reraise_with_existing_context
    exec("raise exc_type, exc_value, exc_tb")
  File "/usr/local/lib/python2.7/dist-packages/pip/_vendor/contextlib2.py", line 468, in __exit__
    if cb(*exc_details):
  File "/usr/local/lib/python2.7/dist-packages/pip/_vendor/contextlib2.py", line 396, in _exit_wrapper
    return cm_exit(cm, *exc_details)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/usr/local/lib/python2.7/dist-packages/pip/_internal/utils/temp_dir.py", line 46, in global_tempdir_manager
    _tempdir_manager = old_tempdir_manager
  File "/usr/local/lib/python2.7/dist-packages/pip/_vendor/contextlib2.py", line 479, in __exit__
    _reraise_with_existing_context(exc_details)
  File "/usr/local/lib/python2.7/dist-packages/pip/_vendor/contextlib2.py", line 353, in _reraise_with_existing_context
    exec("raise exc_type, exc_value, exc_tb")
  File "/usr/local/lib/python2.7/dist-packages/pip/_vendor/contextlib2.py", line 468, in __exit__
    if cb(*exc_details):
  File "/usr/local/lib/python2.7/dist-packages/pip/_vendor/contextlib2.py", line 396, in _exit_wrapper
    return cm_exit(cm, *exc_details)
  File "/usr/local/lib/python2.7/dist-packages/pip/_internal/utils/temp_dir.py", line 175, in __exit__
    self.cleanup()
  File "/usr/local/lib/python2.7/dist-packages/pip/_internal/utils/temp_dir.py", line 199, in cleanup
    rmtree(ensure_text(self._path))
  File "/usr/local/lib/python2.7/dist-packages/pip/_vendor/retrying.py", line 49, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/pip/_vendor/retrying.py", line 212, in call
    raise attempt.get()
  File "/usr/local/lib/python2.7/dist-packages/pip/_vendor/retrying.py", line 247, in get
    six.reraise(self.value[0], self.value[1], self.value[2])
  File "/usr/local/lib/python2.7/dist-packages/pip/_vendor/retrying.py", line 200, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  File "/usr/local/lib/python2.7/dist-packages/pip/_internal/utils/misc.py", line 139, in rmtree
    onerror=rmtree_errorhandler)
  File "/usr/lib/python2.7/shutil.py", line 270, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib/python2.7/shutil.py", line 270, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib/python2.7/shutil.py", line 270, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib/python2.7/shutil.py", line 270, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib/python2.7/shutil.py", line 270, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib/python2.7/shutil.py", line 270, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib/python2.7/shutil.py", line 264, in rmtree
    fullname = os.path.join(path, name)
  File "/usr/lib/python2.7/posixpath.py", line 73, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)
