pip install <package>
fails on Windows if the project's description (e.g., its long description) is in UTF-8.
(simpy) C:\Users\sscherfke\Code\simpy>pip install .
Unpacking c:\users\sscherfke\code\simpy
Running setup.py egg_info for package from file:///c%7C%5Cusers%5Csscherfke%5Ccode%5Csimpy
Cleaning up...
Exception:
Traceback (most recent call last):
File "C:\Users\sscherfke\Envs\simpy\lib\site-packages\pip\basecommand.py", line 134, in main
status = self.run(options, args)
File "C:\Users\sscherfke\Envs\simpy\lib\site-packages\pip\commands\install.py", line 236, in run
requirement_set.prepare_files(finder, force_root_egg_info=self.bundle, bundle=self.bundle)
File "C:\Users\sscherfke\Envs\simpy\lib\site-packages\pip\req.py", line 1134, in prepare_files
req_to_install.run_egg_info()
File "C:\Users\sscherfke\Envs\simpy\lib\site-packages\pip\req.py", line 264, in run_egg_info
"%(Name)s==%(Version)s" % self.pkg_info())
File "C:\Users\sscherfke\Envs\simpy\lib\site-packages\pip\req.py", line 357, in pkg_info
data = self.egg_info_data('PKG-INFO')
File "C:\Users\sscherfke\Envs\simpy\lib\site-packages\pip\req.py", line 297, in egg_info_data
data = fp.read()
File "C:\Users\sscherfke\Envs\simpy\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 1235: character maps to <undefined>
Storing complete log in C:\Users\sscherfke\pip\pip.log
The problem seems to be that req.egg_info_data()
(currently line 317) reads the egg-info created by python setup.py egg_info
with the system's default encoding, which is not UTF-8 on Windows (but is on most *nix systems).
With Python 3, it should be no problem if you use utf-8 in your README/CHANGES/AUTHORS.txt (or whatever), so pip should read files as unicode by default:
Changing lines 296 and 297 (in pip 1.4.1; 316 and 317 in the repo) to
fp = open(filename, 'rb')
data = fp.read().decode('utf-8')
fixes the problem for me.
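To illustrate why the Windows default code page chokes here, the following is a minimal sketch rather than pip's actual code. It assumes the offending character was a right double quotation mark (U+201D), whose UTF-8 encoding ends in the byte 0x9d seen in the traceback; the real character in the PKG-INFO is not known, and the sample text is made up.

```python
# Assumption: the metadata contained U+201D; the real character is not known.
text = "SimPy \u201cdiscrete-event\u201d simulation"
raw = text.encode("utf-8")      # bytes as they would sit in a UTF-8 PKG-INFO
assert b"\x9d" in raw           # U+201D encodes to e2 80 9d

try:
    raw.decode("cp1252")        # byte 0x9d has no mapping in cp1252
    cp1252_ok = True
except UnicodeDecodeError as exc:
    cp1252_ok = False
    failing_byte = raw[exc.start]

decoded = raw.decode("utf-8")   # explicit UTF-8 decoding succeeds
print(cp1252_ok, hex(failing_byte), decoded == text)
```

Reading the file in binary mode and decoding explicitly, as in the two changed lines above, removes the dependency on the platform's default encoding.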
The test setup was:
Can you create a pull request with a test case to ensure this bug won't regress?
@thedrow Can the PR be merged or is there something missing?
With Python 3, it should be no problem if you use utf-8 in your README/CHANGES/AUTHORS.txt (or whatever), so pip should read files as unicode by default:
What if those files are not in UTF-8? As far as I know, there's no requirement that they have to be.
The probability that they are is, IMHO, much higher than the probability that they are in the Windows default encoding, because UTF-8 is the default on Mac/Linux and also in Python 3. You could also wrap it in a try/except block and use the default encoding if UTF-8 fails.
@sscherfke First of all, add tests to the pull request to ensure this bug won't regress.
And I was about to write that you should wrap it in a try/except that falls back to the default encoding if it cannot be decoded as UTF-8, but you are ahead of me :)
@thedrow Try the default first and fall back to UTF-8, or the other way around?
It depends on which one is the most common use case. Ask the core developers.
The point is that the encoding of such files is not defined. So you can't reliably say that _any_ encoding will be correct (some, like latin1, will never give encoding errors, but that doesn't mean they are necessarily right).
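A small sketch of the latin-1 point, using a made-up sample string: decoding as latin-1 accepts every possible byte, but applying it to UTF-8 data silently produces the wrong text instead of raising an error.

```python
# latin-1 maps each of the 256 byte values to a code point, so it never fails.
all_bytes = bytes(range(256))
roundtrip = all_bytes.decode("latin-1")
assert len(roundtrip) == 256

# But "never fails" is not the same as "correct": UTF-8 data decoded as
# latin-1 comes out as mojibake.
original = "caf\u00e9"
mojibake = original.encode("utf-8").decode("latin-1")
print(repr(original), repr(mojibake))
```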
Neither the egg-info nor the metadata specs make any mention of encodings, which is unfortunate, but reflects the fact that they were written when Python tended to assume that only ASCII would be used for anything that needed interoperability. Hence this bug.
The correct fix (Metadata 2.0) is of course to clearly specify encodings. But that's a way off yet, and in the meantime we have to be prepared to accept arbitrary data.
My view is that we should
Whether we use the platform encoding or UTF-8 doesn't really affect (1). In both cases, there is the possibility of invalid data. To address (1) we need to progressively fall back through a series of encodings, finishing with latin-1 (as that is the commonly used encoding that accepts all 256 byte values, and so will never error).
Using UTF-8 addresses (2), as UTF-8 is probably the most common encoding we will see (due to its prevalence on Unix systems).
For (3) it's really about how the data is used, and I think that's out of scope for this patch.
So if you add exception handling and a fallback - I'd suggest the platform default and then latin-1 in that order if UTF-8 fails - I think that would be a good solution.
Whether UTF-8 or platform default is the best choice for the initial attempt is something I doubt anyone can tell you. I suspect that a relatively small number of projects go outside ASCII anyway. For those that do, if you're on Unix UTF-8 _is_ the platform default so it makes no difference. On Windows, it boils down to which is the most important case - installing stuff developed by other Windows developers, or installing stuff developed by Unix developers. Honestly, that's going to be an almost totally arbitrary decision.
I think @pfmoore has a point. I completely agree.
try:
    # Try utf-8
    with open(filename, 'rb') as fp:
        data = fp.read().decode('utf-8')
except UnicodeDecodeError:
    try:
        # Try the system's default encoding
        with open(filename, 'r') as fp:
            data = fp.read()
    except UnicodeDecodeError:
        # Our last resort is latin1 which never throws an error
        # (but returns nonsense instead :-))
        with open(filename, 'rb') as fp:
            data = fp.read().decode('latin1')
return data
This surely doesn’t look very friendly but shouldn’t raise any UnicodeDecodeError (as far as I’ve tested it).
What would be the preferred way for a pip testcase? To use actual files with varying encodings or to mock open() and pass varying bytes instead?
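For the first option (real files with varying encodings), a test could look roughly like the sketch below. The helper names read_text_with_fallback and write_temp are hypothetical and not part of pip; the helper only mirrors the UTF-8, platform default, latin-1 order discussed in this thread, and the sample bytes are made up.

```python
import locale
import os
import tempfile


def read_text_with_fallback(path):
    """Hypothetical helper: try UTF-8, then the platform default, then latin-1."""
    with open(path, "rb") as fp:
        raw = fp.read()
    for encoding in ("utf-8", locale.getpreferredencoding(False)):
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            continue
    return raw.decode("latin-1")  # accepts every byte value


def write_temp(data):
    """Write raw bytes to a temporary file and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as fp:
        fp.write(data)
    return path


utf8_path = write_temp("Pr\u00e4sentation".encode("utf-8"))
odd_path = write_temp(b"\x9d\xff")  # not valid UTF-8

utf8_result = read_text_with_fallback(utf8_path)
odd_result = read_text_with_fallback(odd_path)  # must not raise

os.remove(utf8_path)
os.remove(odd_path)
```

Using real files keeps the test independent of how open() is called internally, at the cost of touching the file system; mocking open() would avoid that but ties the test to the implementation.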
Created a new pull request #1331 which fixes the issue and looks a bit nicer than the snippet I posted above. :) Also added a new unit test.
C:\> pip --version
pip 1.5.4 from C:\Python27\lib\site-packages (Python 2.7)
C:\>pip install ipython
Downloading/unpacking ipython
Cleaning up...
Exception:
Traceback (most recent call last):
File "C:\Python27\lib\site-packages\pip\basecommand.py", line 122, in main
status = self.run(options, args)
File "C:\Python27\lib\site-packages\pip\commands\install.py", line 278, in run
requirement_set.prepare_files(finder, force_root_egg_info=self.bundle, bundle=self.bundle)
File "C:\Python27\lib\site-packages\pip\req.py", line 1229, in prepare_files
req_to_install.run_egg_info()
File "C:\Python27\lib\site-packages\pip\req.py", line 292, in run_egg_info
logger.notify('Running setup.py (path:%s) egg_info for package %s' % (self.setup_py, self.name))
File "C:\Python27\lib\site-packages\pip\req.py", line 265, in setup_py
import setuptools
File "C:\Python27\lib\site-packages\setuptools\__init__.py", line 12, in <module>
from setuptools.extension import Extension
File "C:\Python27\lib\site-packages\setuptools\extension.py", line 7, in <module>
from setuptools.dist import _get_unpatched
File "C:\Python27\lib\site-packages\setuptools\dist.py", line 15, in <module>
from setuptools.compat import numeric_types, basestring
File "C:\Python27\lib\site-packages\setuptools\compat.py", line 19, in <module>
from SimpleHTTPServer import SimpleHTTPRequestHandler
File "C:\Python27\lib\SimpleHTTPServer.py", line 27, in <module>
class SimpleHTTPRequestHandler(BaseHTTPServer.BaseHTTPRequestHandler):
File "C:\Python27\lib\SimpleHTTPServer.py", line 208, in SimpleHTTPRequestHandler
mimetypes.init() # try to read system mime.types
File "C:\Python27\lib\mimetypes.py", line 358, in init
db.read_windows_registry()
File "C:\Python27\lib\mimetypes.py", line 258, in read_windows_registry
for subkeyname in enum_types(hkcr):
File "C:\Python27\lib\mimetypes.py", line 249, in enum_types
ctype = ctype.encode(default_encoding) # omit in 3.x!
UnicodeDecodeError: 'ascii' codec can't decode byte 0xca in position 9: ordinal not in range(128)
Traceback (most recent call last):
File "C:\Python27\lib\runpy.py", line 162, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "C:\Python27\lib\runpy.py", line 72, in _run_code
exec code in run_globals
File "C:\Python27\Scripts\pip.exe\__main__.py", line 9, in <module>
File "C:\Python27\lib\site-packages\pip\__init__.py", line 185, in main
return command.main(cmd_args)
File "C:\Python27\lib\site-packages\pip\basecommand.py", line 161, in main
text = '\n'.join(complete_log)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc0 in position 68: ordinal not in range(128)
C:\>pip --help
Traceback (most recent call last):
File "C:\Python27\lib\runpy.py", line 162, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "C:\Python27\lib\runpy.py", line 72, in _run_code
exec code in run_globals
File "C:\Python27\Scripts\pip.exe\__main__.py", line 9, in <module>
File "C:\Python27\lib\site-packages\pip\__init__.py", line 177, in main
cmd_name, cmd_args = parseopts(initial_args)
File "C:\Python27\lib\site-packages\pip\__init__.py", line 138, in parseopts
general_options, args_else = parser.parse_args(args)
File "C:\Python27\lib\optparse.py", line 1399, in parse_args
stop = self._process_args(largs, rargs, values)
File "C:\Python27\lib\optparse.py", line 1439, in _process_args
self._process_long_opt(rargs, values)
File "C:\Python27\lib\optparse.py", line 1514, in _process_long_opt
option.process(opt, value, values, self)
File "C:\Python27\lib\optparse.py", line 788, in process
self.action, self.dest, opt, value, values, parser)
File "C:\Python27\lib\optparse.py", line 810, in take_action
parser.print_help()
File "C:\Python27\lib\optparse.py", line 1669, in print_help
file.write(self.format_help().encode(encoding, "replace"))
File "C:\Python27\lib\encodings\cp866.py", line 12, in encode
return codecs.charmap_encode(input,errors,encoding_map)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc0 in position 1257: ordinal not in range(128)
C:\>pip
Traceback (most recent call last):
File "C:\Python27\lib\runpy.py", line 162, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "C:\Python27\lib\runpy.py", line 72, in _run_code
exec code in run_globals
File "C:\Python27\Scripts\pip.exe\__main__.py", line 9, in <module>
File "C:\Python27\lib\site-packages\pip\__init__.py", line 177, in main
cmd_name, cmd_args = parseopts(initial_args)
File "C:\Python27\lib\site-packages\pip\__init__.py", line 148, in parseopts
parser.print_help()
File "C:\Python27\lib\optparse.py", line 1669, in print_help
file.write(self.format_help().encode(encoding, "replace"))
File "C:\Python27\lib\encodings\cp866.py", line 12, in encode
return codecs.charmap_encode(input,errors,encoding_map)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc0 in position 1257: ordinal not in range(128)
@while0pass I'm not sure if it's related. Try to uninstall pip and install it again.
If the issue persists just open a new ticket.
Why is this issue still open when #1395 & #1396 were already merged? @dstufft
I get the error UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 33: ordinal not in range(128) on Windows. It is caused by a change in the registration table of Windows. The way to solve it is to open mimetypes.py in C:\python27\Lib and find the line 'default_encoding = sys.getdefaultencoding()'. Before it, add the code below:
if sys.getdefaultencoding() != 'gbk':
    reload(sys)
    sys.setdefaultencoding('gbk')
default_encoding = sys.getdefaultencoding()
After that, it works.
@BlakeWL
What is _registration table_?
@piotr-dobrogost
When you run regedit in Windows, the registration table will show up.
It's called Windows Registry (http://en.wikipedia.org/wiki/Windows_Registry) not registration table.
Can this patch work?
--- C:/Python27/Lib/site-packages/pip/download.py Thu May 28 09:40:28 2015
+++ C:/Python27/Lib/site-packages/pip/download.py Thu May 28 11:34:40 2015
@@ -881,6 +881,13 @@ def _download_http_url(link, session, temp_dir):
         ext = os.path.splitext(resp.url)[1]
         if ext:
             filename += ext
+    try:
+        if isinstance(temp_dir, unicode):
+            temp_dir = temp_dir.encode(sys.getfilesystemencoding())
+        if isinstance(filename, unicode):
+            filename = filename.encode(sys.getfilesystemencoding())
+    except NameError:
+        pass
     file_path = os.path.join(temp_dir, filename)
     with open(file_path, 'wb') as content_file:
         _download_url(resp, link, content_file)
FYI, there is a Python standard for specifying the character encoding of a Python module, PEP 263, which you can read here: https://www.python.org/dev/peps/pep-0263/.
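As a rough illustration of PEP 263, the sketch below uses the standard library's tokenize.detect_encoding to read the declared encoding from a source file's first lines. The sample source bytes are made up. Note that PEP 263 covers Python source files only; it does not define the encoding of metadata files such as PKG-INFO.

```python
import io
import tokenize

# Made-up module source with a PEP 263 encoding declaration on the first line.
source = b"# -*- coding: latin-1 -*-\nname = 'caf\xe9'\n"

# detect_encoding() takes a readline callable that yields bytes.
encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
text = source.decode(encoding)

# Without a declaration (and without a BOM) the default is UTF-8.
default_encoding, _ = tokenize.detect_encoding(io.BytesIO(b"x = 1\n").readline)
print(encoding, default_encoding)
```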
any word on this? just got this today
Closing this, I believe that this has been fixed.
Not for me....
Which library do I have to upgrade in order to get rid of this problem?
Still happening. Is this related or different?:
e:\repos\fr>pip install -e .
Obtaining file:///E:/repos/fr
Installing collected packages: fr
Found existing installation: fr 3.0a0
Uninstalling fr-3.0a0:
Successfully uninstalled fr-3.0a0
Running setup.py develop for fr
Complete output from command c:\users\TheUser\python36\python.exe -c "import setuptools, tokenize;__file__='E:\\repos\\fr\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" develop --no-deps:
running develop
running egg_info
writing fr.egg-info\PKG-INFO
writing dependency_links to fr.egg-info\dependency_links.txt
writing requirements to fr.egg-info\requires.txt
writing top-level names to fr.egg-info\top_level.txt
reading manifest file 'fr.egg-info\SOURCES.txt'
writing manifest file 'fr.egg-info\SOURCES.txt'
running build_ext
Creating c:\users\TheUser\python36\lib\site-packages\fr.egg-link (link to .)
fr 3.0a0 is already the active version in easy-install.pth
c:\users\TheUser\python36\lib\site-packages\setuptools\dist.py:397: UserWarning: Normalizing '3.00a0' to '3.0a0'
normalized_version,
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "E:\repos\fr\setup.py", line 62, in <module>
'Topic :: Utilities',
File "c:\users\TheUser\python36\lib\site-packages\setuptools\__init__.py", line 129, in setup
return distutils.core.setup(**attrs)
File "c:\users\TheUser\python36\lib\distutils\core.py", line 148, in setup
dist.run_commands()
File "c:\users\TheUser\python36\lib\distutils\dist.py", line 955, in run_commands
self.run_command(cmd)
File "c:\users\TheUser\python36\lib\distutils\dist.py", line 974, in run_command
cmd_obj.run()
File "c:\users\TheUser\python36\lib\site-packages\setuptools\command\develop.py", line 36, in run
self.install_for_development()
File "c:\users\TheUser\python36\lib\site-packages\setuptools\command\develop.py", line 152, in install_for_development
self.process_distribution(None, self.dist, not self.no_deps)
File "c:\users\TheUser\python36\lib\site-packages\setuptools\command\easy_install.py", line 726, in process_distribution
self.install_egg_scripts(dist)
File "c:\users\TheUser\python36\lib\site-packages\setuptools\command\develop.py", line 187, in install_egg_scripts
script_text = strm.read()
File "c:\users\TheUser\python36\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 1303: character maps to <undefined>
----------------------------------------
Rolling back uninstall of fr
Command "c:\users\TheUser\python36\python.exe -c "import setuptools, tokenize;__file__='E:\\repos\\fr\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" develop --no-deps" failed with error code 1 in E:\repos\fr\
That error seems to be coming from setuptools. Try python setup.py develop; if that errors out, it's a setuptools issue. If it doesn't, could you file a new bug report for it?
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.