In jaraco/configparser#34, I learned that although setuptools v40.7.0 presumably added support for non-ASCII content, there are still environments where loading non-ASCII content fails.
configparser # easy_install --version
setuptools 40.8.0 from c:\python37\lib\site-packages (Python 3.7)
configparser 3.7.2 # python setup.py egg_info
Traceback (most recent call last):
File "setup.py", line 5, in <module>
package_dir={'': 'src'},
File "C:\Python37\lib\site-packages\setuptools\__init__.py", line 144, in setup
_install_setup_requires(attrs)
File "C:\Python37\lib\site-packages\setuptools\__init__.py", line 137, in _install_setup_requires
dist.parse_config_files(ignore_option_errors=True)
File "C:\Python37\lib\site-packages\setuptools\dist.py", line 702, in parse_config_files
self._parse_config_files(filenames=filenames)
File "C:\Python37\lib\site-packages\setuptools\dist.py", line 599, in _parse_config_files
(parser.read_file if six.PY3 else parser.readfp)(reader)
File "C:\Python37\lib\configparser.py", line 717, in read_file
self._read(f, source)
File "C:\Python37\lib\configparser.py", line 1014, in _read
for lineno, line in enumerate(fp, start=1):
File "C:\Python37\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 103: character maps to <undefined>
Hmm. On further investigation, I discovered that encoding detection is employed, meaning that adding # coding: utf-8 to the file corrected the issue.
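The failure in the traceback can be reproduced without setuptools. This is a minimal sketch, assuming the file was saved as UTF-8 and contained a character such as "Á" (the report does not show the actual character, so that choice is an assumption):

```python
# "Á" (U+00C1) is an assumed example character. Its UTF-8 encoding is
# b"\xc3\x81", and byte 0x81 has no mapping in Python's cp1252 codec.
data = "author = Á".encode("utf-8")

error = None
try:
    data.decode("cp1252")  # what a cp1252 locale default would attempt
except UnicodeDecodeError as exc:
    error = exc
    print(exc)

# Decoding with the encoding the file was actually written in succeeds.
text = data.decode("utf-8")
print(text)
```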
In this comment, the user suggests the default encoding should be UTF-8. I agree, and in fact I'd like to work toward UTF-8 being the only supported encoding (dropping support for encoding detection).
I see that the encoding detection was added in #1180. @benoit-pierre, do you recall why you chose to have the encoding declared rather than demanding and relying on UTF-8?
Backward compatibility: to not break some existing workflows (using an encoding other than UTF-8 with a corresponding locale).
My suggestion was to at least reduce the prevalence of this problem by detecting intentionally non-user locales like "POSIX" or "C" and upgrading to UTF-8 for those.
Requiring people to add a coding declaration to non-coding files in every PyPI package containing a non-ascii property, such as author name, is horribly unnecessary churn.
If a user has a specific locale set, that can be a problem solved another day. But again, if the setup.cfg can't be decoded with the user's locale, the fallback should be to attempt to decode it with the best guess of the author's encoding, UTF-8. That applies especially to any package downloaded from PyPI, which strongly implies the user's locale is irrelevant: the file was written not by the user but by an author on the other side of the world.
Also related: I believe pytest used to read it as ASCII but now defaults to UTF-8. This causes breakages in pytest 3.3.2, at least, but not in current pytest.
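The two locale-based suggestions above (treating "C"/"POSIX" locales as UTF-8, and falling back to UTF-8 when the locale encoding cannot decode the file) could look roughly like this. `decode_config` is a hypothetical helper written for illustration, not part of setuptools:

```python
import codecs
import locale


def decode_config(data, preferred=None):
    """Decode raw setup.cfg bytes (hypothetical helper, not setuptools API)."""
    if preferred is None:
        preferred = locale.getpreferredencoding(False)
    # Normalise aliases such as "ANSI_X3.4-1968" or "US-ASCII" to "ascii".
    name = codecs.lookup(preferred).name
    # An ASCII locale ("C"/"POSIX") is upgraded to UTF-8 outright.
    if name == "ascii":
        name = "utf-8"
    try:
        return data.decode(name)
    except UnicodeDecodeError:
        # Best guess at the author's encoding when the locale one fails.
        return data.decode("utf-8")
```

One caveat with this approach: the fallback only triggers when the locale decode raises. Many UTF-8 byte sequences are also valid in single-byte encodings such as cp1252, in which case the text is silently decoded to the wrong characters.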
I am strongly inclined to _assume_ UTF-8 (which also supports ASCII). I'm also inclined to remove support for the coding declaration unless there's a strong use-case for it.
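For illustration, assuming UTF-8 on read amounts to passing the encoding explicitly rather than relying on the locale default. This sketch uses only the Python 3 standard library (`ConfigParser.read` accepts an `encoding` argument there; Python 2 would need a different approach), and the file contents are example data:

```python
import configparser
import os
import tempfile

# Write a small UTF-8 setup.cfg into a temporary directory (example data).
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "setup.cfg")
with open(path, "w", encoding="utf-8") as f:
    f.write("[metadata]\nauthor = Á\n")

parser = configparser.ConfigParser()
# The explicit encoding makes the result independent of LANG / LC_CTYPE.
parser.read(path, encoding="utf-8")
author = parser["metadata"]["author"]
print(author)
```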
@jayvdb Would you be willing to put together a PR?
Hi @jaraco, I have a related issue.
setup.cfg contains Unicode symbols and sets an explicit UTF-8 encoding:
# -*- coding: utf-8 -*-
[metadata]
...
When I run tox to test itself under Python 2:
Processing ./.tox/.tmp/package/2/tox-3.8.0.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/usr/src/tmp/pip-req-build-jUmuTe/setup.py", line 18, in <module>
package_dir={"": "src"},
File "/usr/src/RPM/BUILD/python-module-tox-3.8.0/.tox/py27/lib/python2.7/site-packages/setuptools/__init__.py", line 144, in setup
_install_setup_requires(attrs)
File "/usr/src/RPM/BUILD/python-module-tox-3.8.0/.tox/py27/lib/python2.7/site-packages/setuptools/__init__.py", line 137, in _install_setup_requires
dist.parse_config_files(ignore_option_errors=True)
File "/usr/src/RPM/BUILD/python-module-tox-3.8.0/.tox/py27/lib/python2.7/site-packages/setuptools/dist.py", line 702, in parse_config_files
self._parse_config_files(filenames=filenames)
File "/usr/src/RPM/BUILD/python-module-tox-3.8.0/.tox/py27/lib/python2.7/site-packages/setuptools/dist.py", line 599, in _parse_config_files
(parser.read_file if six.PY3 else parser.readfp)(reader)
File "/usr/lib64/python2.7/ConfigParser.py", line 324, in readfp
self._read(fp, filename)
File "/usr/lib64/python2.7/ConfigParser.py", line 479, in _read
line = fp.readline()
File "/usr/src/RPM/BUILD/python-module-tox-3.8.0/.tox/py27/lib64/python2.7/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 345: ordinal not in range(128)
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /usr/src/tmp/pip-req-build-jUmuTe/
This is because "edit_config" doesn't pass down the original encoding:
(Pdb) bt
/usr/src/RPM/BUILD/python-module-tox-3.8.0/setup.py(18)<module>()
-> package_dir={"": "src"},
/usr/lib/python2.7/site-packages/setuptools/__init__.py(145)setup()
-> return distutils.core.setup(**attrs)
/usr/lib64/python2.7/distutils/core.py(151)setup()
-> dist.run_commands()
/usr/lib64/python2.7/distutils/dist.py(953)run_commands()
-> self.run_command(cmd)
/usr/lib64/python2.7/distutils/dist.py(972)run_command()
-> cmd_obj.run()
/usr/lib/python2.7/site-packages/setuptools/command/sdist.py(54)run()
-> self.make_distribution()
/usr/lib/python2.7/site-packages/setuptools/command/sdist.py(78)make_distribution()
-> orig.sdist.make_distribution(self)
/usr/lib64/python2.7/distutils/command/sdist.py(456)make_distribution()
-> self.make_release_tree(base_dir, self.filelist.files)
/usr/lib/python2.7/site-packages/setuptools/command/sdist.py(168)make_release_tree()
-> self.get_finalized_command('egg_info').save_version_info(dest)
/usr/lib/python2.7/site-packages/setuptools/command/egg_info.py(191)save_version_info()
-> edit_config(filename, dict(egg_info=egg_info))
> /usr/lib/python2.7/site-packages/setuptools/command/setopt.py(74)edit_config()
-> opts.write(f)
(Pdb)
The output is something like:
[metadata]
name = tox
locale
LANG=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=
Given that setopt and its edit_config function need to write to the config file, I'm even more strongly inclined now to remove support for specifying an encoding in setup.cfg files and instead insist on UTF-8, especially since commands like bdist_rpm invoke egg_info which in turn rewrites the config file.
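A rough sketch of what insisting on UTF-8 for both the read and the write could look like. `edit_config_utf8` is a hypothetical stand-in written for illustration, not the setuptools implementation, and it assumes Python 3's `configparser`:

```python
import configparser
import io


def edit_config_utf8(filename, settings):
    """Update a config file, always reading and writing it as UTF-8.

    Hypothetical stand-in for edit_config; `settings` maps section names
    to dictionaries of option/value pairs.
    """
    opts = configparser.RawConfigParser()
    opts.read(filename, encoding="utf-8")
    for section, options in settings.items():
        if not opts.has_section(section):
            opts.add_section(section)
        for option, value in options.items():
            opts.set(section, option, value)
    # Write with the same fixed encoding, regardless of the locale.
    with io.open(filename, "w", encoding="utf-8") as f:
        opts.write(f)
```

Because `configparser` discards comments when it writes a file back, a coding declaration would not survive such a rewrite in any case, which is another argument for a fixed encoding.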