There's a proposal to deprecate calling open(...) without an explicit encoding:
https://www.python.org/dev/peps/pep-0597/#motivation
Developers using macOS or Linux may forget that the default encoding is not always UTF-8.
For example, long_description = open("README.md").read() in setup.py is a common mistake. Many Windows users can not install the package if there is at least one non-ASCII character (e.g. emoji) in the README.md file which is encoded in UTF-8.
For example, 489 packages of the 4000 most downloaded packages from PyPI used non-ASCII characters in README. And 82 packages of them can not be installed from source package when locale encoding is ASCII. [1] They used the default encoding to read README or TOML file.
Another example is logging.basicConfig(filename="log.txt"). Some users expect UTF-8 is used by default, but locale encoding is used actually. [2]
Even Python experts assume that default encoding is UTF-8. It creates bugs that happen only on Windows. See [3] and [4].
Raising a warning when the encoding option is omitted will help to find such mistakes.
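To make the failure mode concrete, here is a small sketch. It simulates an ASCII locale by passing encoding="ascii" explicitly, since the real default depends on the machine. The helper name read_readme and the temporary README.md are made up for illustration and are not from the PEP.

```python
import os
import tempfile


def read_readme(path, encoding):
    """Read a text file with an explicit encoding (hypothetical helper)."""
    with open(path, encoding=encoding) as f:
        return f.read()


def demo():
    """Return (ascii_failed, utf8_text) for a UTF-8 README with one non-ASCII character."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "README.md")
        # Write the file as UTF-8, the way most projects store their README.
        with open(path, "w", encoding="utf-8") as f:
            f.write("caf\u00e9\n")

        # Stand-in for a machine whose locale encoding is ASCII.
        try:
            read_readme(path, "ascii")
            ascii_failed = False
        except UnicodeDecodeError:
            ascii_failed = True

        # Passing the encoding explicitly behaves the same on every platform.
        utf8_text = read_readme(path, "utf-8")
    return ascii_failed, utf8_text


if __name__ == "__main__":
    print(demo())
```

The same read fails or succeeds depending only on the encoding argument, which is why leaving it to the locale is fragile.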
Pylint should raise a warning, similar to subprocess-run-check, when the encoding argument is omitted.
import io

# bad
with open(filename) as f:
    ...
# bad
with open(filename, encoding=None) as f:
    ...
# good
with open(filename, encoding="utf8", errors="surrogateescape") as f:
    ...
# good
locale_encoding = getattr(io, "LOCALE_ENCODING", None)
with open(filename, encoding=locale_encoding) as f:
    ...
@graingert thanks!
It makes sense.
@hippo91 can you allocate a pylint feature id for me?
@graingert Sorry for the delay. What do you mean by feature id?
I think I mean a message id? http://pylint.pycqa.org/en/latest/technical_reference/features.html
@graingert I think a new message id is necessary for this case. Something along the lines of missing_open_encoding.
Maybe @Pierre-Sassoulas @AWhetter or @PCManticore will have a better idea?
subprocess-run-check should indeed be a good starting point.
Note that it's not just open; other calls such as os.fdopen are affected too.
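For illustration only, here is a rough sketch of what such a check could look like using the stdlib ast module. This is not pylint's implementation (pylint checkers are built on astroid), and the names CHECKED_CALLS and find_unspecified_encoding are made up. It only handles literal arguments and simple dotted names, skips binary modes, and ignores **kwargs.

```python
import ast
from typing import List

# Assumed set of call names to check; the thread only names open and os.fdopen.
CHECKED_CALLS = {"open", "os.fdopen"}


def call_name(node: ast.Call) -> str:
    """Return 'open' or a dotted name like 'os.fdopen' for simple calls, else ''."""
    func = node.func
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute) and isinstance(func.value, ast.Name):
        return f"{func.value.id}.{func.attr}"
    return ""


def is_binary_mode(node: ast.Call) -> bool:
    """True when the mode is a string literal containing 'b'."""
    mode = node.args[1] if len(node.args) >= 2 else None
    for kw in node.keywords:
        if kw.arg == "mode":
            mode = kw.value
    return (
        isinstance(mode, ast.Constant)
        and isinstance(mode.value, str)
        and "b" in mode.value
    )


def has_explicit_encoding(node: ast.Call) -> bool:
    """True when encoding is passed and is not the literal None."""
    if len(node.args) >= 4:
        # open(file, mode, buffering, encoding): encoding is the 4th positional.
        encoding = node.args[3]
    else:
        encoding = next(
            (kw.value for kw in node.keywords if kw.arg == "encoding"), None
        )
        if encoding is None:
            return False
    return not (isinstance(encoding, ast.Constant) and encoding.value is None)


def find_unspecified_encoding(source: str) -> List[int]:
    """Return sorted line numbers of text-mode calls without an explicit encoding."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and call_name(node) in CHECKED_CALLS
            and not is_binary_mode(node)
            and not has_explicit_encoding(node)
        ):
            hits.append(node.lineno)
    return sorted(hits)
```

Calling find_unspecified_encoding on a module's source text returns the lines a checker like this would flag. A real checker would also need to resolve aliases and imports (for example `from os import fdopen`), which is where astroid's inference would help.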
I really like this one. I've had this problem multiple times with Windows/Mac users, so it would be really helpful. I'd also use an error code for it. Regarding the message id, what about unspecified-encoding?
@Pierre-Sassoulas let's go for unspecified-encoding.