authors = [
"Sébastien Eustace <[email protected]>"
]
$ poetry develop -vvv
[AttributeError]
'NoneType' object has no attribute 'group'
authors = [
"Sebastien Eustace <[email protected]>"
]
Installing dependencies from lock file
Nothing to install or update
Installing poetry (0.11.0-alpha.3)
As far as I know, the re module does not support Unicode character classes, but the regex package handles them properly.
I don't know whether this has been brought up before or whether it is a Windows-only problem, given that it happened while running poetry develop on Poetry itself. As far as I could check, nobody has opened an issue about this before.
Windows 10, python 3.6.4, poetry 0.11.0a3.
edit: #66 is similar.
In the meantime, catching errors:
 def _get_author(self):  # type: () -> dict
+    if self._authors:
+        m = AUTHOR_REGEX.match(self._authors[0])
+    else:
+        m = None
-    if not self._authors:
+    if not m:
+        # log.info('Could not find an author') or whatever
         return {"name": None, "email": None}
-    m = AUTHOR_REGEX.match(self._authors[0])
     name = m.group("name")
     email = m.group("email")
     return {"name": name, "email": email}
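For reference, the same guard written as a standalone function looks roughly like this. It is only a sketch: get_author is a hypothetical free function rather than Poetry's method, the pattern is the one quoted later in this thread, and the example address is made up.

```python
import re

# Pattern as quoted later in this thread; Poetry's actual definition may differ.
AUTHOR_REGEX = re.compile(r"(?u)^(?P<name>[- .,\w\d'’\"()]+) <(?P<email>.+?)>$")


def get_author(authors):
    """Return the first author's name and email, or None values if it cannot be parsed."""
    # Match only when there is an author; a failed match also yields None.
    m = AUTHOR_REGEX.match(authors[0]) if authors else None
    if m is None:
        return {"name": None, "email": None}
    return {"name": m.group("name"), "email": m.group("email")}


parsed = get_author(["John Smith <john@example.com>"])
unparsed = get_author(["no email here"])
```

This avoids the AttributeError, but it also silently drops an author that fails to match, so it hides the underlying problem rather than fixing it.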
Somehow the é is being encoded as ISO Latin-1, which causes a Unicode decode error.
Complete output from command python setup.py egg_info:
b"': 'S\xe9bast" # added print(data[24360:24370])
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "c:\users\droom\appdata\local\programs\python\python36\lib\codecs.py", line 331, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 24365: invalid continuation byte
I remember having this problem, but I couldn't reproduce it now. On what version are you?
[email protected], on Windows 10, ran poetry develop -vvv
I only have this problem when the author name is retrieved from the Git config. If I set it manually by editing the pyproject.toml file directly, or explicitly in the init command, I don't have this issue.
I've done poetry develop on Poetry itself on this Windows box before without any issues, but I'm only getting this problem now. I'm not sure why I didn't run into this before.
Since the open() call that creates setup.py doesn't explicitly specify an encoding, it falls back to CP-1252 encoding on my Windows system.
This conflicts with the # -*- coding: utf-8 -*- encoding declaration in the format string used to create setup.py:
The simplest solution here is to specify encoding="utf-8" in the open() call.
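A minimal sketch of that fix, assuming a simplified template and a hypothetical write_setup helper (neither is Poetry's actual builder code):

```python
import os
import tempfile

# Simplified stand-in for the format string that generates setup.py.
SETUP_TEMPLATE = '# -*- coding: utf-8 -*-\nauthor = "{author}"\n'


def write_setup(path, author):
    # Without encoding=..., open() uses the locale's preferred encoding
    # (CP-1252 on many Windows systems), which contradicts the coding line.
    with open(path, "w", encoding="utf-8") as f:
        f.write(SETUP_TEMPLATE.format(author=author))


path = os.path.join(tempfile.mkdtemp(), "setup.py")
write_setup(path, "S\u00e9bastien Eustace")

# Read the bytes back to confirm the é was written as UTF-8 (0xC3 0xA9).
with open(path, "rb") as f:
    raw = f.read()
```

With the explicit encoding, the bytes on disk agree with the coding declaration regardless of the system locale.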
For demonstration, running poetry develop on Poetry itself works in #368. If this looks okay, I'll write a test. There might be other places in the project that need to have explicit encodings, though -- I'm willing to take a look at that.
While looking into this issue, I also found that the same thing happens to the readme parsing:
https://github.com/sdispater/poetry/blob/a1b97707e8b193c3b3a7ee47394c155f9e1eb0c0/poetry/masonry/metadata.py#L48-L50
Poetry's README.md is UTF-8 on my machine but gets decoded as CP-1252 in poetry develop, turning all the é's into Ã©, which in turn gets written out to the long_description field in setup.py. I'm not sure how this is supposed to be handled. Should it always assume that the readme file is UTF-8?
I think utf8 is a reasonable assumption.
To be safe it could always retry after guessing the encoding with something like chardet but I don't think that would be necessary, or at the very least just try CP-1252/ISO-8859 and then fail.
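The "try UTF-8, then CP-1252, then fail" idea could look something like the sketch below. The function name and the encoding list are assumptions made for illustration, not anything Poetry ships.

```python
def decode_with_fallback(raw, encodings=("utf-8", "cp1252")):
    """Decode bytes with the first encoding that accepts them, else raise."""
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError("could not decode with any of: " + ", ".join(encodings))


from_utf8 = decode_with_fallback(b"S\xc3\xa9bastien")  # valid UTF-8
from_cp1252 = decode_with_fallback(b"S\xe9bastien")    # invalid UTF-8, valid CP-1252
```

The order matters: CP-1252 accepts almost any byte sequence, so trying it first would turn valid UTF-8 into mojibake instead of failing over.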
This issue is caused by setup.py being written in the system's default encoding, which fails on systems whose default is not UTF-8. PR #1087 should fix it by being explicit about the encoding when creating source-code-like files.
@xsduan Can you check whether the latest Poetry, 0.12.17, fixes the issue?
poetry master = $ poetry --version
Poetry 0.12.17
# installation...
poetry master = $ cd d:\git\poetry
poetry master = $ poetry install
#...
poetry master = $ poetry run pip show poetry
Name: poetry
Version: 0.12.11
Summary: Python dependency management and packaging made easy.
Home-page: https://poetry.eustace.io/
Author: Sébastien Eustace
Author-email: [email protected]
License: UNKNOWN
Location: d:\git\poetry
Requires: cachecontrol, cachy, cleo, html5lib, jsonschema, pkginfo, pyparsing, pyrsistent, requests-toolbelt, requests, shellingham, tomlkit
Required-by:
looks like it
@xsduan it looks like ... your edit left your comment incomplete.
I can reproduce this error in poetry version 1.0.0 (Ubuntu 18.04)
>> poetry init --author "Alex Müller"
...
Package name [test]:
Version [0.1.0]:
Description []:
[UnicodeDecodeError]
'ascii' codec can't decode byte 0xc3 in position 6: ordinal not in range(128)
It seems that "poetry init" can't handle a non-ASCII character in the author default shown in the dialogue. However, it has no problem if the non-ASCII character is entered at the prompt.
The following works fine
>> poetry init --author Alex
...
Package name [test]:
Version [0.1.0]:
Description []:
Author [Alex, n to skip]: Alex Müller
License []:
Thanks @laxas ,
with your example I was able to reproduce it with python 2.7. With python3 it works.
The problem seems to be well known, e.g. : https://stackoverflow.com/questions/21129020/how-to-fix-unicodedecodeerror-ascii-codec-cant-decode-byte
A simple fix would be changing this:
into
name = self.option("name")
if isinstance(name, str):
name = name.decode().encode("UTF-8")
But I guess, this is such a general problem and should be fixed in another place.
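To make the failure mode concrete, here is a small sketch of decoding byte strings once, at the boundary. It is written in Python 3 syntax and to_text is a hypothetical helper, not part of Poetry or cleo. The implicit ASCII decode fails at position 6, the same position reported in the error above for "Alex Müller".

```python
def to_text(value, encoding="utf-8"):
    """Return value as text, decoding it first if it is a byte string."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


raw_author = b"Alex M\xc3\xbcller"  # "Alex Müller" encoded as UTF-8

# This mirrors the implicit decode that Python 2 performs with the ascii codec.
try:
    raw_author.decode("ascii")
    ascii_error_position = None
except UnicodeDecodeError as exc:
    ascii_error_position = exc.start

author = to_text(raw_author)
```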
fin swimmer
I have the same problem as @laxas. I can't init a new project because of a non-ASCII character in my Git user name.
The same here! My lastname has an ó
➜ rgh git:(develop) poetry init -vvv
This command will guide you through creating your pyproject.toml config.
Package name [rgh]:
Version [0.1.0]:
Description []:
'ascii' codec can't encode character u'\xf3' in position 29: ordinal not in range(128)
'ascii' codec can't encode character u'\xf3' in position 29: ordinal not in range(128)
'ascii' codec can't encode character u'\xf3' in position 29: ordinal not in range(128)
...
Passing the author to the cli with non-ascii characters also triggers the error:
❯ poetry init --author="an accent í is non-ascii"
This command will guide you through creating your pyproject.toml config.
Package name [non-ascii-test]:
Version [0.1.0]:
Description []:
[UnicodeDecodeError]
'ascii' codec can't decode byte 0xc3 in position 10: ordinal not in range(128)
But if a non-ascii character is used when prompted to confirm the author, it does not fail:
❯ poetry init --author="ascii"
This command will guide you through creating your pyproject.toml config.
Package name [non-ascii-test]:
Version [0.1.0]:
Description []:
Author [ascii, n to skip]: non-ascii í
License []:
...
I get the same error from poetry build with non-ASCII package names, for example:
[tool.poetry]
name = "lassi"
version = "0.1.0"
description = ""
authors = ["ਜ਼ੂਲੀਏਂ ਮਲਾਰ (Julien Malard) <[email protected]>"]
packages = [
{ include = "ਲੱਸੀ" }
]
This is on MacOS.
Edit: Unicode author name crashes as well.
Poetry 1.0.10 now displays a slightly more clear error message:
$ poetry build
Building lassi (0.1.0)
[ValueError]
Invalid author string. Must be in the format: John Smith <[email protected]>
using this minimal pyproject.toml:
[tool.poetry]
name = "lassi"
version = "0.1.0"
description = ""
authors = ["ਜ਼ੂਜ਼ੂਜ਼ੂ ਜ਼ੂਜ਼ੂਜ਼ੂ <[email protected]>"]
@jacebrowning Thank you! If you could point me to the place where AUTHOR_REGEX is defined (I can't find it!), I would be happy to contribute a pull request to help fix this. Perhaps a good approach would be to validate only what is between the <> delimiters and allow the name to be anything? (Some languages use apostrophes, colons, combining marks and other characters that re is likely to miss.)
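A rough sketch of that approach, assuming a hypothetical LENIENT_AUTHOR_REGEX and a made-up address (this is not the pattern Poetry uses):

```python
import re

# The name may contain anything except angle brackets; only the part
# between < and > is checked, and only loosely (something@something).
LENIENT_AUTHOR_REGEX = re.compile(
    r"^(?P<name>[^<>]+) <(?P<email>[^<>\s]+@[^<>\s]+)>$"
)

name = "ਜ਼ੂਲੀਏਂ ਮਲਾਰ (Julien Malard)"
match = LENIENT_AUTHOR_REGEX.match(name + " <julien@example.com>")
no_email = LENIENT_AUTHOR_REGEX.match(name)
```

Because the name is matched by exclusion rather than by \w, combining marks and punctuation pass through without needing a different regex engine.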
@julienmalard It looks like AUTHOR_REGEX is now part of Poetry Core: https://github.com/python-poetry/poetry-core/search?q=AUTHOR_REGEX&unscoped_q=AUTHOR_REGEX
And imported here: https://github.com/python-poetry/poetry/blob/d2fd581c9a856a5c4e60a25acb95d06d2a963cf2/poetry/console/commands/init.py#L491-L498
@jacebrowning Thank you! I had not noticed that poetry.core was not part of this repository.
I have just encountered this problem when I tried installing and running Poetry from my raw Docker container with Ubuntu Bionic.
The quick fix for me was to do:
LC_ALL=C.UTF-8 poetry
A more permanent solution for my Docker container I have found here:
sudo apt-get -y install language-pack-en
The following extra packages will be installed:
language-pack-en-base
Generating locales...
en_GB.UTF-8... /usr/sbin/locale-gen: done
Generation complete.
@stanislaw what version of poetry do you have?
What you describe is a workaround.
Poetry should not depend on the current locale settings; it should work with UTF-8 explicitly. If that is still not the case, the fix (in the Poetry code) is to specify the utf-8 encoding explicitly in all file open operations.
@vlcinsky sure I understand. I just needed to get something done really quickly.
root@95eea793181d:/app# poetry --version
Poetry version 1.1.4
The full output:
root@95eea793181d:/app# poetry
Poetry version 1.1.4
USAGE
UnicodeEncodeError
'ascii' codec can't encode character '\xa0' in position 30: ordinal not in range(128)
at ~/.poetry/lib/poetry/_vendor/py3.6/clikit/io/output_stream/stream_output_stream.py:24 in write
Traceback (most recent call last):
File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/console_application.py", line 131, in run
status_code = command.handle(parsed_args, io)
File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/api/command/command.py", line 120, in handle
status_code = self._do_handle(args, io)
File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/api/command/command.py", line 171, in _do_handle
return getattr(handler, handler_method)(args, io, self)
File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/handler/help/help_text_handler.py", line 29, in handle
usage.render(io)
File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/ui/help/abstract_help.py", line 31, in render
layout.render(io, indentation)
File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/ui/layout/block_layout.py", line 42, in render
element.render(io, self._indentations[i] + indentation)
File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/ui/components/labeled_paragraph.py", line 70, in render
+ "\n"
File "/root/.poetry/lib/poetry/_vendor/py3.6/cleo/io/io_mixin.py", line 55, in write
super(IOMixin, self).write(string, flags)
File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/api/io/io.py", line 58, in write
self._output.write(string, flags=flags)
File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/api/io/output.py", line 61, in write
self._stream.write(to_str(formatted))
File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/io/output_stream/stream_output_stream.py", line 24, in write
self._stream.write(string)
UnicodeEncodeError: 'ascii' codec can't encode character '\xa0' in position 30: ordinal not in range(128)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/root/.poetry/bin/poetry", line 19, in <module>
main()
File "/root/.poetry/lib/poetry/console/__init__.py", line 5, in main
return Application().run()
File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/console_application.py", line 142, in run
trace.render(io, simple=isinstance(e, CliKitException))
File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/ui/components/exception_trace.py", line 232, in render
return self._render_exception(io, self._exception)
File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/ui/components/exception_trace.py", line 269, in _render_exception
self._render_snippet(io, current_frame)
File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/ui/components/exception_trace.py", line 289, in _render_snippet
self._render_line(io, code_line)
File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/ui/components/exception_trace.py", line 402, in _render_line
io.write_line("{}{}".format(indent * " ", line))
File "/root/.poetry/lib/poetry/_vendor/py3.6/cleo/io/io_mixin.py", line 65, in write_line
super(IOMixin, self).write_line(string, flags)
File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/api/io/io.py", line 66, in write_line
self._output.write_line(string, flags=flags)
File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/api/io/output.py", line 69, in write_line
self.write(string, flags=flags, new_line=True)
File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/api/io/output.py", line 61, in write
self._stream.write(to_str(formatted))
File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/io/output_stream/stream_output_stream.py", line 24, in write
self._stream.write(string)
UnicodeEncodeError: 'ascii' codec can't encode character '\u2502' in position 27: ordinal not in range(128)
I have installed it like this:
root@95eea793181d:/app# curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python
Retrieving Poetry metadata
# Welcome to Poetry!
This will download and install the latest version of Poetry,
a dependency and package manager for Python.
It will add the `poetry` command to Poetry's bin directory, located at:
$HOME/.poetry/bin
This path will then be added to your `PATH` environment variable by
modifying the profile file located at:
$HOME/.profile
You can uninstall at any time by executing this script with the --uninstall option,
and these changes will be reverted.
Installing version: 1.1.4
- Downloading poetry-1.1.4-linux.tar.gz (57.03MB)
Poetry (1.1.4) is installed now. Great!
To get started you need Poetry's bin directory ($HOME/.poetry/bin) in your `PATH`
environment variable. Next time you log in this will be done
automatically.
To configure your current shell run `source $HOME/.poetry/env`
Thanks @stanislaw for detailed report.
My note was about robustness of poetry. Workarounds are very practical and often life savers.
I have run into setuptools depending on the current system locale, which is definitely wrong (I think my fix is already in). The solution is to search the project for all open calls and make sure that any call opening a stream in text mode explicitly states the encoding "utf-8".
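The same principle applies to output streams. The sketch below reproduces the kind of failure seen in the earlier traceback by writing the box-drawing character U+2502 through a text wrapper with an ASCII encoding and then with UTF-8. write_with_encoding is a hypothetical helper for illustration, not clikit's API.

```python
import io


def write_with_encoding(text, encoding):
    """Write text through a text-mode wrapper and return the resulting bytes."""
    buffer = io.BytesIO()
    stream = io.TextIOWrapper(buffer, encoding=encoding)
    stream.write(text)
    stream.flush()
    return buffer.getvalue()


box_char = "\u2502"  # the character named in the UnicodeEncodeError

utf8_bytes = write_with_encoding(box_char, "utf-8")

try:
    write_with_encoding(box_char, "ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
```

A stream whose encoding comes from a C/POSIX locale behaves like the ascii case, which is why setting LC_ALL=C.UTF-8 works around the problem.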
@jacebrowning Thanks for the pointer a few months back regarding AUTHOR_REGEX. After a bit of experimentation, I think that this has to do not with Poetry per se but rather with a bug in the re module (see https://github.com/lark-parser/lark/issues/590).
Replacing re with regex solves everything:
import re
AUTHOR_REGEX = re.compile(r"(?u)^(?P<name>[- .,\w\d'’\"()]+) <(?P<email>.+?)>$")
AUTHOR_REGEX.match("ம. ஆ. ஜூலீஎன் <[email protected]>")
>>> None
# But...
import regex as re
AUTHOR_REGEX = re.compile(r"(?u)^(?P<name>[- .,\w\d'’\"()]+) <(?P<email>.+?)>$")
AUTHOR_REGEX.match("ம. ஆ. ஜூலீஎன் <[email protected]>")
>>> <regex.Match object; span=(0, 44), match='ம. ஆ. ஜூலீஎன் <[email protected]>'>
So my question would now be - should I submit a pull request with import regex as re to Poetry? Or would adding a dependency risk breaking things?
Thanks!
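For anyone wondering why the first match fails, the sketch below looks at one Tamil syllable using only the standard library. The base letter counts as a word character for re, but the vowel sign that follows it is a combining mark, which re's \w does not match. The explanation is my reading of the behaviour, not something confirmed by the Poetry maintainers.

```python
import re
import unicodedata

# Tamil letter JA followed by vowel sign UU, rendered together as "ஜூ".
syllable = "\u0b9c\u0bc2"

# General category of each code point: a letter, then a mark.
categories = [unicodedata.category(ch) for ch in syllable]

# Whether the standard library's \w accepts each code point on its own.
word_matches = [re.fullmatch(r"\w", ch) is not None for ch in syllable]
```

Since the name group in AUTHOR_REGEX relies on \w, any name containing such marks fails under re unless the marks are added to the character class some other way.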