poetry develop failing on non-ASCII characters

Created on 15 Jun 2018 · 25 Comments · Source: python-poetry/poetry

authors = [
    "Sébastien Eustace <[email protected]>"
]

$ poetry develop -vvv

[AttributeError]
'NoneType' object has no attribute 'group'

authors = [
    "Sebastien Eustace <[email protected]>"
]

Installing dependencies from lock file

Nothing to install or update

Installing poetry (0.11.0-alpha.3)

As far as I know, the re library doesn't support Unicode character classes, but the third-party regex module handles them properly.

I don't know whether this has been brought up before or whether it is a Windows-only thing, considering it happened while running poetry develop on Poetry itself. As far as I could check, nobody has opened an issue about this before.

Windows 10, python 3.6.4, poetry 0.11.0a3.

edit: #66 is similar.

In the meantime, catching errors:

    def _get_author(self):  # type: () -> dict
+       if self._authors:
+           m = AUTHOR_REGEX.match(self._authors[0])
+       else:
+           m = None
+
-       if not self._authors:
+       if not m:
+           # log.info('Could not find an author') or whatever
            return {"name": None, "email": None}

-       m = AUTHOR_REGEX.match(self._authors[0])
-
        name = m.group("name")
        email = m.group("email")

        return {"name": name, "email": email}

Bug Setup

xsduan

👍4

Most helpful comment

@jacebrowning Thanks for the pointer a few months back regarding AUTHOR_REGEX. After a bit of experimentation, I think that this has to do not with Poetry per se but rather with a bug in the re module (see https://github.com/lark-parser/lark/issues/590).

Replacing re with regex solves everything:

import re
AUTHOR_REGEX = re.compile(r"(?u)^(?P<name>[- .,\w\d'’\"()]+) <(?P<email>.+?)>$")
AUTHOR_REGEX.match("ம. ஆ. ஜூலீஎன் <[email protected]>")
# returns None
# But...
import regex as re
AUTHOR_REGEX = re.compile(r"(?u)^(?P<name>[- .,\w\d'’\"()]+) <(?P<email>.+?)>$")
AUTHOR_REGEX.match("ம. ஆ. ஜூலீஎன் <[email protected]>")
# returns <regex.Match object; span=(0, 44), match='ம. ஆ. ஜூலீஎன் <[email protected]>'>
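
One likely reason for the difference, offered as an assumption rather than a confirmed diagnosis: the standard library's \w only accepts characters that count as alphanumeric, and combining vowel signs do not. A minimal stdlib-only sketch (the sample syllable is chosen for illustration):

```python
import re
import unicodedata

syllable = "\u0b9c\u0bc2"  # TAMIL LETTER JA + TAMIL VOWEL SIGN UU

for ch in syllable:
    # Report each code point's name, general category, and whether re's \w accepts it.
    print(unicodedata.name(ch), unicodedata.category(ch), bool(re.match(r"\w", ch)))

# The letter matches \w but the vowel sign does not, so \w+ cannot span the syllable.
whole = re.fullmatch(r"\w+", syllable)
print(whole)
```

If that is what happens here, any character class built on \w will reject names in scripts that rely on combining marks.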

So my question would now be - should I submit a pull request with import regex as re to Poetry? Or would adding a dependency risk breaking things?
Thanks!

julienmalard on 8 Nov 2020

👍2

All 25 comments

Somehow the é is being encoded as ISO Latin-1, which causes a Unicode decode error.

    Complete output from command python setup.py egg_info:
    b"': 'S\xe9bast" # added print(data[24360:24370])
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "c:\users\droom\appdata\local\programs\python\python36\lib\codecs.py", line 331, in decode
        (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 24365: invalid continuation byte
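
The same error can be reproduced in isolation. A minimal sketch, assuming the 0xe9 byte came from encoding é as Latin-1 (CP-1252 uses the same byte for é); the variable names are illustrative only:

```python
# "é" is a single byte in Latin-1/CP-1252 but two bytes in UTF-8.
latin1_bytes = "Sébast".encode("latin-1")
utf8_bytes = "Sébast".encode("utf-8")
print(latin1_bytes)  # b'S\xe9bast'
print(utf8_bytes)    # b'S\xc3\xa9bast'

# Decoding the Latin-1 bytes as UTF-8 fails: 0xe9 announces a multi-byte
# sequence, but the next byte ("b") is not a continuation byte.
try:
    latin1_bytes.decode("utf-8")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

print(decode_error)
```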

xsduan on 15 Jul 2018

I remember having this problem, but I couldn't reproduce it now. On what version are you?

cauebs on 15 Jul 2018

[email protected], on Windows 10, ran poetry develop -vvv

xsduan on 16 Jul 2018

I only have this problem when the author name is retrieved from the Git config. If I set it manually, either by editing the pyproject.toml file directly or explicitly in the init command, I don't have this issue.

sdispater on 27 Jul 2018

I've done poetry develop on Poetry itself on this Windows box before without any issues, but I'm only getting this problem now. I'm not sure why I didn't run into this before.

https://github.com/sdispater/poetry/blob/a1b97707e8b193c3b3a7ee47394c155f9e1eb0c0/poetry/console/commands/develop.py#L31-L32

Since the open() call that creates setup.py doesn't explicitly specify an encoding, it falls back to CP-1252 encoding on my Windows system.

This conflicts with the # -*- coding: utf-8 -*- encoding declaration in the format string used to create setup.py:

https://github.com/sdispater/poetry/blob/a1b97707e8b193c3b3a7ee47394c155f9e1eb0c0/poetry/masonry/builders/sdist.py#L25-L43

The simplest solution here is to specify encoding="utf-8" in the open() call.
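
A minimal sketch of that idea, not Poetry's actual code: SETUP_TEMPLATE and write_setup are hypothetical stand-ins for the real template and writer.

```python
import os
import tempfile

# Hypothetical stand-in for the generated setup.py contents.
SETUP_TEMPLATE = '# -*- coding: utf-8 -*-\nauthor = "{author}"\n'


def write_setup(path, author):
    """Write the file as UTF-8 regardless of the platform's default encoding."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(SETUP_TEMPLATE.format(author=author))


target = os.path.join(tempfile.mkdtemp(), "setup.py")
write_setup(target, "Sébastien Eustace")

with open(target, "rb") as f:
    raw = f.read()

# The é is stored as the two UTF-8 bytes 0xc3 0xa9, matching the coding line.
print(b"S\xc3\xa9bastien" in raw)  # True
```

With the encoding stated explicitly, the bytes on disk agree with the coding declaration on every platform.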

For demonstration, running poetry develop on Poetry itself works in #368. If this looks okay, I'll write a test. There might be other places in the project that need to have explicit encodings, though -- I'm willing to take a look at that.

While looking into this issue, I also found that the same thing happens to the readme parsing:
https://github.com/sdispater/poetry/blob/a1b97707e8b193c3b3a7ee47394c155f9e1eb0c0/poetry/masonry/metadata.py#L48-L50

Poetry's README.md is UTF-8 on my machine but gets decoded as CP-1252 in poetry develop, turning all the é's into Ã©, which in turn gets written out to the long_description field in setup.py. I'm not sure how this is supposed to be handled. Should it always assume that the readme file is UTF-8?
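
The é to Ã© corruption is easy to reproduce. A small sketch, assuming the file really is UTF-8 and is read back as CP-1252:

```python
original = "é"

# UTF-8 bytes read back with the wrong codec: each byte becomes its own character.
garbled = original.encode("utf-8").decode("cp1252")
print(garbled)  # Ã©

# As long as nothing else touched the text, the damage is reversible.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == original)  # True
```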

RaptDept on 3 Aug 2018

I think UTF-8 is a reasonable assumption.

To be safe, it could retry after guessing the encoding with something like chardet, or at the very least try CP-1252/ISO-8859 and then fail, but I don't think that would be necessary.
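
A rough sketch of the fallback idea using only the standard library (no chardet). read_text_with_fallback and its default encoding list are hypothetical, not something Poetry provides:

```python
import os
import tempfile


def read_text_with_fallback(path, encodings=("utf-8", "cp1252")):
    """Return (text, encoding) using the first encoding that decodes cleanly."""
    with open(path, "rb") as f:
        data = f.read()
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("could not decode {} with any of {}".format(path, encodings))


# Demo: a README saved as CP-1252 rather than UTF-8.
readme = os.path.join(tempfile.mkdtemp(), "README.md")
with open(readme, "wb") as f:
    f.write("café".encode("cp1252"))

text, used = read_text_with_fallback(readme)
print(text, used)
```

UTF-8 has to be tried first: almost any byte sequence decodes under CP-1252, so the reverse order would silently produce mojibake for UTF-8 files.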

xsduan on 3 Aug 2018

Since this issue is caused by setup.py being written in the default encoding (which fails on systems whose default is not UTF-8), PR #1087 should fix it by being explicit about the encoding when creating source-code-like files.

vlcinsky on 8 May 2019

@xsduan Can you check whether the latest Poetry, 0.12.17, fixes the issue?

vlcinsky on 10 Jul 2019

poetry master = $ poetry --version
Poetry 0.12.17
# installation...
poetry master = $ cd d:\git\poetry
poetry master = $ poetry install
#...
poetry master = $ poetry run pip show poetry
Name: poetry
Version: 0.12.11
Summary: Python dependency management and packaging made easy.
Home-page: https://poetry.eustace.io/
Author: Sébastien Eustace
Author-email: [email protected]
License: UNKNOWN
Location: d:\git\poetry
Requires: cachecontrol, cachy, cleo, html5lib, jsonschema, pkginfo, pyparsing, pyrsistent, requests-toolbelt, requests, shellingham, tomlkit
Required-by:

looks like it

xsduan on 13 Jul 2019

@xsduan it looks like ... your edit left your comment incomplete.

vlcinsky on 13 Jul 2019

I can reproduce this error in poetry version 1.0.0 (Ubuntu 18.04)

>> poetry init --author "Alex Müller"
...
Package name [test]:  
Version [0.1.0]:  
Description []:  
[UnicodeDecodeError]
'ascii' codec can't decode byte 0xc3 in position 6: ordinal not in range(128)

It seems that "poetry init" can't handle a non-ASCII character in the author default shown in the dialogue. However, it has no problem if the non-ASCII character is entered at the prompt.

The following works fine

>> poetry init --author Alex
...
Package name [test]:  
Version [0.1.0]:  
Description []:  
Author [Alex, n to skip]:  Alex Müller
License []:

laxas on 19 Dec 2019

Thanks @laxas,

With your example I was able to reproduce it with Python 2.7. With Python 3 it works.

The problem seems to be well known, e.g.: https://stackoverflow.com/questions/21129020/how-to-fix-unicodedecodeerror-ascii-codec-cant-decode-byte

A simple fix would be changing this:

https://github.com/python-poetry/poetry/blob/affe32d8d41c76b5cb908fca492d36b6cebb0f76/poetry/console/commands/init.py#L81

into

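name = self.option("name")

# Python 2 only: str is a byte string there, so decode it to unicode with an
# explicit codec instead of relying on the implicit ASCII default.
if isinstance(name, str):
    name = name.decode("utf-8")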

But I guess this is a general problem that should be fixed elsewhere.
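
One way such a general fix could look is a single helper applied wherever option values enter the program. This is only a sketch; to_text is a hypothetical name, and assuming UTF-8 input is exactly that, an assumption:

```python
def to_text(value, encoding="utf-8"):
    """Return value as text, decoding byte strings with an explicit codec."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Bytes as they might arrive from a UTF-8 terminal.
raw_author = b"Alex M\xc3\xbcller"
print(to_text(raw_author))     # Alex Müller
print(to_text("Alex Müller"))  # already text, returned unchanged
```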

fin swimmer

finswimmer on 19 Dec 2019

I have the same problem as @laxas. I can't init a new project because of a non-ASCII character in my git user name.

miedzinski on 8 Jan 2020

👍2

I have the same problem as @laxan. Can't init new project because of non-ascii character in my git user name.

The same here! My last name has an ó.

➜  rgh git:(develop) poetry  init -vvv

This command will guide you through creating your pyproject.toml config.

Package name [rgh]:
Version [0.1.0]:
Description []:
'ascii' codec can't encode character u'\xf3' in position 29: ordinal not in range(128)
'ascii' codec can't encode character u'\xf3' in position 29: ordinal not in range(128)
'ascii' codec can't encode character u'\xf3' in position 29: ordinal not in range(128)
...

Abuelodelanada on 30 Mar 2020

😕1 👍1

Passing the author to the cli with non-ascii characters also triggers the error:

❯ poetry init --author="an accent í is non-ascii"

This command will guide you through creating your pyproject.toml config.

Package name [non-ascii-test]:
Version [0.1.0]:
Description []:

[UnicodeDecodeError]
'ascii' codec can't decode byte 0xc3 in position 10: ordinal not in range(128)

But if a non-ascii character is used when prompted to confirm the author, it does not fail:

❯ poetry init --author="ascii"

This command will guide you through creating your pyproject.toml config.

Package name [non-ascii-test]:
Version [0.1.0]:
Description []:
Author [ascii, n to skip]:  non-ascii í
License []:
...

jmfederico on 3 Apr 2020

I get the same error on poetry build with non-ASCII package names, for example:

[tool.poetry]
name = "lassi"
version = "0.1.0"
description = ""
authors = ["ਜ਼ੂਲੀਏਂ ਮਲਾਰ (Julien Malard) <[email protected]>"]
packages = [
    { include = "ਲੱਸੀ" }
]

This is on MacOS.
Edit: Unicode author name crashes as well.

julienmalard on 24 Apr 2020

👍1

Poetry 1.0.10 now displays a slightly clearer error message:

$ poetry build
Building lassi (0.1.0)

[ValueError]
Invalid author string. Must be in the format: John Smith <[email protected]>

using this minimal pyproject.toml:

[tool.poetry]
name = "lassi"
version = "0.1.0"
description = ""
authors = ["ਜ਼ੂਜ਼ੂਜ਼ੂ ਜ਼ੂਜ਼ੂਜ਼ੂ <[email protected]>"]

jacebrowning on 24 Aug 2020

@jacebrowning Thank you! If you could point me to the place where AUTHOR_REGEX is defined (I can't find it!), I would be happy to contribute a pull request to help fix this. Perhaps a good approach would be to validate only what is between the <> brackets and allow the name to be anything? (Some languages use apostrophes, colons, combining marks and other characters that re is likely to miss.)
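
A sketch of what that looser validation might look like. LENIENT_AUTHOR and parse_author are hypothetical and are not Poetry's AUTHOR_REGEX; the addresses are placeholders:

```python
import re

# Hypothetical pattern: the name is unrestricted, only the "<...>" part
# is required to be free of angle brackets and whitespace.
LENIENT_AUTHOR = re.compile(r"^(?P<name>.+?)\s*<(?P<email>[^<>\s]+)>$")


def parse_author(author):
    """Return {"name", "email"}; both are None when the string does not match."""
    m = LENIENT_AUTHOR.match(author)
    if m is None:
        return {"name": None, "email": None}
    return {"name": m.group("name"), "email": m.group("email")}


samples = [
    "Sébastien Eustace <sebastien@example.com>",
    "\u0b9c\u0bc2\u0bb2\u0bc0 <julien@example.com>",  # Tamil letters with vowel signs
    "no email here",
]
for s in samples:
    print(parse_author(s))
```

Because the name group no longer depends on \w, combining marks and punctuation pass through untouched; the trade-off is that malformed names are no longer rejected.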

julienmalard on 24 Aug 2020

@julienmalard It looks like AUTHOR_REGEX is now part of Poetry Core: https://github.com/python-poetry/poetry-core/search?q=AUTHOR_REGEX&unscoped_q=AUTHOR_REGEX

And imported here: https://github.com/python-poetry/poetry/blob/d2fd581c9a856a5c4e60a25acb95d06d2a963cf2/poetry/console/commands/init.py#L491-L498

jacebrowning on 24 Aug 2020

@jacebrowning Thank you! I had not noticed that poetry.core was not part of this repository.

julienmalard on 24 Aug 2020

I have just encountered this problem when I tried installing and running Poetry from my raw Docker container with Ubuntu Bionic.

The quick fix for me was to do:

LC_ALL=C.UTF-8 poetry

A more permanent solution for my Docker container I have found here:

sudo apt-get -y install language-pack-en
The following extra packages will be installed:
  language-pack-en-base
Generating locales...
  en_GB.UTF-8... /usr/sbin/locale-gen: done
Generation complete.

stanislaw on 8 Nov 2020

@stanislaw what version of poetry do you have?

What you describe is a workaround.

Poetry should not depend on the current locale settings; it should work with UTF-8 explicitly. If that is still not the case, the fix (in Poetry's code) is to specify the UTF-8 encoding explicitly in every file open operation.

vlcinsky on 8 Nov 2020

@vlcinsky sure I understand. I just needed to get something done really quickly.

root@95eea793181d:/app# poetry --version
Poetry version 1.1.4

The full output:

root@95eea793181d:/app# poetry
Poetry version 1.1.4

USAGE

  UnicodeEncodeError

  'ascii' codec can't encode character '\xa0' in position 30: ordinal not in range(128)

  at ~/.poetry/lib/poetry/_vendor/py3.6/clikit/io/output_stream/stream_output_stream.py:24 in write
Traceback (most recent call last):
  File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/console_application.py", line 131, in run
    status_code = command.handle(parsed_args, io)
  File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/api/command/command.py", line 120, in handle
    status_code = self._do_handle(args, io)
  File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/api/command/command.py", line 171, in _do_handle
    return getattr(handler, handler_method)(args, io, self)
  File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/handler/help/help_text_handler.py", line 29, in handle
    usage.render(io)
  File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/ui/help/abstract_help.py", line 31, in render
    layout.render(io, indentation)
  File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/ui/layout/block_layout.py", line 42, in render
    element.render(io, self._indentations[i] + indentation)
  File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/ui/components/labeled_paragraph.py", line 70, in render
    + "\n"
  File "/root/.poetry/lib/poetry/_vendor/py3.6/cleo/io/io_mixin.py", line 55, in write
    super(IOMixin, self).write(string, flags)
  File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/api/io/io.py", line 58, in write
    self._output.write(string, flags=flags)
  File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/api/io/output.py", line 61, in write
    self._stream.write(to_str(formatted))
  File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/io/output_stream/stream_output_stream.py", line 24, in write
    self._stream.write(string)
UnicodeEncodeError: 'ascii' codec can't encode character '\xa0' in position 30: ordinal not in range(128)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/.poetry/bin/poetry", line 19, in <module>
    main()
  File "/root/.poetry/lib/poetry/console/__init__.py", line 5, in main
    return Application().run()
  File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/console_application.py", line 142, in run
    trace.render(io, simple=isinstance(e, CliKitException))
  File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/ui/components/exception_trace.py", line 232, in render
    return self._render_exception(io, self._exception)
  File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/ui/components/exception_trace.py", line 269, in _render_exception
    self._render_snippet(io, current_frame)
  File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/ui/components/exception_trace.py", line 289, in _render_snippet
    self._render_line(io, code_line)
  File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/ui/components/exception_trace.py", line 402, in _render_line
    io.write_line("{}{}".format(indent * " ", line))
  File "/root/.poetry/lib/poetry/_vendor/py3.6/cleo/io/io_mixin.py", line 65, in write_line
    super(IOMixin, self).write_line(string, flags)
  File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/api/io/io.py", line 66, in write_line
    self._output.write_line(string, flags=flags)
  File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/api/io/output.py", line 69, in write_line
    self.write(string, flags=flags, new_line=True)
  File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/api/io/output.py", line 61, in write
    self._stream.write(to_str(formatted))
  File "/root/.poetry/lib/poetry/_vendor/py3.6/clikit/io/output_stream/stream_output_stream.py", line 24, in write
    self._stream.write(string)
UnicodeEncodeError: 'ascii' codec can't encode character '\u2502' in position 27: ordinal not in range(128)
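
The write failure in the traceback above can be imitated without Poetry. A minimal sketch, assuming stdout ended up with an ASCII encoding because no UTF-8 locale was configured; the in-memory streams stand in for the real stdout:

```python
import io

# Simulate a stdout whose encoding was derived from a C/POSIX locale.
ascii_stdout = io.TextIOWrapper(io.BytesIO(), encoding="ascii")
try:
    ascii_stdout.write("USAGE\xa0")  # U+00A0 NO-BREAK SPACE, as in the report
    ascii_stdout.flush()
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc
print(ascii_error)

# The same text is fine once the stream encoding is UTF-8.
utf8_buffer = io.BytesIO()
utf8_stdout = io.TextIOWrapper(utf8_buffer, encoding="utf-8")
utf8_stdout.write("USAGE\xa0")
utf8_stdout.flush()
print(utf8_buffer.getvalue())  # b'USAGE\xc2\xa0'
```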

I have installed it like this:

root@95eea793181d:/app#  curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python
Retrieving Poetry metadata

# Welcome to Poetry!

This will download and install the latest version of Poetry,
a dependency and package manager for Python.

It will add the `poetry` command to Poetry's bin directory, located at:

$HOME/.poetry/bin

This path will then be added to your `PATH` environment variable by
modifying the profile file located at:

$HOME/.profile

You can uninstall at any time by executing this script with the --uninstall option,
and these changes will be reverted.

Installing version: 1.1.4
  - Downloading poetry-1.1.4-linux.tar.gz (57.03MB)

Poetry (1.1.4) is installed now. Great!

To get started you need Poetry's bin directory ($HOME/.poetry/bin) in your `PATH`
environment variable. Next time you log in this will be done
automatically.

To configure your current shell run `source $HOME/.poetry/env`

stanislaw on 8 Nov 2020

Thanks @stanislaw for the detailed report.

My note was about robustness of poetry. Workarounds are very practical and often life savers.

I ran into a dependency of setuptools on the current system locale, which is definitely wrong (I think my fix is already in). The solution is to search the project for all open calls and make sure that any call opening a stream in text mode explicitly states the encoding "utf-8".
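
Such a search could be automated roughly as follows. This is a sketch only: open_calls_without_encoding is a hypothetical helper, and it recognises only the builtin open() called by that name, not io.open or Path.open.

```python
import ast


def open_calls_without_encoding(source):
    """Return line numbers of text-mode open() calls that pass no encoding."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "open"):
            continue
        # encoding may be given by keyword or as the fourth positional argument.
        if len(node.args) >= 4 or any(kw.arg == "encoding" for kw in node.keywords):
            continue
        # The mode is the second positional argument or the "mode" keyword.
        mode = node.args[1] if len(node.args) > 1 else None
        for kw in node.keywords:
            if kw.arg == "mode":
                mode = kw.value
        if (isinstance(mode, ast.Constant) and isinstance(mode.value, str)
                and "b" in mode.value):
            continue  # binary mode takes no encoding
        hits.append(node.lineno)
    return sorted(hits)


sample = (
    'open("setup.py", "w")\n'
    'open("setup.py", "w", encoding="utf-8")\n'
    'open("data.bin", "rb")\n'
    'open("README.md")\n'
)
print(open_calls_without_encoding(sample))  # [1, 4]
```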

vlcinsky on 8 Nov 2020
