Openlibrary: Adding book page has problems with Unicode ÄÖÜäöü or ß

Created on 14 Aug 2020 · 12 Comments · Source: internetarchive/openlibrary

Adding books containing a German ß in the title, or ÄÖÜäöü or ß in the author name, results in an error.

Evidence / Screenshot (if possible)

/opt/openlibrary/deploys/openlibrary/61096f2/openlibrary/templates/books/edit.html: error in processing template: UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 9: ordinal not in range(128) (falling back to default template)

Relevant url?

https://openlibrary.org/books/add

Steps to Reproduce

https://openlibrary.org/books/add
add title 'Der Große Weltatlas'

Details

Logged in: Y
Browser type/version? Firefox 79
Operating system? OSX
Environment (prod/dev/local)? prod

@hornc 2 Internationalization Bug

Source

bitnapper

All 12 comments

Trying to edit the book later seems to produce the following error:

https://openlibrary.org/books/OL28738926M/Das_gro%C3%9Fe_Buch_der_Dinosaurier/edit

/opt/openlibrary/deploys/openlibrary/61096f2/openlibrary/templates/books/edit.html: error in processing template: UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 23: ordinal not in range(128) (falling back to default template)

bitnapper on 14 Aug 2020

This is a recent variation on an old problem, also discussed en passant at #2231. As a temporary workaround, the problematic correct spelling can be moved to the author’s a.k.a.s and the primary author name respelled to use plain ASCII. Once this is done, the editions and works become accessible.
@tabshaikh @cclauss Does this ring any bells?

LeadSongDog on 14 Aug 2020

@LeadSongDog This workaround helps with creating new books but OL28738926M can't be edited at all.

bitnapper on 14 Aug 2020

This kinda incompatibility should disappear in the coming months when we switch to Python 3 because all str are Unicode in current versions of Python. It would be cool if someone could write some Python test cases that fail for this issue so that once they pass, we know we have made useful progress. Even a list of URLs would help especially if we could get some of those books loaded into dev/staging.
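
A minimal sketch of what such test cases might look like, using only titles and names reported in this thread. The helper names (`SAMPLES`, `ascii_encodable`, `utf8_round_trips`) are hypothetical and do not exist in the Open Library codebase; these tests do not exercise `addbook.py` itself, they only pin down the inputs and the encode step that fails.

```python
# Hypothetical regression inputs, all taken from reports in this thread.
SAMPLES = [
    "Der Große Weltatlas",
    "Das große Buch der Dinosaurier",
    "Isaías Rojas Peña",
]


def ascii_encodable(text):
    """Return True if text survives str.encode('ascii')."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


def utf8_round_trips(text):
    """Return True if text is unchanged after a UTF-8 encode/decode cycle."""
    return text.encode("utf-8").decode("utf-8") == text


def test_samples_are_not_ascii():
    # Each sample must trigger the failing code path.
    for sample in SAMPLES:
        assert not ascii_encodable(sample)


def test_samples_round_trip_as_utf8():
    # Any fix should preserve the original spelling.
    for sample in SAMPLES:
        assert utf8_round_trips(sample)
```

Run under pytest, both tests pass today; they become useful once the sample list is wired to the real add-book and edit-book handlers.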

cclauss on 14 Aug 2020

I found this bug in names with "á", "é", "í", "ó", and "ú" (letters with accents). Also with the letter "ñ" in Spanish. Example: Isaías Rojas Peña (including "í" and "ñ").

dcapillae on 23 Aug 2020

Can we replicate this on http://staging.openlibrary.org which is now running on Python 3?

cclauss on 23 Aug 2020

https://github.com/internetarchive/openlibrary/blob/master/openlibrary/plugins/upstream/addbook.py

This also presents a problem on Python 3.8.5 because we are attempting str.encode('ascii'), so I will try to fix it in a way that works for both...

Traceback (most recent call last):
  File "/home/openlibrary/.pyenv/versions/3.8.5/lib/python3.8/site-packages/gunicorn/workers/sync.py", line 135, in handle
    self.handle_request(listener, req, client, addr)
  File "/home/openlibrary/.pyenv/versions/3.8.5/lib/python3.8/site-packages/gunicorn/workers/sync.py", line 182, in handle_request
    resp.write(item)
  File "/home/openlibrary/.pyenv/versions/3.8.5/lib/python3.8/site-packages/gunicorn/http/wsgi.py", line 333, in write
    self.send_headers()
  File "/home/openlibrary/.pyenv/versions/3.8.5/lib/python3.8/site-packages/gunicorn/http/wsgi.py", line 329, in send_headers
    util.write(self.sock, util.to_bytestring(header_str, "ascii"))
  File "/home/openlibrary/.pyenv/versions/3.8.5/lib/python3.8/site-packages/gunicorn/util.py", line 507, in to_bytestring
    return value.encode(encoding)
UnicodeEncodeError: 'ascii' codec can't encode character '\xdf' in position 248: ordinal not in range(128)
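
The traceback shows gunicorn encoding response headers as ASCII, but not which header contained the ß. Assuming it is a redirect `Location` header built from the book title, one possible approach is to percent-encode the path before it reaches the header. The function name `header_safe_path` is hypothetical, and this is a sketch, not necessarily the fix adopted in `addbook.py`.

```python
from urllib.parse import quote


def header_safe_path(path):
    """Percent-encode non-ASCII characters (as UTF-8) so the path is pure ASCII."""
    # "/" is kept as-is so the path structure is preserved.
    return quote(path, safe="/")


raw = "/books/OL28738926M/Das_große_Buch_der_Dinosaurier/edit"
safe = header_safe_path(raw)
print(safe)  # /books/OL28738926M/Das_gro%C3%9Fe_Buch_der_Dinosaurier/edit

# The result can now be encoded the way gunicorn encodes headers.
safe.encode("ascii")
```

The encoded form matches the URL that browsers already show for this edition.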

cclauss on 25 Aug 2020

@cclauss I got an internal server error on staging.openlibrary.org adding that:
[screenshot]

Changing to "grosse" instead worked:
[screenshot]

LeadSongDog on 25 Aug 2020

Correct... This is about Unicode characters vs. ASCII characters. Any non-ASCII character (even an emoji) will cause a problem on either Py2 or Py3.
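
To make the "ordinal not in range(128)" part of the error messages concrete, here is a small sketch that lists the characters in a string that ASCII cannot represent. The helper `non_ascii_chars` is hypothetical and only for illustration.

```python
def non_ascii_chars(text):
    """Return (position, character, code point) for each character outside ASCII."""
    return [(i, ch, hex(ord(ch))) for i, ch in enumerate(text) if ord(ch) >= 128]


title = "Der Große Weltatlas"
print(non_ascii_chars(title))  # [(7, 'ß', '0xdf')]

# The exception raised by encode() reports the same position.
try:
    title.encode("ascii")
except UnicodeEncodeError as exc:
    print(exc.start)  # 7
```

The code point 0xdf is the same '\xdf' that appears in the Python 3 traceback in this thread.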

cclauss on 25 Aug 2020

Hi, I'm getting a similar error when I try to edit this book:

https://openlibrary.org/books/OL9174734M/Biblioteca_Vasconcelos_Library

The error:

/opt/openlibrary/deploys/openlibrary/2b017b5/openlibrary/templates/books/edit.html: error in processing template: UnicodeEncodeError: 'ascii' codec can't encode character u'\xe0' in position 11: ordinal not in range(128) (falling back to default template)