Openlibrary: Adding book page has problems with Unicode ÄÖÜäöü or ß

Created on 14 Aug 2020 · 12 Comments · Source: internetarchive/openlibrary

Adding a book with a German ß in the title, or with ÄÖÜäöü or ß in the author name, results in an error.

Evidence / Screenshot (if possible)

/opt/openlibrary/deploys/openlibrary/61096f2/openlibrary/templates/books/edit.html: error in processing template: UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 9: ordinal not in range(128) (falling back to default template)

Relevant url?

https://openlibrary.org/books/add

Steps to Reproduce

  1. https://openlibrary.org/books/add
  2. add title 'Der Große Weltatlas'

Details

  • Logged in: Y
  • Browser type/version? Firefox 79
  • Operating system? OSX
  • Environment (prod/dev/local)? prod
@hornc 2 Internationalization Bug

All 12 comments

Trying to edit the book later, seems to produce the following error:

https://openlibrary.org/books/OL28738926M/Das_gro%C3%9Fe_Buch_der_Dinosaurier/edit

/opt/openlibrary/deploys/openlibrary/61096f2/openlibrary/templates/books/edit.html: error in processing template: UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 23: ordinal not in range(128) (falling back to default template)

This is a recent variation on an old problem, also discussed en passant at #2231. As a temporary workaround, the problematic correct spelling can be moved to the author’s a.k.a.s and the primary author name respelled in plain ASCII. Once that is done, the editions and works become accessible.
@tabshaikh @cclauss Does this ring any bells?

@LeadSongDog This workaround helps with creating new books but OL28738926M can't be edited at all.

This kind of incompatibility should disappear in the coming months when we switch to Python 3, because every str is Unicode in current versions of Python. It would be cool if someone could write some Python test cases that fail for this issue so that once they pass, we know we have made useful progress. Even a list of URLs would help, especially if we could get some of those books loaded into dev/staging.
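
A minimal sketch of what such test cases might look like, using only titles and names quoted in this thread. The helper names `encoding_error` and `failing_samples` are hypothetical and not part of the Open Library codebase; a real test would exercise the add-book and edit-book code paths rather than `str.encode` directly.

```python
# Titles and names reported in this thread; each contains a non-ASCII character.
SAMPLES = [
    "Der Große Weltatlas",
    "Das große Buch der Dinosaurier",
    "Isaías Rojas Peña",
]


def encoding_error(text, encoding):
    """Return the UnicodeEncodeError raised while encoding text, or None on success."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as exc:
        return exc
    return None


def failing_samples(encoding):
    """List the samples that cannot be encoded with the given codec."""
    return [s for s in SAMPLES if encoding_error(s, encoding) is not None]


if __name__ == "__main__":
    print("fail with ascii:", failing_samples("ascii"))
    print("fail with utf-8:", failing_samples("utf-8"))
```

Every sample fails with the `ascii` codec and none fails with `utf-8`, so a check of this shape stays red for as long as any step in the pipeline forces text through ASCII.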

I found this bug in names with "á", "é", "í", "ó", and "ú" (letters with accents). Also with the letter "ñ" in Spanish. Example: Isaías Rojas Peña (including "í" and "ñ").

Can we replicate this on http://staging.openlibrary.org which is now running on Python 3?

https://github.com/internetarchive/openlibrary/blob/master/openlibrary/plugins/upstream/addbook.py

This also presents a problem on Python 3.8.5 because we are attempting str.encode('ascii'), so I will try to fix it in a way that works for both...

Traceback (most recent call last):
  File "/home/openlibrary/.pyenv/versions/3.8.5/lib/python3.8/site-packages/gunicorn/workers/sync.py", line 135, in handle
    self.handle_request(listener, req, client, addr)
  File "/home/openlibrary/.pyenv/versions/3.8.5/lib/python3.8/site-packages/gunicorn/workers/sync.py", line 182, in handle_request
    resp.write(item)
  File "/home/openlibrary/.pyenv/versions/3.8.5/lib/python3.8/site-packages/gunicorn/http/wsgi.py", line 333, in write
    self.send_headers()
  File "/home/openlibrary/.pyenv/versions/3.8.5/lib/python3.8/site-packages/gunicorn/http/wsgi.py", line 329, in send_headers
    util.write(self.sock, util.to_bytestring(header_str, "ascii"))
  File "/home/openlibrary/.pyenv/versions/3.8.5/lib/python3.8/site-packages/gunicorn/util.py", line 507, in to_bytestring
    return value.encode(encoding)
UnicodeEncodeError: 'ascii' codec can't encode character '\xdf' in position 248: ordinal not in range(128)
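
The traceback shows gunicorn encoding a response header as ASCII. It does not say which header carried the ß, so the following is only a sketch under the assumption that a URL path containing the title ends up in a header such as a redirect target. The function name `ascii_safe_path` is hypothetical; `urllib.parse.quote` is the standard-library call that percent-encodes non-ASCII characters as UTF-8.

```python
from urllib.parse import quote


def ascii_safe_path(path):
    """Percent-encode non-ASCII characters (as UTF-8) so the path is pure ASCII."""
    # Keep "/" unescaped so the path structure survives.
    return quote(path, safe="/")


# Assumed example path, built from the edit URL quoted earlier in this thread.
path = "/books/OL28738926M/Das_große_Buch_der_Dinosaurier/edit"
header_value = ascii_safe_path(path)

# Encoding as ASCII, which is what the header writer does, now succeeds.
encoded = header_value.encode("ascii")
```

The result matches the percent-encoded form already visible in the edit URL above (`gro%C3%9Fe`), which suggests the data itself is fine and only the header value needs escaping before it reaches the server.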

@cclauss I got an internal server error on staging.openlibrary.org adding that:
2B12909F-BB37-459E-80DE-1FAF654BF44E

Changing to grosse instead worked:
76F67567-E688-4F32-AAD8-D2B1972DA390

Correct... This is about Unicode characters vs. ASCII characters. Any non-ASCII Unicode character (even an emoji) will cause a problem on either Py2 or Py3.

Hi, I'm getting a similar error when I try to edit this book:

https://openlibrary.org/books/OL9174734M/Biblioteca_Vasconcelos_Library

The error:

/opt/openlibrary/deploys/openlibrary/2b017b5/openlibrary/templates/books/edit.html: error in processing template: UnicodeEncodeError: 'ascii' codec can't encode character u'\xe0' in position 11: ordinal not in range(128) (falling back to default template)

class OLIndexer() has a method normalize_edition_title() that seems to be converting Unicode titles into ASCII titles. ;-(
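
To illustrate why folding titles to ASCII is lossy, here is a generic sketch of one common folding technique (NFKD decomposition followed by dropping everything outside ASCII). This is not the actual body of `normalize_edition_title()`, which is not quoted in this thread; `fold_to_ascii` is a hypothetical name used only for the example.

```python
import unicodedata


def fold_to_ascii(text):
    """Decompose accented letters, then drop every character outside ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    for sample in ["Isaías Rojas Peña", "Der Große Weltatlas"]:
        print(sample, "->", fold_to_ascii(sample))
```

Accented letters such as "í" and "ñ" keep their base letter, but "ß" has no decomposition and is silently dropped, so "Große" becomes "Groe". Folding of this kind may be acceptable for a search key but not for a displayed title.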

The problem also occurs when trying to edit an existing book where the author's name contains certain characters:
