Bjoern: WSGI strings should be decoded with ISO-8859-1 on Python 3

Created on 3 Aug 2018 · 5 comments · Source: jonashaag/bjoern

Problem

When I run my Django app with bjoern and navigate to a URL that contains characters outside the ISO-8859-1 charset, the request fails with this error:

Traceback (most recent call last):
  File "/webapps/django-interfax-app/env/lib/python3.5/site-packages/django/core/handlers/wsgi.py", line 145, in __call__
    request = self.request_class(environ)
  File "/webapps/django-interfax-app/env/lib/python3.5/site-packages/django/core/handlers/wsgi.py", line 69, in __init__
    path_info = get_path_info(environ)
  File "/webapps/django-interfax-app/env/lib/python3.5/site-packages/django/core/handlers/wsgi.py", line 162, in get_path_info
    path_info = get_bytes_from_wsgi(environ, 'PATH_INFO', '/')
  File "/webapps/django-interfax-app/env/lib/python3.5/site-packages/django/core/handlers/wsgi.py", line 210, in get_bytes_from_wsgi
    return value.encode('iso-8859-1')
UnicodeEncodeError: 'latin-1' codec can't encode character '\u0457' in position 1: ordinal not in range(256)
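The failing call can be reproduced without Django or bjoern. The sketch below assumes the request path was `/ї` (the traceback only shows that U+0457 sits at position 1). It contrasts two ways a server can turn the percent-decoded path bytes into a `str`. Only the latin-1 form survives the `.encode('iso-8859-1')` that Django's `get_bytes_from_wsgi()` performs.

```python
# Assumed request: GET /%D1%97, i.e. the path "/ї" (U+0457) sent as UTF-8.
raw = b"/\xd1\x97"

# PEP 3333 form: every byte becomes one code point in U+0000..U+00FF.
conforming = raw.decode("iso-8859-1")
# Form reported for bjoern in this issue: the bytes decoded as UTF-8.
non_conforming = raw.decode("utf-8")

# Django recovers the original bytes with .encode("iso-8859-1") ...
assert conforming.encode("iso-8859-1") == raw

# ... which cannot work when PATH_INFO holds code points above U+00FF.
error = None
try:
    non_conforming.encode("iso-8859-1")
except UnicodeEncodeError as exc:
    error = exc
print(error)
```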

Probable root cause

On Python platforms where the str or StringType type is in fact Unicode-based (e.g. Jython, IronPython, Python 3, etc.), all "strings" referred to in this specification must contain only code points representable in ISO-8859-1 encoding (\u0000 through \u00FF, inclusive). [1]

[1] https://www.python.org/dev/peps/pep-3333/#unicode-issues
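To make the rule concrete: a conforming server percent-decodes the request path to bytes and then decodes those bytes as ISO-8859-1, so each byte maps to exactly one code point. The application, not the server, then decides which charset the bytes really use. This is a minimal sketch; `pep3333_path_info` and `app_path` are hypothetical helper names, not functions from bjoern, wsgiref or Django.

```python
from urllib.parse import unquote_to_bytes


def pep3333_path_info(request_target: bytes) -> str:
    """Hypothetical server-side helper: request target -> PEP 3333 PATH_INFO."""
    # Drop the query string; it belongs in QUERY_STRING, not PATH_INFO.
    path, _, _ = request_target.partition(b"?")
    # Percent-decode to the raw bytes the client sent...
    raw = unquote_to_bytes(path)
    # ...then map each byte 1:1 onto U+0000..U+00FF, as PEP 3333 requires.
    return raw.decode("iso-8859-1")


def app_path(path_info: str, charset: str = "utf-8") -> str:
    """Hypothetical app-side helper: recover the real path from PATH_INFO."""
    # Undo the latin-1 mapping to get the bytes back, then apply the real charset.
    return path_info.encode("iso-8859-1").decode(charset)
```

For `/%C3%A5`, `pep3333_path_info(b"/%C3%A5")` returns `'/Ã¥'`, the same value wsgiref prints, and `app_path()` turns that back into `'/å'`.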

Expected

As implemented in wsgiref:

def application(environ, start_response):
    start_response('200 OK', [])
    print('PATH_INFO:', environ['PATH_INFO'])
    yield b'OK'

from wsgiref.simple_server import make_server
httpd = make_server('0.0.0.0', 8080, application)
httpd.serve_forever()

curl localhost:8080/%C3%A5 prints PATH_INFO: /Ã¥.

Actual

def application(environ, start_response):
    start_response('200 OK', [])
    print('PATH_INFO:', environ['PATH_INFO'])
    yield b'OK'

import bjoern
bjoern.run(application, '0.0.0.0', 8080)

curl localhost:8080/%C3%A5 prints PATH_INFO: /å.

Labels: Bug, Needs patch

All 5 comments

Thanks for this great bug report!

Do you also happen to know how this should be fixed?

I'm not a C expert, but I can have a look. So far I have written a WSGI middleware to temporarily work around the issue:

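from wsgi import application  # the project's WSGI module (e.g. Django's wsgi.py)
import bjoern

class FixBjoernEncoding:
    """Re-encode PATH_INFO into the PEP 3333 form (latin-1 code points).

    bjoern passes PATH_INFO decoded as UTF-8; encoding it back to UTF-8
    bytes and decoding those as latin-1 yields what wsgiref would pass.
    """

    def __init__(self, app):
        self._app = app

    def __call__(self, environ, start_response):
        path_info = environ.get('PATH_INFO', '/')
        environ['PATH_INFO'] = path_info.encode('utf-8').decode('latin-1')
        return self._app(environ, start_response)

bjoern.run(FixBjoernEncoding(application), '0.0.0.0', 8080)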
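One consequence of this workaround is worth checking: it assumes PATH_INFO arrives UTF-8-decoded. Once bjoern decodes with latin-1 itself, the middleware encodes a second time and mangles non-ASCII paths, so it should be dropped after upgrading to a bjoern release that contains the fix. The sketch below restates the middleware's logic and drives it in-process with a plain dict instead of a server; `echo_path` and `run` are hypothetical test helpers.

```python
class FixBjoernEncoding:
    """Same logic as the workaround: UTF-8-decoded PATH_INFO -> latin-1 form."""

    def __init__(self, app):
        self._app = app

    def __call__(self, environ, start_response):
        path_info = environ.get('PATH_INFO', '/')
        environ['PATH_INFO'] = path_info.encode('utf-8').decode('latin-1')
        return self._app(environ, start_response)


def echo_path(environ, start_response):
    # Hypothetical test app: returns the bytes PATH_INFO stands for.
    start_response('200 OK', [('Content-Type', 'application/octet-stream')])
    return [environ['PATH_INFO'].encode('latin-1')]


def run(app, path_info):
    # Hypothetical in-process call: no server, minimal environ, no-op start_response.
    environ = {'PATH_INFO': path_info}
    return b''.join(app(environ, lambda status, headers, exc_info=None: None))


fixed = FixBjoernEncoding(echo_path)

# ASCII paths pass through unchanged.
assert run(fixed, '/about') == b'/about'
# What bjoern passed for /%C3%A5 before the fix: bytes decoded as UTF-8.
assert run(fixed, '/\u00e5') == b'/\xc3\xa5'
# What a conforming server passes: bytes decoded as latin-1. The middleware
# now encodes twice and the app sees the wrong bytes.
assert run(fixed, '/\u00c3\u00a5') == b'/\xc3\x83\xc2\xa5'
```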

Thanks, I'll fix this in the next few days.

Please have a look!

Works like a charm. 🎉🎉🎉

Thanks a lot! 🙌
