Weblate: UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 4: ordinal not in range(128)

Created on 31 May 2017 · 3 Comments · Source: WeblateOrg/weblate

Steps to reproduce

Migrate from an old Weblate (I went from weblate-2.3 to weblate-2.14, following the instructions) and upgrade from Python 2.7 to Python 3.4.

(.virtualenv)weblate@ip-172-33-8-0:/srv/weblate$ python manage.py update_index
Traceback (most recent call last):
  File "manage.py", line 31, in <module>
    execute_from_command_line(sys.argv)
  File "/srv/weblate/.virtualenv/lib/python3.4/site-packages/django/core/management/__init__.py", line 363, in execute_from_command_line
    utility.execute()
  File "/srv/weblate/.virtualenv/lib/python3.4/site-packages/django/core/management/__init__.py", line 355, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/srv/weblate/.virtualenv/lib/python3.4/site-packages/django/core/management/base.py", line 283, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/srv/weblate/.virtualenv/lib/python3.4/site-packages/django/core/management/base.py", line 330, in execute
    output = self.handle(*args, **options)
  File "/srv/weblate/weblate/trans/management/commands/update_index.py", line 44, in handle
    self.do_delete(options['limit'])
  File "/srv/weblate/weblate/trans/management/commands/update_index.py", line 62, in do_delete
    langupdates,
  File "/srv/weblate/weblate/trans/search.py", line 315, in delete_search_units
    writer.commit()
  File "/srv/weblate/.virtualenv/lib/python3.4/site-packages/whoosh/writing.py", line 922, in commit
    finalsegments = self._merge_segments(mergetype, optimize, merge)
  File "/srv/weblate/.virtualenv/lib/python3.4/site-packages/whoosh/writing.py", line 827, in _merge_segments
    return mergetype(self, self.segments)
  File "/srv/weblate/.virtualenv/lib/python3.4/site-packages/whoosh/writing.py", line 101, in MERGE_SMALL
    writer.add_reader(reader)
  File "/srv/weblate/.virtualenv/lib/python3.4/site-packages/whoosh/writing.py", line 710, in add_reader
    self.add_postings_to_pool(reader, basedoc, docmap)
  File "/srv/weblate/.virtualenv/lib/python3.4/site-packages/whoosh/writing.py", line 647, in add_postings_to_pool
    for item in items:
  File "/srv/weblate/.virtualenv/lib/python3.4/site-packages/whoosh/writing.py", line 583, in _process_posts
    for fieldname, text, docnum, weight, vbytes in items:
  File "/srv/weblate/.virtualenv/lib/python3.4/site-packages/whoosh/reading.py", line 429, in iter_postings
    yield (fieldname, btext, m.id(), m.weight(), m.value())
  File "/srv/weblate/.virtualenv/lib/python3.4/site-packages/whoosh/codec/whoosh3.py", line 980, in id
    self._read_ids()
  File "/srv/weblate/.virtualenv/lib/python3.4/site-packages/whoosh/codec/whoosh3.py", line 1082, in _read_ids
    self._read_data()
  File "/srv/weblate/.virtualenv/lib/python3.4/site-packages/whoosh/codec/whoosh3.py", line 1077, in _read_data
    self._data = loads(b)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 4: ordinal not in range(128)

Actual behaviour

  • 500 error when updating a translation
  • Fails to run python manage.py update_index
  • Fails to run python manage.py rebuild_index --all

Expected behaviour

Updating translations should not fail.
The commands above should run successfully.

Server configuration

./manage.py list_versions
 * Weblate weblate-2.14
 * Python 3.4.2
 * Django 1.11
 * six 1.10.0
 * social-auth-core 1.3.0
 * social-auth-app-django 1.2.0
 * django-appconf 1.0.2
 * Translate Toolkit 2.1.0
 * Whoosh 2.7.4
 * defusedxml 0.5.0
 * Git 2.1.4
 * Pillow (PIL) 1.1.7
 * dateutil 2.6.0
 * lxml 3.7.3
 * django-crispy-forms 1.6.1
 * compressor 2.1.1
 * djangorestframework 3.6.2
 * pytz 2017.2
 * pyuca N/A
 * python-bidi 0.4.0
 * PyYAML 3.12
 * Database backends: django.db.backends.mysql

The database settings use {'charset': 'utf8mb4'} in OPTIONS.

(.virtualenv)weblate@weblate:/srv/weblate$ python manage.py shell
Python 3.4.2 (default, Oct  8 2014, 10:45:20)
[GCC 4.9.1] on linux
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> from django.conf import settings
>>> settings.DATABASES['default']['OPTIONS']
{'charset': 'utf8mb4'}

Similar issues:

  • #1075
  • #1106
  • #1166


All 3 comments

I hit this problem while upgrading to Python 3, and it turned out that the index data in my data/whoosh folder was corrupted.

Removing that folder and running the following commands solved my problem:

python manage.py update_index
python manage.py rebuild_index --all

It took me some time to realize that this folder was the cause, because the following documentation only mentions encoding issues with the database:
https://docs.weblate.org/en/latest/admin/install.html#unicode-issues-in-mysql
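
The steps in this comment could be scripted roughly as follows. This is only a sketch: the helper names `move_index_aside` and `rebuild_index` are made up for illustration, the data directory location is an assumption based on the `data/whoosh` path mentioned above, and the index folder is renamed instead of deleted so it can be restored if needed.

```python
import subprocess
import sys
import time
from pathlib import Path


def move_index_aside(data_dir):
    """Rename <data_dir>/whoosh to a timestamped backup.

    Returns the backup path, or None if there is no index folder.
    (Hypothetical helper, not part of Weblate.)
    """
    index_dir = Path(data_dir) / "whoosh"
    if not index_dir.is_dir():
        return None
    backup = index_dir.with_name("whoosh.bak-%d" % int(time.time()))
    index_dir.rename(backup)
    return backup


def rebuild_index(project_dir, data_dir, run=subprocess.check_call):
    """Move the old index away, then run the two commands from the comment.

    `run` is injectable so the function can be exercised without Weblate.
    """
    backup = move_index_aside(data_dir)
    for command in (["update_index"], ["rebuild_index", "--all"]):
        run([sys.executable, "manage.py"] + command, cwd=str(project_dir))
    return backup


# Example call; both paths are assumptions taken from the traceback above:
# rebuild_index("/srv/weblate", "/srv/weblate/data")
```

Run it with the virtualenv's Python so that `manage.py` sees the same packages as the web application.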


AFAIK Whoosh doesn't use the same index format under Python 2 and Python 3, so that's the reason. Still, this should be documented...
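
For illustration, the same error message can be reproduced with the standard `pickle` module alone, assuming (as the `loads(b)` frame in the traceback suggests, but the thread does not confirm) that the failing call unpickles data written under Python 2. The payload below is hand-written for this example and is not taken from a real Whoosh index.

```python
import pickle

# Hand-written protocol-2 pickle of the Python 2 byte string 'abcd\x80':
# PROTO 2, SHORT_BINSTRING (length 5), BINPUT 0, STOP.
# Illustrative payload only, not real Whoosh data.
PY2_PICKLE = b"\x80\x02U\x05abcd\x80q\x00."


def try_load(payload, **kwargs):
    """Return the unpickled value, or the UnicodeDecodeError that was raised."""
    try:
        return pickle.loads(payload, **kwargs)
    except UnicodeDecodeError as exc:
        return exc


# Python 3 decodes Python 2 str objects as ASCII by default, so byte 0x80 fails.
default_result = try_load(PY2_PICKLE)

# Asking for raw bytes avoids the decode step entirely.
bytes_result = try_load(PY2_PICKLE, encoding="bytes")

print(default_result)
print(bytes_result)
```

This only shows why data pickled under Python 2 can fail to load under Python 3. It does not make an old index usable, so rebuilding the index remains the fix described in this thread.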

Thank you for your report, the issue you have reported has just been fixed.

  • In case you see a problem with the fix, please comment on this issue.
  • In case you see a similar problem, please open a separate issue.
  • If you are happy with the outcome, consider supporting Weblate by donating.