Spacy: nlp.to_disk() throwing TypeError: __init__() got an unexpected keyword argument 'encoding'

Created on 28 Sep 2018 · 4 Comments · Source: explosion/spaCy

How to reproduce the behaviour

I was following the example at https://github.com/explosion/spaCy/blob/master/examples/training/train_ner.py to train my own NER model, and nlp.to_disk() failed with the following stack trace:

Traceback (most recent call last):
  File "app.py", line 121, in <module>
    nlp.to_disk(output_dir)
  File "/home/ec2-user/ner_model/venv/lib64/python3.6/site-packages/spacy/language.py", line 621, in to_disk
    util.to_disk(path, serializers, {p: False for p in disable})
  File "/home/ec2-user/ner_model/venv/lib64/python3.6/site-packages/spacy/util.py", line 503, in to_disk
    writer(path / key)
  File "/home/ec2-user/ner_model/venv/lib64/python3.6/site-packages/spacy/language.py", line 609, in <lambda>
    ('tokenizer', lambda p: self.tokenizer.to_disk(p, vocab=False)),
  File "tokenizer.pyx", line 354, in spacy.tokenizer.Tokenizer.to_disk
  File "tokenizer.pyx", line 355, in spacy.tokenizer.Tokenizer.to_disk
  File "tokenizer.pyx", line 384, in spacy.tokenizer.Tokenizer.to_bytes
  File "/home/ec2-user/ner_model/venv/lib64/python3.6/site-packages/spacy/util.py", line 486, in to_bytes
    return msgpack.dumps(serialized, use_bin_type=True, encoding='utf8')
  File "/home/ec2-user/ner_model/venv/lib64/python3.6/site-packages/msgpack_numpy.py", line 196, in packb
    return Packer(**kwargs).pack(o)
TypeError: __init__() got an unexpected keyword argument 'encoding'

Your Environment

  • Operating System: Amazon Linux 2, Mac OS X 10.13.2
  • Python Version Used: 3.6
  • spaCy Version Used: 2.0.12
  • Environment Information:
    My pip list returns
certifi (2018.8.24)
chardet (3.0.4)
cymem (1.31.2)
cytoolz (0.9.0.1)
dill (0.2.8.2)
idna (2.7)
msgpack (0.5.6)
msgpack-numpy (0.4.4.1)
murmurhash (0.28.0)
numpy (1.15.2)
pip (9.0.3)
plac (0.9.6)
preshed (1.0.1)
regex (2017.4.5)
requests (2.19.1)
setuptools (39.0.1)
six (1.11.0)
spacy (2.0.12)
thinc (6.10.3)
toolz (0.9.0)
tqdm (4.26.0)
ujson (1.35)
urllib3 (1.23)
wrapt (1.10.11)

Any ideas why my to_disk() call is throwing this error?

feat / serialize third-party

All 4 comments

I get the same error but in my case it is the spacy.displacy.render() function:
Traceback (most recent call last):
  File "serve_trees.py", line 27, in <module>
    spacy.displacy.render(doc, style='dep', jupyter=False)
  File "/home/bachstelze/workspaces/spacy_test/lib/python3.6/site-packages/spacy/displacy/__init__.py", line 39, in render
    parsed = [converter(doc, options) for doc in docs] if not manual else docs
  File "/home/bachstelze/workspaces/spacy_test/lib/python3.6/site-packages/spacy/displacy/__init__.py", line 39, in <listcomp>
    parsed = [converter(doc, options) for doc in docs] if not manual else docs
  File "/home/bachstelze/workspaces/spacy_test/lib/python3.6/site-packages/spacy/displacy/__init__.py", line 89, in parse_deps
    doc = Doc(orig_doc.vocab).from_bytes(orig_doc.to_bytes())
  File "doc.pyx", line 804, in spacy.tokens.doc.Doc.to_bytes
  File "/home/bachstelze/workspaces/spacy_test/lib/python3.6/site-packages/spacy/util.py", line 486, in to_bytes
    return msgpack.dumps(serialized, use_bin_type=True, encoding='utf8')
  File "/home/bachstelze/workspaces/spacy_test/lib/python3.6/site-packages/msgpack_numpy.py", line 196, in packb
    return Packer(**kwargs).pack(o)
TypeError: __init__() got an unexpected keyword argument 'encoding'

The error seems to come from spacy/util.py, where the last three commits touch this encoding argument: https://github.com/explosion/spaCy/commits/master/spacy/util.py
The most recent commit adds the problematic encoding argument back: https://github.com/explosion/spaCy/commit/6430b1fe64c4e29f35c701ceb0ce0bba8be5fda4

But there is no explanation for why it was removed and then restored.

Current workaround: pip install "msgpack-numpy<0.4.4.0"

The issue is that msgpack-numpy 0.4.4.1 was released with a backwards-incompatible change: the encoding argument had been deprecated, and passing it now raises an error.
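
To illustrate the failure mode, here is a minimal sketch using only the standard library. StrictPacker, packb and try_pack are hypothetical stand-ins, not the real msgpack or msgpack-numpy code; they only mirror the shape of the last traceback frame, where every keyword argument is forwarded to a constructor that no longer accepts encoding.

```python
class StrictPacker:
    """Hypothetical stand-in for a packer whose constructor has no `encoding` parameter."""

    def __init__(self, use_bin_type=False):
        self.use_bin_type = use_bin_type

    def pack(self, obj):
        # Placeholder serialization; this is NOT real msgpack output.
        return repr(obj).encode("utf8")


def packb(obj, **kwargs):
    # Same pattern as the final traceback frame: all keyword arguments
    # are passed straight through to the constructor.
    return StrictPacker(**kwargs).pack(obj)


def try_pack(obj, **kwargs):
    # Return the packed bytes, or a description of the TypeError if one is raised.
    try:
        return packb(obj, **kwargs)
    except TypeError as err:
        return "TypeError: {}".format(err)
```

Calling try_pack({"a": 1}, use_bin_type=True) returns bytes, while try_pack({"a": 1}, use_bin_type=True, encoding="utf8") returns a TypeError message that names the unexpected keyword, which is the pattern seen in both tracebacks in this thread.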

Until Thinc ships a new version, the best solution is to pin msgpack-numpy to a previous version, which I think needs this argument for Python 2.7.
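
As a rough sketch of how the pin could be checked against a version string taken from pip list, the helper below compares a version to the 0.4.4.0 bound from the workaround. The names parse_version, PIN_BOUND and excluded_by_pin are made up for this example, and the parsing is deliberately simplified (numeric dotted versions only); it is not pip's own version logic.

```python
def parse_version(text):
    """Turn a dotted version string such as '0.4.4.1' into a tuple of ints.

    Simplified on purpose: reads the leading digits of each dotted part and
    stops at the first part without any, then pads to four components.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    while len(parts) < 4:
        parts.append(0)
    return tuple(parts)


# Bound taken from the workaround above: pip install "msgpack-numpy<0.4.4.0"
PIN_BOUND = (0, 4, 4, 0)


def excluded_by_pin(installed_version):
    # True if this version would NOT satisfy "msgpack-numpy<0.4.4.0".
    return parse_version(installed_version) >= PIN_BOUND
```

With the version from the original report, excluded_by_pin("0.4.4.1") is True, so that environment would need the downgrade; an earlier version string such as "0.4.3.2" gives False.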

This should be fixed in the latest release of spaCy / Thinc!

For the upcoming version v2.1.0 (currently on develop and available as spacy-nightly), we've packaged our own library of serialization utilities called srsly. It bundles forks of msgpack and ujson, lets us implement fixes and improvements, ensures spaCy won't break due to third-party updates, and lets us ship wheels for the entire thing 🎉 See here for details: https://github.com/explosion/srsly

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
