I was following the train_ner.py example (https://github.com/explosion/spaCy/blob/master/examples/training/train_ner.py) to train my own NER model, and I got the following error:
Traceback (most recent call last):
File "app.py", line 121, in <module>
nlp.to_disk(output_dir)
File "/home/ec2-user/ner_model/venv/lib64/python3.6/site-packages/spacy/language.py", line 621, in to_disk
util.to_disk(path, serializers, {p: False for p in disable})
File "/home/ec2-user/ner_model/venv/lib64/python3.6/site-packages/spacy/util.py", line 503, in to_disk
writer(path / key)
File "/home/ec2-user/ner_model/venv/lib64/python3.6/site-packages/spacy/language.py", line 609, in <lambda>
('tokenizer', lambda p: self.tokenizer.to_disk(p, vocab=False)),
File "tokenizer.pyx", line 354, in spacy.tokenizer.Tokenizer.to_disk
File "tokenizer.pyx", line 355, in spacy.tokenizer.Tokenizer.to_disk
File "tokenizer.pyx", line 384, in spacy.tokenizer.Tokenizer.to_bytes
File "/home/ec2-user/ner_model/venv/lib64/python3.6/site-packages/spacy/util.py", line 486, in to_bytes
return msgpack.dumps(serialized, use_bin_type=True, encoding='utf8')
File "/home/ec2-user/ner_model/venv/lib64/python3.6/site-packages/msgpack_numpy.py", line 196, in packb
return Packer(**kwargs).pack(o)
TypeError: __init__() got an unexpected keyword argument 'encoding'
pip list returns:
certifi (2018.8.24)
chardet (3.0.4)
cymem (1.31.2)
cytoolz (0.9.0.1)
dill (0.2.8.2)
idna (2.7)
msgpack (0.5.6)
msgpack-numpy (0.4.4.1)
murmurhash (0.28.0)
numpy (1.15.2)
pip (9.0.3)
plac (0.9.6)
preshed (1.0.1)
regex (2017.4.5)
requests (2.19.1)
setuptools (39.0.1)
six (1.11.0)
spacy (2.0.12)
thinc (6.10.3)
toolz (0.9.0)
tqdm (4.26.0)
ujson (1.35)
urllib3 (1.23)
wrapt (1.10.11)
Any ideas why my to_disk() call is throwing this error?
I get the same error, but in my case it is raised by the spacy.displacy.render() function:
Traceback (most recent call last):
File "serve_trees.py", line 27, in <module>
spacy.displacy.render(doc, style='dep', jupyter=False)
File "/home/bachstelze/workspaces/spacy_test/lib/python3.6/site-packages/spacy/displacy/__init__.py", line 39, in render
parsed = [converter(doc, options) for doc in docs] if not manual else docs
File "/home/bachstelze/workspaces/spacy_test/lib/python3.6/site-packages/spacy/displacy/__init__.py", line 39, in <listcomp>
parsed = [converter(doc, options) for doc in docs] if not manual else docs
File "/home/bachstelze/workspaces/spacy_test/lib/python3.6/site-packages/spacy/displacy/__init__.py", line 89, in parse_deps
doc = Doc(orig_doc.vocab).from_bytes(orig_doc.to_bytes())
File "doc.pyx", line 804, in spacy.tokens.doc.Doc.to_bytes
File "/home/bachstelze/workspaces/spacy_test/lib/python3.6/site-packages/spacy/util.py", line 486, in to_bytes
return msgpack.dumps(serialized, use_bin_type=True, encoding='utf8')
File "/home/bachstelze/workspaces/spacy_test/lib/python3.6/site-packages/msgpack_numpy.py", line 196, in packb
return Packer(**kwargs).pack(o)
TypeError: __init__() got an unexpected keyword argument 'encoding'
It seems that the error comes from util.py, where the last three commits all touch this encoding argument: https://github.com/explosion/spaCy/commits/master/spacy/util.py
The latest commit re-adds the problematic encoding argument: https://github.com/explosion/spaCy/commit/6430b1fe64c4e29f35c701ceb0ce0bba8be5fda4
But there is no explanation for why it was removed and then restored.
Current workaround: pip install "msgpack-numpy<0.4.4.0"
The issue is that msgpack-numpy 0.4.4.1 has been released with a backwards-incompatible change: that argument was deprecated, and now throws an error.
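To make the failure mode concrete, here is a minimal sketch of what the traceback shows: keyword arguments are forwarded unchanged into a constructor that no longer accepts `encoding`. The classes `OldPacker` and `NewPacker` and this `packb` helper are hypothetical stand-ins for illustration, not the real msgpack or msgpack-numpy code.

```python
# Stand-ins for illustration only; these are NOT the real msgpack classes.
class OldPacker:
    """Accepts the legacy `encoding` keyword."""

    def __init__(self, use_bin_type=False, encoding="utf8"):
        self.use_bin_type = use_bin_type
        self.encoding = encoding


class NewPacker:
    """No longer accepts `encoding`, like the constructor in the traceback."""

    def __init__(self, use_bin_type=False):
        self.use_bin_type = use_bin_type


def packb(packer_cls, **kwargs):
    # Mirrors the `Packer(**kwargs)` line in the traceback: every keyword
    # is passed straight through to the constructor.
    return packer_cls(**kwargs)


# The caller (spaCy's util.to_bytes in the traceback) always passes `encoding`.
packb(OldPacker, use_bin_type=True, encoding="utf8")  # accepted

try:
    packb(NewPacker, use_bin_type=True, encoding="utf8")
except TypeError as err:
    message = str(err)
    print(message)
```

The caller did not change; only the constructor signature did, which is why the error appeared without a spaCy upgrade.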
The best solution until Thinc updates with a new version is to pin to a previous version of msgpack-numpy. I think the argument is still needed on Python 2.7.
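If you want to check whether an environment already satisfies the pin, something like the following might help. It is only a sketch: it assumes Python 3.8+ (for `importlib.metadata`) and purely numeric dotted version strings, and the helper names `parse_version`, `satisfies_pin`, and `installed_version` are made up for this example.

```python
from importlib import metadata  # standard library in Python 3.8+


def parse_version(text):
    """Turn '0.4.4.1' into (0, 4, 4, 1). Assumes numeric dotted versions only."""
    return tuple(int(part) for part in text.split("."))


def satisfies_pin(installed, upper_bound="0.4.4.0"):
    """Return True if `installed` is strictly below `upper_bound`."""
    a = parse_version(installed)
    b = parse_version(upper_bound)
    # Pad the shorter tuple with zeros so that 0.4.4 compares equal to 0.4.4.0.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


def installed_version(dist_name="msgpack-numpy"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print("msgpack-numpy is not installed")
else:
    try:
        ok = satisfies_pin(version)
    except ValueError:
        # Non-numeric version parts (e.g. release candidates) are not handled.
        print(f"cannot parse msgpack-numpy version {version!r}")
    else:
        state = "satisfies" if ok else "does not satisfy"
        print(f"msgpack-numpy {version} {state} the <0.4.4.0 pin")
```

For anything beyond a quick check, a real version parser such as the one in the `packaging` library is a safer choice than this hand-rolled comparison.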
This should be fixed in the latest release of spaCy / Thinc!
For the upcoming version v2.1.0 (currently on develop and available as spacy-nightly), we've packaged our own library of serialization utilities called srsly, which bundles forks of msgpack and ujson, lets us implement fixes and improvements, ensures spaCy won't break due to third-party updates, and lets us ship wheels for the entire thing 🎉 See here for details: https://github.com/explosion/srsly