spaCy: ValueError: 2539520 exceeds max_bin_len(1048576) when using spacy.load()

Created on 30 Nov 2018 · 18 Comments · Source: explosion/spaCy

Hi, I'm new to spaCy.

So, I've tried a little script to understand how it works:

import spacy

nlp = spacy.load('pt')

I've already installed spaCy (with pip and conda) on Python 3.6 and downloaded the Portuguese model, but I'm getting the following error:

Traceback (most recent call last):
  File "C:/Users/rocha/PycharmProjects/projeto/entidade.py", line 4, in <module>
    nlp = spacy.load('pt')
  File "C:\Users\rocha\Anaconda3\lib\site-packages\spacy\__init__.py", line 18, in load
    return util.load_model(name, **overrides)
  File "C:\Users\rocha\Anaconda3\lib\site-packages\spacy\util.py", line 112, in load_model
    return load_model_from_link(name, **overrides)
  File "C:\Users\rocha\Anaconda3\lib\site-packages\spacy\util.py", line 129, in load_model_from_link
    return cls.load(**overrides)
  File "C:\Users\rocha\Anaconda3\lib\site-packages\spacy\data\pt\__init__.py", line 12, in load
    return load_model_from_init_py(__file__, **overrides)
  File "C:\Users\rocha\Anaconda3\lib\site-packages\spacy\util.py", line 173, in load_model_from_init_py
    return load_model_from_path(data_path, meta, **overrides)
  File "C:\Users\rocha\Anaconda3\lib\site-packages\spacy\util.py", line 156, in load_model_from_path
    return nlp.from_disk(model_path)
  File "C:\Users\rocha\Anaconda3\lib\site-packages\spacy\language.py", line 647, in from_disk
    util.from_disk(path, deserializers, exclude)
  File "C:\Users\rocha\Anaconda3\lib\site-packages\spacy\util.py", line 511, in from_disk
    reader(path / key)
  File "C:\Users\rocha\Anaconda3\lib\site-packages\spacy\language.py", line 643, in <lambda>
    deserializers[name] = lambda p, proc=proc: proc.from_disk(p, vocab=False)
  File "pipeline.pyx", line 643, in spacy.pipeline.Tagger.from_disk
  File "C:\Users\rocha\Anaconda3\lib\site-packages\spacy\util.py", line 511, in from_disk
    reader(path / key)
  File "pipeline.pyx", line 626, in spacy.pipeline.Tagger.from_disk.load_model
  File "pipeline.pyx", line 627, in spacy.pipeline.Tagger.from_disk.load_model
  File "C:\Users\rocha\Anaconda3\lib\site-packages\thinc\neural\_classes\model.py", line 335, in from_bytes
    data = msgpack.loads(bytes_data, encoding='utf8')
  File "C:\Users\rocha\Anaconda3\lib\site-packages\msgpack_numpy.py", line 214, in unpackb
    return _unpackb(packed, **kwargs)
  File "msgpack/_unpacker.pyx", line 187, in msgpack._cmsgpack.unpackb
ValueError: 2539520 exceeds max_bin_len(1048576)

Can anyone help?

feat / serialize · third-party · 🔮 thinc

All 18 comments

Thanks for the report and sorry you've hit this problem. It also just came up in our tests today and it was pretty confusing.

Looks like it might be related to an update of the msgpack library that was released today and is used in our library thinc, which spaCy depends on. So when you installed spaCy, that new version was pulled in and apparently it includes a change to the limit?

We'll investigate this and hopefully push an update to thinc soon. In the meantime, try downgrading msgpack:

pip install msgpack==0.5.6
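
To see whether an environment has picked up the newer msgpack before loading a model, something like the following may help. It is only a sketch: it assumes Python 3.8+ (for `importlib.metadata`), and the `is_affected` threshold of 0.6.0 is taken from this thread rather than from msgpack's own documentation. The helper names are made up for illustration.

```python
from importlib import metadata  # standard library since Python 3.8


def parse_version(text):
    """Turn '0.5.6' into (0, 5, 6); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def is_affected(msgpack_version):
    """Assumption from this thread: msgpack 0.6.0 and later trigger the error."""
    return parse_version(msgpack_version) >= (0, 6)


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("msgpack")
    if version is None:
        print("msgpack is not installed in this environment")
    elif is_affected(version):
        print("msgpack", version, "is 0.6.0 or later; consider downgrading")
    else:
        print("msgpack", version, "is older than 0.6.0")
```

Run it with the same interpreter that runs spaCy, otherwise it reports on a different environment.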

Just hit this as well; your fix works (we pinned msgpack>=0.3.0,<0.6).

Glad it worked!

We also released Thinc v6.12.1 earlier, which pins to the exact msgpack version. It should now be installed automatically when you install/update spaCy.
https://github.com/explosion/thinc/releases/tag/v6.12.1

Great, thanks so much!

Probably best to keep this open for now, as people with cached packages might still run into this.

tl;dr: Thinc 6.12.1 is up now, so fresh installs should work. If your installation doesn't work, do:

python -m pip install "msgpack<0.6.0"

Note: I just fired up a fresh Ubuntu 18.04 VM in Azure and sudo pip install -U spacy reveals this message in the console:

thinc 6.12.1 has requirement msgpack<0.6.0,>=0.5.6, but you'll have msgpack 0.6.0 which is incompatible.

... So it looks like you do need to manually install the older version of msgpack along with a fresh install of spaCy (i.e., python -m pip install "msgpack<0.6.0" is required).
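
The requirement in that pip message, msgpack<0.6.0,>=0.5.6, can be read as two bounds that must both hold. Here is a rough sketch of that check using only the standard library. It handles plain numeric versions only and is not a full implementation of pip's version rules (the `packaging` library does that properly); `as_tuple` and `satisfies` are hypothetical helper names.

```python
import operator

# Comparison operators that can appear in a simple requirement specifier.
# Two-character operators come first so "<=" is not mistaken for "<".
_OPERATORS = [
    ("<=", operator.le),
    (">=", operator.ge),
    ("==", operator.eq),
    ("<", operator.lt),
    (">", operator.gt),
]


def as_tuple(version):
    """Convert a purely numeric version such as '0.5.6' into (0, 5, 6)."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version, specifier):
    """Check a version against a comma-separated specifier like '<0.6.0,>=0.5.6'."""
    candidate = as_tuple(version)
    for clause in specifier.split(","):
        clause = clause.strip()
        for symbol, compare in _OPERATORS:
            if clause.startswith(symbol):
                bound = as_tuple(clause[len(symbol):])
                if not compare(candidate, bound):
                    return False
                break
        else:
            # No known operator matched this clause.
            raise ValueError("unsupported clause: " + clause)
    return True


print(satisfies("0.6.0", "<0.6.0,>=0.5.6"))  # False
print(satisfies("0.5.6", "<0.6.0,>=0.5.6"))  # True
```

So msgpack 0.6.0 fails the upper bound, which matches the "incompatible" warning above.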

Hmm! What else requires msgpack in your environment though? spaCy shouldn't be depending on it directly.

The version of msgpack is 0.5.6, but the problem still exists.

disfluency_detection/crf.py:36: in __init__
    self.nlp = spacy.load('en')
.venv/lib/python3.6/site-packages/spacy/__init__.py:15: in load
    return util.load_model(name, **overrides)
.venv/lib/python3.6/site-packages/spacy/util.py:112: in load_model
    return load_model_from_link(name, **overrides)
.venv/lib/python3.6/site-packages/spacy/util.py:129: in load_model_from_link
    return cls.load(**overrides)
.venv/lib/python3.6/site-packages/spacy/data/en/__init__.py:12: in load
    return load_model_from_init_py(__file__, **overrides)
.venv/lib/python3.6/site-packages/spacy/util.py:173: in load_model_from_init_py
    return load_model_from_path(data_path, meta, **overrides)
.venv/lib/python3.6/site-packages/spacy/util.py:156: in load_model_from_path
    return nlp.from_disk(model_path)
.venv/lib/python3.6/site-packages/spacy/language.py:653: in from_disk
    util.from_disk(path, deserializers, exclude)
.venv/lib/python3.6/site-packages/spacy/util.py:511: in from_disk
    reader(path / key)
.venv/lib/python3.6/site-packages/spacy/language.py:649: in <lambda>
    deserializers[name] = lambda p, proc=proc: proc.from_disk(p, vocab=False)
pipeline.pyx:643: in spacy.pipeline.Tagger.from_disk
    ???
.venv/lib/python3.6/site-packages/spacy/util.py:511: in from_disk
    reader(path / key)
pipeline.pyx:626: in spacy.pipeline.Tagger.from_disk.load_model
    ???
pipeline.pyx:627: in spacy.pipeline.Tagger.from_disk.load_model
    ???
.venv/lib/python3.6/site-packages/thinc/neural/_classes/model.py:335: in from_bytes
    data = msgpack.loads(bytes_data, encoding='utf8')
.venv/lib/python3.6/site-packages/msgpack_numpy.py:214: in unpackb
    return _unpackb(packed, **kwargs)
msgpack/_unpacker.pyx:187: in msgpack._cmsgpack.unpackb
    ???
E ValueError: 1792000 exceeds max_bin_len(1048576)

Could anyone help?
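
One thing that might be worth ruling out here (a guess, not a diagnosis) is that the interpreter running the tests imports a different msgpack than the one pip reports as 0.5.6. A small sketch that prints where each module would be loaded from, using only the standard library; `locate_module` is a hypothetical helper name:

```python
import importlib.util


def locate_module(name):
    """Return the file a top-level module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


# Run this with the same interpreter that raises the error, e.g. .venv/bin/python.
for name in ("msgpack", "msgpack_numpy", "thinc", "spacy"):
    print(name, "->", locate_module(name))
```

If the msgpack path points outside .venv, the version check and the failing code are looking at different installations.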

I tested on a raw (virgin) VM instance of Ubuntu 18.04 on Azure just to rule everything out.

My only commands:

sudo apt-get update
sudo apt-get upgrade -y
sudo apt-get dist-upgrade
sudo apt-get install -y python-pip
sudo python -m pip install --upgrade pip
pip install -U spacy

...It looks like msgpack is included in the default Ubuntu 18.04 system (upgraded to the latest).

Try pip install spacy==2.0.18

Just got started with spaCy, had the same error.
Downgrading msgpack to 0.5.6 solved it. Thanks!

Thank you for the very quick fix!

The Travis build of spacyr hit the same issue yesterday:
https://travis-ci.org/quanteda/spacyr/builds/461753381
(see line 4336 for the failure; the installed spaCy was 2.0.17, per line 4088).

I was going to file an issue today, but the problem was already gone in this morning's build, which installed spaCy 2.0.18.
https://travis-ci.org/quanteda/spacyr/builds/462103684

Just for reference, our package installs spaCy with the following:

conda create -n spacy_condaenv python=3.6 -y
source activate spacy_condaenv
pip install spacy
python -m spacy download en

@amatsuo Glad it was fixed!

You should probably update your installation to include a version range. It should now be safe to install spaCy via conda as well. conda install -c conda-forge "spacy>=2.0.0,<2.1.0" is recommended. Minor versions such as v2.1 aren't necessarily model-compatible: v2.1 will require new models to be downloaded and trained.

Version pinning is especially useful for helping people produce reproducible experiments. If versions aren't pinned, then when someone tries your code in a few years' time, new versions of the software will be installed and everything will break.
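
As an illustration of the range suggested above, here is a minimal sketch that turns a known-good version into a pin allowing only patch releases of the same minor version. `minor_range_pin` is a made-up helper, and it assumes a plain major.minor.patch version string.

```python
def minor_range_pin(package, version):
    """Build a requirement that allows only patch releases of the given minor version.

    Assumes a plain 'major.minor.patch' version string.
    """
    major, minor = (int(part) for part in version.split(".")[:2])
    lower = "{}.{}.0".format(major, minor)
    upper = "{}.{}.0".format(major, minor + 1)
    return "{}>={},<{}".format(package, lower, upper)


print(minor_range_pin("spacy", "2.0.18"))  # spacy>=2.0.0,<2.1.0
```

The resulting string can go into a requirements file or be quoted on the pip or conda command line.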

@honnibal Thanks for the suggestion! Sounds like a good idea.

We'll include it in the next update.

I had the same problem as @RochaOwng, and downgrading msgpack solved it. Thanks @ines, you just saved my day :))

For those who installed spacy with conda, I found the following command to work the best:

conda install -c conda-forge msgpack-python==0.5.6

Worked for conda installation. Thanks @cameronrhamilton !
