I have been using the nlp.from_bytes method with the large English model, version 1. We upgraded to the 2.1.0 model, and now I'm getting an error stating that it can't find en_model.vectors. See below how I load the model. When I do the to_bytes, am I supposed to also save the en_model.vectors separately?
```
import os
import spacy

with open(os.path.join(filepath, "large_2.1.0.txt"), "rb") as file:
    modelContent = file.read()

# Rebuild a blank pipeline from the saved meta, then restore the weights
nlp = spacy.blank(meta["lang"])
for pipe_name in meta["pipeline"]:
    pipe = nlp.create_pipe(pipe_name)
    nlp.add_pipe(pipe)
nlp.from_bytes(modelContent)
```
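For completeness, the meta dict above is just what I read back from the meta file saved next to the binary, roughly like this (the directory and file name are only placeholders):
```
import json
import os

filepath = "/path/to/model/dir"  # placeholder directory

# Placeholder file name for the meta extract saved alongside the model binary
with open(os.path.join(filepath, "meta_2.1.0.json"), "r", encoding="utf8") as f:
    meta = json.load(f)
```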
It looks like thinc.load_nlp assumes the model is on disk: https://github.com/explosion/thinc/blob/master/thinc/extra/load_nlp.py
I am loading the whole model from memory.
Here is my stack trace:
```
Traceback (most recent call last):
  File "c:\ws\repos\quality\src\main\python\ins\quality\nltk_helper.py", line 89, in get_spacy_model
    nlp.from_bytes(modelContent)
  File "C:\Programs\Python36\lib\site-packages\spacy\language.py", line 893, in from_bytes
    util.from_bytes(bytes_data, deserializers, exclude)
  File "C:\Programs\Python36\lib\site-packages\spacy\util.py", line 616, in from_bytes
    setter(msg[key])
  File "C:\Programs\Python36\lib\site-packages\spacy\language.py", line 890, in <lambda>
    b, exclude=["vocab"]
  File "pipes.pyx", line 600, in spacy.pipeline.pipes.Tagger.from_bytes
  File "C:\Programs\Python36\lib\site-packages\spacy\util.py", line 616, in from_bytes
    setter(msg[key])
  File "pipes.pyx", line 597, in spacy.pipeline.pipes.Tagger.from_bytes.lambda16
  File "pipes.pyx", line 580, in spacy.pipeline.pipes.Tagger.from_bytes.load_model
  File "pipes.pyx", line 530, in spacy.pipeline.pipes.Tagger.Model
  File "C:\Programs\Python36\lib\site-packages\spacy\_ml.py", line 523, in build_tagger_model
    pretrained_vectors=pretrained_vectors,
  File "C:\Programs\Python36\lib\site-packages\spacy\_ml.py", line 346, in Tok2Vec
    glove = StaticVectors(pretrained_vectors, width, column=cols.index(ID))
  File "C:\Programs\Python36\lib\site-packages\thinc\neural\_classes\static_vectors.py", line 43, in __init__
    vectors = self.get_vectors()
  File "C:\Programs\Python36\lib\site-packages\thinc\neural\_classes\static_vectors.py", line 55, in get_vectors
    return get_vectors(self.ops, self.lang)
  File "C:\Programs\Python36\lib\site-packages\thinc\extra\load_nlp.py", line 26, in get_vectors
    nlp = get_spacy(lang)
  File "C:\Programs\Python36\lib\site-packages\thinc\extra\load_nlp.py", line 14, in get_spacy
    SPACY_MODELS[lang] = spacy.load(lang, **kwargs)
  File "C:\Programs\Python36\lib\site-packages\spacy\__init__.py", line 27, in load
    return util.load_model(name, **overrides)
  File "C:\Programs\Python36\lib\site-packages\spacy\util.py", line 139, in load_model
    raise IOError(Errors.E050.format(name=name))
OSError: [E050] Can't find model 'en_model.vectors'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.
```
@smiles3983 The nlp.from_bytes() method should definitely work without touching disk. The code in Thinc that you've pointed to is a bit misleading in that respect, as there are some backwards-compatibility fixes there.
I think the real problem is actually pretty simple: the model format changed between v2.0 and v2.1, so you need to retrain the model when moving between versions. The problem runs deeper than a mere formatting change: the models themselves are different, so the same weights won't work in both versions.
I'm not using the 2.0 model at all anymore. I'm installing the 2.1 model on disk via pip and loading it with spacy.load() (no error), then calling to_bytes and saving the binary along with the meta file extract. I then start over: I take the binary and the meta, create a blank model from the meta, and load it using from_bytes. That's when I get the error that it can't find en_model.vectors. I didn't have to train the model with version 2.0, unless spaCy was doing that for me. All I did was switch to the 2.1 model to create a new binary and a new meta file, and I'm loading it the same way as before. How do I retrain it?
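Roughly, the serialization side looks like this (the package name and file names are placeholders for what we actually use):
```
import json
import os

import spacy

filepath = "/path/to/model/dir"  # placeholder directory

# Load the pip-installed 2.1 model from disk (this step works fine).
# "en_core_web_lg" is assumed here as the large English package name.
nlp = spacy.load("en_core_web_lg")

# Serialize the pipeline and save the binary plus the meta extract.
with open(os.path.join(filepath, "large_2.1.0.txt"), "wb") as f:
    f.write(nlp.to_bytes())
with open(os.path.join(filepath, "meta_2.1.0.json"), "w", encoding="utf8") as f:
    json.dump(nlp.meta, f)
```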
Hmm. Okay, thanks, I'll look into this.
Can I just remove the en_model.vectors section from the meta? Do you think that would still work, or would I be missing the point of having the large model?
That would cause a different error, as the models will be expecting the vectors. I've verified the error; we should be able to get a fix into the next version.
The issue is that the model has a previous name listed for its vectors, and we don't have the fixup logic in the from_bytes code. You should be able to work around the error with this prior to serialization:
```
for name, pipe in nlp.pipeline:
    if hasattr(pipe, "cfg") and pipe.cfg.get("pretrained_vectors"):
        pipe.cfg["pretrained_vectors"] = nlp.vocab.vectors.name
```
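In context, that fixup would sit just before to_bytes, something like this (the package name is just an example of the installed v2.1 model):
```
import spacy

# Load the installed v2.1 model (package name is illustrative).
nlp = spacy.load("en_core_web_lg")

# Point each component's config at the vocab's current vectors name,
# so the serialized config no longer references the old "en_model.vectors".
for name, pipe in nlp.pipeline:
    if hasattr(pipe, "cfg") and pipe.cfg.get("pretrained_vectors"):
        pipe.cfg["pretrained_vectors"] = nlp.vocab.vectors.name

# Serialize after the fixup.
model_bytes = nlp.to_bytes()
```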
OK, I'll test it Monday. Should the code below work?
```
nlp = spacy.blank(meta["lang"])
# 2. Initialize the pipeline
for pipe_name in meta["pipeline"]:
    pipe = nlp.create_pipe(pipe_name)
    nlp.add_pipe(pipe)
for name, pipe in nlp.pipeline:
    if hasattr(pipe, "cfg") and pipe.cfg.get("pretrained_vectors"):
        pipe.cfg["pretrained_vectors"] = nlp.vocab.vectors.name
nlp.from_bytes(modelContent)
```
Yes, I think that should work. Regardless, it's definitely fixed in v2.2, which we expect to be out Monday (just having some trouble with the wheel building system).
Awesome. Yeah, we dropped back to 2.0 for now, so we'll have to test the new version next week. If we weren't limited to 3GB of disk usage in our enterprise container, we wouldn't have to use the from_bytes function. We have our own ML models as well, which take up a lot of space, so we have to stay in memory.