spaCy: create_pipe from factory initialized with Language, not Vocab

Created on 7 Feb 2020 · 2 Comments · Source: explosion/spaCy

The spacy.Language.create_pipe method returns factory(self, **config), but spacy.Pipe is initialized with def __init__(self, vocab, model=True, **cfg). So a Language is being passed in, while Pipes expect a Vocab.
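For context, the built-in factories in spaCy v2 are small wrappers that take the Language object and forward nlp.vocab to the component, so the component itself still receives a Vocab. The following is a minimal sketch of that pattern using toy stand-in classes. Vocab, Pipe and Language here are simplified illustrations, not spaCy's real implementations.

```python
# Toy stand-ins for illustration only; these are NOT spaCy's real classes.
class Vocab:
    pass


class Pipe:
    def __init__(self, vocab, model=True, **cfg):
        # The component stores the Vocab it was given.
        self.vocab = vocab
        self.model = model
        self.cfg = cfg


class Language:
    # Each factory receives the Language object and forwards its vocab,
    # so Pipe.__init__ ends up with a Vocab, not a Language.
    factories = {"tagger": lambda nlp, **cfg: Pipe(nlp.vocab, **cfg)}

    def __init__(self):
        self.vocab = Vocab()

    def create_pipe(self, name, config=None):
        factory = self.factories[name]
        return factory(self, **(config or {}))


nlp = Language()
tagger = nlp.create_pipe("tagger")
```

In this sketch the component shares the pipeline's vocab object rather than owning a separate one, which is why the vocab is normally saved and loaded once at the pipeline level.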

This causes problems when deserializing components that serialize their vocab. For example:

from spacy.lang.en import English
nlp = English()
tagger = nlp.create_pipe("tagger")
tagger.from_disk("path/to/tagger")

This raises ValueError: Can't read file: path/to/tagger/vocab/strings.json

feat / pipeline usage

Most helpful comment

Hi, this is how to load the vocab and tagger separately:

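from spacy.lang.en import English
nlp = English()
# Load the shared vocab first, from wherever it was saved.
nlp.vocab.from_disk("path/to/vocab")
tagger = nlp.create_pipe("tagger")
# Skip the vocab here: the tagger directory has no vocab/ subfolder.
tagger.from_disk("path/to/tagger", exclude=["vocab"])
nlp.add_pipe(tagger)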

If the vocab isn't the same one the tagger was trained with (the main concern is the vectors), you won't get sensible results from the tagger.

Look at the code for Language.from_disk() to see how this is set up in general.
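The loading order described above can be sketched with toy classes: the pipeline saves the shared vocab once, saves each component with the vocab excluded, and loads them back the same way. This is a rough illustration only. ToyVocab, ToyTagger and ToyLanguage are hypothetical names, and the on-disk layout is an assumption that merely mirrors the vocab/strings.json path from the error message.

```python
import json
import tempfile
from pathlib import Path


class ToyVocab:
    def __init__(self):
        self.strings = []

    def to_disk(self, path):
        path = Path(path)
        path.mkdir(parents=True, exist_ok=True)
        (path / "strings.json").write_text(json.dumps(self.strings))

    def from_disk(self, path):
        file_path = Path(path) / "strings.json"
        if not file_path.exists():
            raise ValueError(f"Can't read file: {file_path}")
        self.strings = json.loads(file_path.read_text())
        return self


class ToyTagger:
    def __init__(self, vocab):
        self.vocab = vocab
        self.cfg = {}

    def to_disk(self, path, exclude=()):
        path = Path(path)
        path.mkdir(parents=True, exist_ok=True)
        if "vocab" not in exclude:
            self.vocab.to_disk(path / "vocab")
        (path / "cfg").write_text(json.dumps(self.cfg))

    def from_disk(self, path, exclude=()):
        path = Path(path)
        # By default the component also tries to read its own vocab/ folder.
        if "vocab" not in exclude:
            self.vocab.from_disk(path / "vocab")
        self.cfg = json.loads((path / "cfg").read_text())
        return self


class ToyLanguage:
    def __init__(self):
        self.vocab = ToyVocab()
        self.pipeline = []  # list of (name, component) pairs

    def to_disk(self, path):
        path = Path(path)
        self.vocab.to_disk(path / "vocab")  # shared vocab saved once
        for name, proc in self.pipeline:
            proc.to_disk(path / name, exclude=["vocab"])

    def from_disk(self, path):
        path = Path(path)
        self.vocab.from_disk(path / "vocab")  # shared vocab loaded first
        for name, proc in self.pipeline:
            proc.from_disk(path / name, exclude=["vocab"])
        return self


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        nlp = ToyLanguage()
        nlp.vocab.strings = ["NOUN", "VERB"]
        tagger = ToyTagger(nlp.vocab)
        tagger.cfg = {"labels": ["NOUN", "VERB"]}
        nlp.pipeline.append(("tagger", tagger))
        nlp.to_disk(tmp)

        fresh = ToyLanguage()
        fresh_tagger = ToyTagger(fresh.vocab)
        try:
            # No vocab/ folder exists under tagger/, so this fails.
            fresh_tagger.from_disk(Path(tmp) / "tagger")
            error = None
        except ValueError as err:
            error = str(err)

        # Loading the vocab separately and excluding it from the tagger works.
        fresh.vocab.from_disk(Path(tmp) / "vocab")
        fresh_tagger.from_disk(Path(tmp) / "tagger", exclude=["vocab"])
        return error, fresh.vocab.strings, fresh_tagger.cfg


error, strings, cfg = demo()
```

In this toy version, a component directory written by the pipeline has no vocab/ subfolder, which is one plausible reason a bare tagger.from_disk() call ends up looking for a strings.json that is not there.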

All 2 comments

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
