The APIs for saving and loading model files in spaCy 1.0 will be consolidated and made more consistent in spaCy 2.0. These changes will affect the following model classes:
- Language (and its subclasses)
- Vocab
- StringStore
- Morphology
- Lemmatizer
- Tokenizer
- Tagger
- Parser
- Matcher

The following methods will be supported:
__reduce__: All model classes will support the pickle protocol.
to_binary() / from_binary(bytes): These methods will serialize the model to a byte stream and deserialize it from one. They will not do IO, so they can be used to send the model over the wire instead of writing it to disk.
to_disk(path, format=None) / from_disk(path, format=None): These methods will save the model to, or load it from, the file system. An optional format argument will be supported, with its interpretation varying by class. These methods will prioritise convenience and simplicity.
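To make the proposed method trio concrete, here is a minimal sketch on a toy class. ToyStringStore is hypothetical and is not spaCy's StringStore; the JSON encoding, the choice to have the from_* methods return self, and the ignored format argument are all assumptions made for illustration only.

```python
import json
import pickle
import tempfile
from pathlib import Path


class ToyStringStore:
    """Hypothetical stand-in for a model class; NOT spaCy's implementation."""

    def __init__(self, strings=()):
        self.strings = list(strings)

    def to_binary(self):
        # Serialize to bytes without touching the file system.
        # (JSON is an assumption; the proposal does not fix an encoding.)
        return json.dumps(self.strings).encode("utf8")

    def from_binary(self, data):
        # Restore state from bytes; returning self is an assumption.
        self.strings = json.loads(data.decode("utf8"))
        return self

    def to_disk(self, path, format=None):
        # Convenience layer: does the IO and delegates to to_binary().
        # `format` is accepted but ignored in this sketch.
        Path(path).write_bytes(self.to_binary())

    def from_disk(self, path, format=None):
        return self.from_binary(Path(path).read_bytes())

    def __reduce__(self):
        # Pickle support: rebuild by calling the class with the saved strings.
        return (self.__class__, (self.strings,))


store = ToyStringStore(["apple", "banana"])

# Byte-stream round trip, no IO involved.
copy_from_bytes = ToyStringStore().from_binary(store.to_binary())

# File-system round trip in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "strings.json"
    store.to_disk(path)
    copy_from_disk = ToyStringStore().from_disk(path)

# Pickle round trip via __reduce__.
copy_from_pickle = pickle.loads(pickle.dumps(store))
```

Keeping the disk methods as thin wrappers over the byte methods is one way to guarantee that both paths produce the same data; the actual classes may well need something richer, such as several files per model.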
The new methods will replace the existing saving and loading methods:
- StringStore.save(), StringStore.load()
- Tokenizer.load()
- Tagger.model.dump(), Tagger.model.load(), Tagger.load()
- Parser.model.dump(), Parser.model.load(), Parser.load()
- Vocab.load_lexemes(), Vocab.load(), Vocab.load_vectors(), Vocab.load_vectors_from_bin_loc(), Vocab.dump(), Vocab.dump_vectors()
- Language.load(), Language.save_to_directory()

May I make a small documentation suggestion that might save others the time I spent looking through old issues to find a solution? Please document the keyword arguments for loading the pipeline that allow one to set parser, ner, etc. to false to speed up loading, e.g. when only tokenization is needed.
Thanks, will definitely have that documented. Actually there should probably also be another load function, for the "slimmer" version.
See the v2.0.0 alpha release notes and #1105 🎉