spaCy: 💫 Improve model saving and loading

Created on 8 May 2017 · 4 Comments · Source: explosion/spaCy

The APIs for saving and loading model files in spaCy 1.0 will be consolidated and made more consistent in spaCy 2.0. These changes will affect the following model classes:

Language (and its subclasses)
Vocab
StringStore
Morphology
Lemmatizer
Tokenizer
Tagger
Parser
Matcher

The following methods will be supported:

Pickle – `__reduce__`

All model classes will support the pickle protocol.
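
As a rough illustration of what supporting the pickle protocol involves, here is a minimal sketch. `ToyStringStore` is a hypothetical stand-in invented for this example; it is not spaCy's `StringStore` and says nothing about how spaCy implements `__reduce__` internally.

```python
import pickle


class ToyStringStore:
    """Hypothetical stand-in for a model class (not spaCy's StringStore)."""

    def __init__(self, strings=()):
        self.strings = list(strings)

    def __reduce__(self):
        # Tell pickle how to rebuild the object: call the class
        # again with the stored strings as its single argument.
        return (ToyStringStore, (self.strings,))


store = ToyStringStore(["apple", "banana"])

# Round-trip through pickle; the copy is rebuilt via __reduce__.
restored = pickle.loads(pickle.dumps(store))
```

Returning a `(callable, args)` pair from `__reduce__` keeps the pickled payload limited to the state needed to reconstruct the object.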

`to_binary()` / `from_binary(bytes)`

These methods will serialize the model to a byte stream, or deserialize it from one. They will not perform any I/O, so they can be used to send the model over the wire instead of writing it to disk.
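
The in-memory round trip described here might look roughly like the sketch below. `ToyVocab` and its JSON encoding are assumptions made for illustration only; the method names follow the proposal, but the actual byte format spaCy uses is not specified in this issue.

```python
import json


class ToyVocab:
    """Hypothetical stand-in for a model class (not spaCy's Vocab)."""

    def __init__(self, words=None):
        self.words = dict(words or {})

    def to_binary(self):
        # Serialize to bytes entirely in memory; no file is touched.
        return json.dumps(self.words, sort_keys=True).encode("utf-8")

    def from_binary(self, data):
        # Load state from bytes in place and return self for chaining.
        self.words = json.loads(data.decode("utf-8"))
        return self


# The payload could be sent over a socket or stored in a database.
payload = ToyVocab({"hello": 1, "world": 2}).to_binary()
clone = ToyVocab().from_binary(payload)
```

Because neither method opens a file, the caller decides where the bytes go.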

`to_disk(path, format=None)` / `from_disk(path, format=None)`

These methods will save the model to the file system, or load it from there. An optional `format` argument will be supported, and its interpretation will vary by class. These methods will prioritise convenience and simplicity.
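
A minimal sketch of the disk-based pair, under stated assumptions: `ToyTagger`, the `labels.json` file name, and the handling of `format` (only `None` or `"json"` accepted) are all hypothetical choices for this example, not spaCy behaviour.

```python
import json
import tempfile
from pathlib import Path


class ToyTagger:
    """Hypothetical stand-in for a model class (not spaCy's Tagger)."""

    def __init__(self, labels=None):
        self.labels = list(labels or [])

    def _check_format(self, format):
        # This sketch only knows one on-disk format.
        if format not in (None, "json"):
            raise ValueError("unsupported format: %r" % (format,))

    def to_disk(self, path, format=None):
        self._check_format(format)
        path = Path(path)
        path.mkdir(parents=True, exist_ok=True)
        data = json.dumps(self.labels)
        (path / "labels.json").write_text(data, encoding="utf-8")

    def from_disk(self, path, format=None):
        self._check_format(format)
        data = (Path(path) / "labels.json").read_text(encoding="utf-8")
        self.labels = json.loads(data)
        return self


# Save into a temporary directory, then load into a fresh object.
with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "tagger"
    ToyTagger(["NOUN", "VERB"]).to_disk(model_dir)
    loaded = ToyTagger().from_disk(model_dir)
```

Accepting either a string or a `Path` for `path` keeps the call site simple, in line with the stated goal of convenience.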

Deprecated methods

The new methods will replace the existing saving and loading methods:

StringStore.save(), StringStore.load()
Tokenizer.load()
Tagger.model.dump(), Tagger.model.load(), Tagger.load()
Parser.model.dump(), Parser.model.load(), Parser.load()
Vocab.load_lexemes(), Vocab.load(), Vocab.load_vectors(), Vocab.load_vectors_from_bin_loc(), Vocab.dump(), Vocab.dump_vectors()
Language.load(), Language.save_to_directory()

Related issues

Word vector loading: #671, #809, #856, #1012
Training workflow: #999, #1026

enhancement ⚠️ wip 🌙 nightly

honnibal

Most helpful comment

May I make a small documentation suggestion that might save others the 1/2 I spent looking through old issues to find a solution? Document the keyword arguments for loading the pipeline that allow one to set parser, ner, etc. to false to speed up loading, e.g., when only tokenization is needed.