spaCy: 💫 Improve model saving and loading

Created on 8 May 2017  ·  4 comments  ·  Source: explosion/spaCy

The APIs for saving and loading model files in spaCy 1.0 will be consolidated and made more consistent in spaCy 2.0. These changes will affect the following model classes:

  • Language (and its subclasses)
  • Vocab
  • StringStore
  • Morphology
  • Lemmatizer
  • Tokenizer
  • Tagger
  • Parser
  • Matcher

The following methods will be supported:

Pickle – __reduce__

All model classes will support the pickle protocol.
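Here is a minimal sketch of what pickle support through `__reduce__` can look like. `ToyStringStore` is a hypothetical stand-in, not spaCy's implementation. `__reduce__` returns a callable plus the arguments to call it with, so pickle rebuilds the object from its essential state. Derived state (the lookup index here) is recomputed on load rather than stored.

```python
import pickle


class ToyStringStore:
    """Hypothetical stand-in for a model class such as StringStore."""

    def __init__(self, strings=()):
        self._strings = list(strings)
        # Derived, rebuildable state: a string -> ID lookup table.
        self._index = {s: i for i, s in enumerate(self._strings)}

    def __reduce__(self):
        # Tell pickle to rebuild the object by calling
        # ToyStringStore(tuple_of_strings). The index is recomputed
        # in __init__ instead of being pickled.
        return (self.__class__, (tuple(self._strings),))

    def __getitem__(self, string):
        return self._index[string]


store = ToyStringStore(["apple", "banana"])
clone = pickle.loads(pickle.dumps(store))
print(clone["banana"])  # → 1
```

Reducing to constructor arguments keeps the pickle small and avoids serializing caches that can be rebuilt cheaply.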

to_binary() / from_binary(bytes)

These methods will serialize the model to a byte string and deserialize it from one. They will not perform any IO, so they can be used to send the model over the wire instead of writing it to disk.
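The following is a minimal sketch of an IO-free byte round trip. It assumes a hypothetical `ToyStringStore` class that is not spaCy's code. JSON is used only to keep the sketch dependency-free; a real model would more likely use a compact binary encoding. (spaCy 2.0 eventually shipped this pair as `to_bytes()` / `from_bytes()`.) Here `from_binary()` returns `self`, which is a design assumption that lets the call chain onto a constructor.

```python
import json


class ToyStringStore:
    """Hypothetical stand-in for a model class; not spaCy's implementation."""

    def __init__(self, strings=()):
        self.strings = list(strings)

    def to_binary(self):
        # Serialize to an in-memory bytes object: no files or sockets involved.
        return json.dumps({"strings": self.strings}).encode("utf8")

    def from_binary(self, data):
        # Restore state from bytes produced by to_binary(), then return
        # self so the call can be chained onto a constructor.
        self.strings = json.loads(data.decode("utf8"))["strings"]
        return self


payload = ToyStringStore(["apple", "banana"]).to_binary()
# `payload` could now be sent over a socket or a queue instead of written to disk.
restored = ToyStringStore().from_binary(payload)
print(restored.strings)  # → ['apple', 'banana']
```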

to_disk(path, format=None) / from_disk(path, format=None)

These methods will save models to, or load them from, the file system. An optional format argument will be supported, with its interpretation varying by class. These methods will prioritise convenience and simplicity.
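The sketch below shows one way a class might interpret the optional `format` argument. The class `ToyStringStore`, the file names, and the format values `"json"` and `"text"` are all hypothetical; the issue only says that the interpretation varies by class. The path is treated as a directory, so a class can write several files into it if it needs to.

```python
import json
import tempfile
from pathlib import Path


class ToyStringStore:
    """Hypothetical stand-in for a model class; format values are illustrative."""

    def __init__(self, strings=()):
        self.strings = list(strings)

    def to_disk(self, path, format=None):
        path = Path(path)
        path.mkdir(parents=True, exist_ok=True)
        if format in (None, "json"):  # default format
            (path / "strings.json").write_text(json.dumps(self.strings), encoding="utf8")
        elif format == "text":
            # One string per line: easy to inspect by hand, but it assumes
            # no string contains a newline.
            (path / "strings.txt").write_text("\n".join(self.strings), encoding="utf8")
        else:
            raise ValueError(f"Unsupported format: {format!r}")

    def from_disk(self, path, format=None):
        path = Path(path)
        if format in (None, "json"):
            self.strings = json.loads((path / "strings.json").read_text(encoding="utf8"))
        elif format == "text":
            self.strings = (path / "strings.txt").read_text(encoding="utf8").splitlines()
        else:
            raise ValueError(f"Unsupported format: {format!r}")
        return self


with tempfile.TemporaryDirectory() as tmp:
    ToyStringStore(["apple", "banana"]).to_disk(tmp, format="text")
    restored = ToyStringStore().from_disk(tmp, format="text")
print(restored.strings)  # → ['apple', 'banana']
```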

Deprecated methods

The new methods will replace the existing saving and loading methods:

  • StringStore.save(), StringStore.load()
  • Tokenizer.load()
  • Tagger.model.dump(), Tagger.model.load(), Tagger.load()
  • Parser.model.dump(), Parser.model.load(), Parser.load()
  • Vocab.load_lexemes(), Vocab.load(), Vocab.load_vectors(), Vocab.load_vectors_from_bin_loc(), Vocab.dump(), Vocab.dump_vectors()
  • Language.load(), Language.save_to_directory()

Related issues

  • Word vector loading: #671, #809, #856, #1012
  • Training workflow: #999, #1026

Labels: enhancement, ⚠️ wip, 🌙 nightly

All 4 comments

May I make a small documentation suggestion that might save others the 1/2 I spent looking through old issues to find a solution? Please document the keyword arguments for loading the pipeline that let you set parser, ner, etc. to False to speed up loading, e.g. when you only need tokenization.
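The following is a minimal sketch of the idea behind those keyword arguments: a component passed as `name=False` is never built, so its (normally expensive) loading cost is skipped. The `load()` function and the component factories are hypothetical and are not spaCy's `spacy.load()`. For reference, spaCy 2.0 expresses the same idea as `spacy.load(name, disable=["parser", "ner"])`.

```python
# Hypothetical component factories. In a real library each would load
# weights from disk, which is the expensive step being skipped.
def make_tagger():
    return lambda doc: doc.setdefault("tags", ["X"] * len(doc["tokens"]))


def make_parser():
    return lambda doc: doc.setdefault("heads", list(range(len(doc["tokens"]))))


def make_ner():
    return lambda doc: doc.setdefault("ents", [])


FACTORIES = {"tagger": make_tagger, "parser": make_parser, "ner": make_ner}


def load(**enabled):
    """Build a pipeline, skipping any component passed as name=False."""
    unknown = set(enabled) - set(FACTORIES)
    if unknown:
        raise TypeError(f"Unknown components: {sorted(unknown)}")
    # Only call the factories of enabled components (enabled by default).
    pipeline = [(name, make()) for name, make in FACTORIES.items()
                if enabled.get(name, True)]

    def nlp(text):
        doc = {"tokens": text.split()}  # whitespace tokenizer stand-in
        for _, component in pipeline:
            component(doc)
        return doc

    nlp.pipe_names = [name for name, _ in pipeline]
    return nlp


nlp = load(tagger=False, parser=False, ner=False)  # tokenization only
print(nlp.pipe_names)  # → []
print(nlp("Hello world")["tokens"])  # → ['Hello', 'world']
```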

Thanks, will definitely have that documented. Actually there should probably also be another load function, for the "slimmer" version.

See the v2.0.0 alpha release notes and #1105 🎉

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
