Spacy: Cannot save a model to disk if I add a tagger component

Created on 12 Jun 2020 · 3 comments · Source: explosion/spaCy

How to reproduce the behaviour

Hello!
I tried to run the following code:

import spacy

nlp = spacy.blank("fr")
nlp.add_pipe(nlp.create_pipe("sentencizer"))
nlp.add_pipe(nlp.create_pipe("tagger"))
nlp.add_pipe(nlp.create_pipe("parser"))
nlp.to_disk("my/folder")

and I got the following error message:

  File "my\folder/create_blank_model.py", line 18, in <module>
    nlp.to_disk(PATH + '/../models/test_model')
  File "my\folder\v_env\lib\site-packages\spacy\language.py", line 911, in to_disk
    util.to_disk(path, serializers, exclude)
  File "my\folder\v_env\lib\site-packages\spacy\util.py", line 645, in to_disk
    writer(path / key)
  File "my\folder\v_env\lib\site-packages\spacy\language.py", line 909, in <lambda>
    serializers[name] = lambda p, proc=proc: proc.to_disk(p, exclude=["vocab"])
  File "pipes.pyx", line 632, in spacy.pipeline.pipes.Tagger.to_disk
  File "my\folder\v_env\lib\site-packages\spacy\util.py", line 645, in to_disk
    writer(path / key)
  File "pipes.pyx", line 628, in spacy.pipeline.pipes.Tagger.to_disk.lambda22
TypeError: Required argument 'length' (pos 1) not found

Process finished with exit code 1

I must admit that I do not understand what this 'length' argument that I should supposedly add is.
Moreover, if I run the code without the line adding the tagger, as in:

import spacy

nlp = spacy.blank("fr")
nlp.add_pipe(nlp.create_pipe("sentencizer"))
nlp.add_pipe(nlp.create_pipe("parser"))
nlp.to_disk("my/folder")

then all works smoothly.
I came across a workaround, which is to load an existing model (fr_core_news_md), but for my current project I need to train the NER and text classifier of a blank model, and I also need the tagger.

Your Environment

  • spaCy version: 2.2.4
  • Platform: Windows-10-10.0.17134-SP0
  • Python version: 3.6.5
Labels: bug, feat / serialize, feat / tagger

All 3 comments

Thanks for the report, that does look like a bug!

I'm guessing this only happens when the model inside your tagger hasn't actually been instantiated yet. The model is instantiated when you call begin_training. You can check whether calling it fixes the issue for now. If that's the case, you wouldn't run into any problems once you actually do create a model, train it, and then store it.

But I think we should be able to make this code more robust, as well.

Thank you for the answer, the solution you provided works perfectly!

From v3 onwards, pipeline components will always need to have a model internally, so you won't be able to get into this weird state where you have a pipeline that isn't really initialized yet. In almost all use cases, you'll store a component only after training a model for it, so you wouldn't run into the original error you described. So I think it's not too big a deal for the current v2.x versions, and it looks like you can continue working too :-) If that's OK, I'll go ahead and close this.
