spaCy: Is spaCy thread-safe?

Created on 6 May 2019 · 6 comments · Source: explosion/spaCy

Is spaCy thread-safe?

If I run it from a few threads, I get the exceptions shown below. Creating a minimal example to reproduce this would take time, so I wanted to ask upfront: is spaCy supposed to be thread-safe, and what are the guidelines for using it correctly from multiple threads?
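Roughly, the pattern is that several worker threads each call spacy.load and then process text with the resulting pipeline. A hypothetical sketch of that pattern (not my actual code):

```python
# Hypothetical sketch of the usage pattern, not the actual application code:
# every worker thread loads its own pipeline and then processes text with it.
import threading

import spacy

def worker(texts):
    nlp = spacy.load("en_core_web_sm")  # concurrent loads are where the error appears
    for text in texts:
        doc = nlp(text)  # normal per-thread processing

threads = [threading.Thread(target=worker, args=(["Some text."],)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

This is the kind of exception output I get: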

  Undefined operator: >>
  Called by (<thinc.neural._classes.function_layer.FunctionLayer object at 0x7fed95b12350>, <thinc.neural._classes.feed_forward.FeedForward object at 0x7fed95bbb8d0>)
  Available:

  Traceback:
  |__ <lambda> [782] in /usr/local/lib/python2.7/dist-packages/spacy/language.py
  |____ from_disk [611] in /usr/local/lib/python2.7/dist-packages/spacy/util.py
  |_____ build_tagger_model [511] in /usr/local/lib/python2.7/dist-packages/spacy/_ml.py
       >>> pretrained_vectors=pretrained_vectors,
    self.nlp = spacy.load("en_core_web_sm")  
  File "/usr/local/lib/python2.7/dist-packages/spacy/__init__.py", line 27, in load
    return util.load_model(name, **overrides)
  File "/usr/local/lib/python2.7/dist-packages/spacy/util.py", line 131, in load_model
    return load_model_from_package(name, **overrides)
  File "/usr/local/lib/python2.7/dist-packages/spacy/util.py", line 152, in load_model_from_package
    return cls.load(**overrides)
  File "/usr/local/lib/python2.7/dist-packages/en_core_web_sm/__init__.py", line 12, in load
    return load_model_from_init_py(__file__, **overrides)
  File "/usr/local/lib/python2.7/dist-packages/spacy/util.py", line 190, in load_model_from_init_py
    return load_model_from_path(data_path, meta, **overrides)
  File "/usr/local/lib/python2.7/dist-packages/spacy/util.py", line 173, in load_model_from_path
    return nlp.from_disk(model_path)
  File "/usr/local/lib/python2.7/dist-packages/spacy/language.py", line 786, in from_disk
    util.from_disk(path, deserializers, exclude)
  File "/usr/local/lib/python2.7/dist-packages/spacy/util.py", line 611, in from_disk
    reader(path / key)
  File "/usr/local/lib/python2.7/dist-packages/spacy/language.py", line 782, in <lambda>
    deserializers[name] = lambda p, proc=proc: proc.from_disk(p, exclude=["vocab"])
  File "pipes.pyx", line 617, in spacy.pipeline.pipes.Tagger.from_disk
  File "/usr/local/lib/python2.7/dist-packages/spacy/util.py", line 611, in from_disk
    reader(path / key)
  File "pipes.pyx", line 599, in spacy.pipeline.pipes.Tagger.from_disk.load_model
  File "pipes.pyx", line 512, in spacy.pipeline.pipes.Tagger.Model
  File "/usr/local/lib/python2.7/dist-packages/spacy/_ml.py", line 511, in build_tagger_model
    pretrained_vectors=pretrained_vectors,
  File "/usr/local/lib/python2.7/dist-packages/spacy/_ml.py", line 361, in Tok2Vec
    embed >> convolution ** conv_depth, pad=conv_depth
  File "/usr/local/lib/python2.7/dist-packages/thinc/check.py", line 129, in checker
    raise UndefinedOperatorError(op, instance, args[0], instance._operators)
UndefinedOperatorError: 

  Undefined operator: >>
  Called by (<thinc.neural._classes.function_layer.FunctionLayer object at 0x7fed95b12350>, <thinc.neural._classes.feed_forward.FeedForward object at 0x7fed95bbb8d0>)
  Available:

  Traceback:
  |__ <lambda> [782] in /usr/local/lib/python2.7/dist-packages/spacy/language.py
  |____ from_disk [611] in /usr/local/lib/python2.7/dist-packages/spacy/util.py
  |_____ build_tagger_model [511] in /usr/local/lib/python2.7/dist-packages/spacy/_ml.py
       >>> pretrained_vectors=pretrained_vectors,
bug 🔮 thinc

All 6 comments

It's supposed to be thread-safe. What version are you running?

I think I see what the problem could be: creating the model touches some global state, so that part might not be thread-safe. If so, it should be fixed. Also, have you tried the same code on Python 3?

spaCy 2.1.3, model version 2.1.0, Python 2.7. I haven't tried it on Python 3.
I managed to work around the problem by loading the model on a single thread first:
spacy.load("en_core_web_sm")
Only after the model had been loaded on that one thread did I access it from many threads.
If you could also make concurrent .load() calls safe, that would be great.
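In a simplified sketch (illustrative, not my real code), the workaround looks like this:

```python
# Simplified sketch of the workaround (illustrative, not my real code):
# the pipeline is loaded exactly once, before any worker threads exist.
import threading

import spacy

nlp = spacy.load("en_core_web_sm")  # single-threaded load

def worker(texts):
    for text in texts:
        doc = nlp(text)  # sharing the already-loaded pipeline across threads worked for me

threads = [threading.Thread(target=worker, args=(["Some text."],)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```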

@ndvbd Thanks for the report, I'll look into that.

This was indeed an annoying problem, and it should now be fixed in Thinc v7.0.5. Inside the thinc/neural/_classes/model.py module, I've changed the operators dict to live in thread-local storage. I also added a test that failed before the patch and passes after it.
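As a simplified sketch of the idea (not the actual Thinc source), the operator table now lives in thread-local storage, so operators defined inside one thread's define_operators block can no longer be clobbered by model construction in another thread:

```python
# Simplified sketch of the idea, not the actual Thinc source: keep the operator
# table in thread-local storage so concurrent model construction in one thread
# cannot see or clobber the operators defined in another thread.
import threading
from contextlib import contextmanager

class Model(object):
    _thread_local = threading.local()

    @classmethod
    def _operators(cls):
        # Each thread lazily gets its own, initially empty, operator dict.
        if not hasattr(cls._thread_local, "operators"):
            cls._thread_local.operators = {}
        return cls._thread_local.operators

    @classmethod
    @contextmanager
    def define_operators(cls, operators):
        # Temporarily extend this thread's operator table, then restore it.
        old = cls._operators()
        merged = dict(old)
        merged.update(operators)
        cls._thread_local.operators = merged
        try:
            yield
        finally:
            cls._thread_local.operators = old
```

The point is just the threading.local() pattern; the real Model class in Thinc does more than this.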

Still, there could be other problems I'm missing, so please do report any further issues. I already suspect we'll see trouble with models that use word vectors, as I think those touch some global state as well.

Great stuff.

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
