Is spaCy thread-safe?
If I run it in a few threads, I get the following exceptions. Creating a minimal example to reproduce this would take time, so I first wanted to ask whether it is supposed to be thread-safe, and what the guidelines are for using it properly.
Undefined operator: >>
Called by (<thinc.neural._classes.function_layer.FunctionLayer object at 0x7fed95b12350>, <thinc.neural._classes.feed_forward.FeedForward object at 0x7fed95bbb8d0>)
Available:
Traceback:
|__ <lambda> [782] in /usr/local/lib/python2.7/dist-packages/spacy/language.py
|____ from_disk [611] in /usr/local/lib/python2.7/dist-packages/spacy/util.py
|_____ build_tagger_model [511] in /usr/local/lib/python2.7/dist-packages/spacy/_ml.py
>>> pretrained_vectors=pretrained_vectors,
self.nlp = spacy.load("en_core_web_sm")
File "/usr/local/lib/python2.7/dist-packages/spacy/__init__.py", line 27, in load
return util.load_model(name, **overrides)
File "/usr/local/lib/python2.7/dist-packages/spacy/util.py", line 131, in load_model
return load_model_from_package(name, **overrides)
File "/usr/local/lib/python2.7/dist-packages/spacy/util.py", line 152, in load_model_from_package
return cls.load(**overrides)
File "/usr/local/lib/python2.7/dist-packages/en_core_web_sm/__init__.py", line 12, in load
return load_model_from_init_py(__file__, **overrides)
File "/usr/local/lib/python2.7/dist-packages/spacy/util.py", line 190, in load_model_from_init_py
return load_model_from_path(data_path, meta, **overrides)
File "/usr/local/lib/python2.7/dist-packages/spacy/util.py", line 173, in load_model_from_path
return nlp.from_disk(model_path)
File "/usr/local/lib/python2.7/dist-packages/spacy/language.py", line 786, in from_disk
util.from_disk(path, deserializers, exclude)
File "/usr/local/lib/python2.7/dist-packages/spacy/util.py", line 611, in from_disk
reader(path / key)
File "/usr/local/lib/python2.7/dist-packages/spacy/language.py", line 782, in <lambda>
deserializers[name] = lambda p, proc=proc: proc.from_disk(p, exclude=["vocab"])
File "pipes.pyx", line 617, in spacy.pipeline.pipes.Tagger.from_disk
File "/usr/local/lib/python2.7/dist-packages/spacy/util.py", line 611, in from_disk
reader(path / key)
File "pipes.pyx", line 599, in spacy.pipeline.pipes.Tagger.from_disk.load_model
File "pipes.pyx", line 512, in spacy.pipeline.pipes.Tagger.Model
File "/usr/local/lib/python2.7/dist-packages/spacy/_ml.py", line 511, in build_tagger_model
pretrained_vectors=pretrained_vectors,
File "/usr/local/lib/python2.7/dist-packages/spacy/_ml.py", line 361, in Tok2Vec
embed >> convolution ** conv_depth, pad=conv_depth
File "/usr/local/lib/python2.7/dist-packages/thinc/check.py", line 129, in checker
raise UndefinedOperatorError(op, instance, args[0], instance._operators)
UndefinedOperatorError:
Undefined operator: >>
Called by (<thinc.neural._classes.function_layer.FunctionLayer object at 0x7fed95b12350>, <thinc.neural._classes.feed_forward.FeedForward object at 0x7fed95bbb8d0>)
Available:
Traceback:
|__ <lambda> [782] in /usr/local/lib/python2.7/dist-packages/spacy/language.py
|____ from_disk [611] in /usr/local/lib/python2.7/dist-packages/spacy/util.py
|_____ build_tagger_model [511] in /usr/local/lib/python2.7/dist-packages/spacy/_ml.py
>>> pretrained_vectors=pretrained_vectors,
It's supposed to be thread-safe. What version are you running?
I think I see what the problem could be: creating the model touches some global state, so that part might not be thread-safe. If so, this should be fixed. Also, have you tried the same code on Python 3?
spaCy 2.1.3, model version 2.1.0, Python 2.7. I haven't tried Python 3.
I managed to work around the problem by loading the model first on a single thread:
spacy.load("en_core_web_sm")
Only after the model was loaded on one thread did I access it from many threads.
If you could also make concurrent .load() calls thread-safe, that would be great.
@ndvbd Thanks for the report, I'll look into that.
This was indeed an annoying problem, which should now be fixed in Thinc v7.0.5. Inside the thinc/neural/_classes/model.py module, I've changed the operators dict to be in thread-local storage. I also added a test which failed previously, and which passed after my patch.
Still, there could be other problems I'm missing. Please do report if there are further problems. I already suspect we'll see a problem with models that use word vectors, as I think those touch some global state as well.
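The fix described above can be illustrated with a toy sketch. This is not Thinc's actual code (the `Layer` class, `define_operators` helper, and worker functions are all simplified stand-ins); it only shows the pattern of keeping the operator table in thread-local storage so one thread's operator definitions cannot clobber another thread's mid-load:

```python
import threading

class Layer(object):
    """Toy stand-in for a model with overloadable operators."""

    # One threading.local() shared by the class, but each thread sees its
    # own `operators` attribute on it: defining or clearing operators in
    # one thread leaves every other thread's table untouched.
    _local = threading.local()

    @classmethod
    def define_operators(cls, **operators):
        cls._local.operators = dict(operators)

    def __rshift__(self, other):
        ops = getattr(self._local, "operators", {})
        if ">>" not in ops:
            raise TypeError("Undefined operator: >>")
        return ops[">>"](self, other)

def thread_a(out):
    # Thread A defines >> for itself and composes two layers.
    Layer.define_operators(**{">>": lambda a, b: (a, b)})
    out["a"] = Layer() >> Layer()

def thread_b(out):
    # Thread B never defined >>, so it still raises, proving the
    # operator table was not leaked across threads.
    try:
        Layer() >> Layer()
    except TypeError:
        out["b"] = "undefined"

out = {}
ta = threading.Thread(target=thread_a, args=(out,))
tb = threading.Thread(target=thread_b, args=(out,))
ta.start(); ta.join()
tb.start(); tb.join()
print(out)
```

With a plain class-level dict instead of `threading.local()`, thread B could observe (or wipe out) the operators thread A installed while deserializing a model, which is the race the original traceback hit.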
Great stuff.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.