Is spaCy thread-safe?
If I run it in a few threads, I get the following exceptions. Creating a minimal example to reproduce this would take time, so I first wanted to ask whether it is supposed to be thread-safe, and what the guidelines are for using it properly.
Undefined operator: >>
Called by (<thinc.neural._classes.function_layer.FunctionLayer object at 0x7fed95b12350>, <thinc.neural._classes.feed_forward.FeedForward object at 0x7fed95bbb8d0>)
Available:
Traceback:
|__ <lambda> [782] in /usr/local/lib/python2.7/dist-packages/spacy/language.py
|____ from_disk [611] in /usr/local/lib/python2.7/dist-packages/spacy/util.py
|_____ build_tagger_model [511] in /usr/local/lib/python2.7/dist-packages/spacy/_ml.py
>>> pretrained_vectors=pretrained_vectors,
self.nlp = spacy.load("en_core_web_sm")
File "/usr/local/lib/python2.7/dist-packages/spacy/__init__.py", line 27, in load
return util.load_model(name, **overrides)
File "/usr/local/lib/python2.7/dist-packages/spacy/util.py", line 131, in load_model
return load_model_from_package(name, **overrides)
File "/usr/local/lib/python2.7/dist-packages/spacy/util.py", line 152, in load_model_from_package
return cls.load(**overrides)
File "/usr/local/lib/python2.7/dist-packages/en_core_web_sm/__init__.py", line 12, in load
return load_model_from_init_py(__file__, **overrides)
File "/usr/local/lib/python2.7/dist-packages/spacy/util.py", line 190, in load_model_from_init_py
return load_model_from_path(data_path, meta, **overrides)
File "/usr/local/lib/python2.7/dist-packages/spacy/util.py", line 173, in load_model_from_path
return nlp.from_disk(model_path)
File "/usr/local/lib/python2.7/dist-packages/spacy/language.py", line 786, in from_disk
util.from_disk(path, deserializers, exclude)
File "/usr/local/lib/python2.7/dist-packages/spacy/util.py", line 611, in from_disk
reader(path / key)
File "/usr/local/lib/python2.7/dist-packages/spacy/language.py", line 782, in <lambda>
deserializers[name] = lambda p, proc=proc: proc.from_disk(p, exclude=["vocab"])
File "pipes.pyx", line 617, in spacy.pipeline.pipes.Tagger.from_disk
File "/usr/local/lib/python2.7/dist-packages/spacy/util.py", line 611, in from_disk
reader(path / key)
File "pipes.pyx", line 599, in spacy.pipeline.pipes.Tagger.from_disk.load_model
File "pipes.pyx", line 512, in spacy.pipeline.pipes.Tagger.Model
File "/usr/local/lib/python2.7/dist-packages/spacy/_ml.py", line 511, in build_tagger_model
pretrained_vectors=pretrained_vectors,
File "/usr/local/lib/python2.7/dist-packages/spacy/_ml.py", line 361, in Tok2Vec
embed >> convolution ** conv_depth, pad=conv_depth
File "/usr/local/lib/python2.7/dist-packages/thinc/check.py", line 129, in checker
raise UndefinedOperatorError(op, instance, args[0], instance._operators)
UndefinedOperatorError:
Undefined operator: >>
Called by (<thinc.neural._classes.function_layer.FunctionLayer object at 0x7fed95b12350>, <thinc.neural._classes.feed_forward.FeedForward object at 0x7fed95bbb8d0>)
Available:
Traceback:
|__ <lambda> [782] in /usr/local/lib/python2.7/dist-packages/spacy/language.py
|____ from_disk [611] in /usr/local/lib/python2.7/dist-packages/spacy/util.py
|_____ build_tagger_model [511] in /usr/local/lib/python2.7/dist-packages/spacy/_ml.py
>>> pretrained_vectors=pretrained_vectors,
It's supposed to be thread-safe. What version are you running?
I think I see what the problem could be: creating the model touches some global state, so that part might not be thread-safe. If so, this should be fixed. Also, have you tried the same code on Python 3?
spaCy 2.1.3, model version 2.1.0, Python 2.7. I haven't tried Python 3.
I managed to work around the problem by loading the model first on a single thread:
spacy.load("en_core_web_sm")
Only after the model was loaded on one thread did I access it from many threads.
If you could also make concurrent .load() calls thread-safe, that would be great.
@ndvbd Thanks for the report, I'll look into that.
This was indeed an annoying problem, which should now be fixed in Thinc v7.0.5. Inside the thinc/neural/_classes/model.py module, I've changed the operators dict to be in thread-local storage. I also added a test which failed previously, and which passed after my patch.
Still, there could be other problems I'm missing. Please do report if there are further problems. I already suspect we'll see a problem with models that use word vectors, as I think those touch some global state as well.
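The fix described above can be illustrated with a toy sketch. This is not Thinc's actual code (the `Layer` class, `define_operators` helper, and worker functions are all simplified stand-ins); it only shows the pattern of keeping the operator table in thread-local storage so one thread's operator definitions cannot clobber another thread's mid-load:

```python
import threading

class Layer(object):
    """Toy stand-in for a model with overloadable operators."""

    # One threading.local() shared by the class, but each thread sees its
    # own `operators` attribute on it: defining or clearing operators in
    # one thread leaves every other thread's table untouched.
    _local = threading.local()

    @classmethod
    def define_operators(cls, **operators):
        cls._local.operators = dict(operators)

    def __rshift__(self, other):
        ops = getattr(self._local, "operators", {})
        if ">>" not in ops:
            raise TypeError("Undefined operator: >>")
        return ops[">>"](self, other)

def thread_a(out):
    # Thread A defines >> for itself and composes two layers.
    Layer.define_operators(**{">>": lambda a, b: (a, b)})
    out["a"] = Layer() >> Layer()

def thread_b(out):
    # Thread B never defined >>, so it still raises, proving the
    # operator table was not leaked across threads.
    try:
        Layer() >> Layer()
    except TypeError:
        out["b"] = "undefined"

out = {}
ta = threading.Thread(target=thread_a, args=(out,))
tb = threading.Thread(target=thread_b, args=(out,))
ta.start(); ta.join()
tb.start(); tb.join()
print(out)
```

With a plain class-level dict instead of `threading.local()`, thread B could observe (or wipe out) the operators thread A installed while deserializing a model, which is the race the original traceback hit.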
Great stuff.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.