spaCy: 💫 Try multi-processing in v2 nlp.pipe()?

Created on 6 Sep 2017  ·  10 Comments  ·  Source: explosion/spaCy

In spaCy 1 multi-processing was a non-starter, for a variety of reasons. The model took a long time to load, and the integer ID mapping was stateful. These have been fixed in v2. At the same time, the v2 neural network model can't yet release the GIL, making multi-threading inefficient. We should therefore consider whether multi-processing would be a better solution.

The nlp.pipe() method is already a generator that takes a batch_size argument. I think it should be pretty easy to try out multi-processing here.
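The idea of batching a text stream and handing the batches to worker processes might look roughly like the sketch below. It is only an illustration under stated assumptions, not spaCy's implementation: `minibatch`, `tokenize_batch` and `parallel_pipe` are hypothetical names, and `tokenize_batch` is a whitespace-splitting stand-in for whatever the real pipeline would do per batch.

```python
from itertools import islice
from multiprocessing import Pool


def minibatch(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def tokenize_batch(texts):
    """Stand-in for the per-batch work (hypothetical; NOT spaCy's tokenizer)."""
    return [text.split() for text in texts]


def parallel_pipe(texts, n_process=2, batch_size=2):
    """Yield one result per input text, in input order."""
    batches = minibatch(texts, batch_size)
    if n_process == 1:
        # Serial fallback: same batching, no worker processes.
        for batch in batches:
            yield from tokenize_batch(batch)
        return
    with Pool(processes=n_process) as pool:
        # imap returns results in input order as they become available.
        for results in pool.imap(tokenize_batch, batches):
            yield from results


if __name__ == "__main__":
    docs = ["Hello world", "spaCy is fast", "One more text"]
    for tokens in parallel_pipe(docs, n_process=2, batch_size=2):
        print(tokens)
```

The worker function has to be defined at module level so it can be pickled, and the `__main__` guard matters on platforms that start workers by spawning a fresh interpreter. A real version would also have to send the processed documents back to the parent, which is likely where most of the overhead sits.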

enhancement help wanted help wanted (easy) scaling

All 10 comments

@honnibal I am interested in working on the issue.

@souravsingh Great! Here's the method that would need to change:

https://github.com/explosion/spaCy/blob/develop/spacy/language.py#L433

I would suggest first working on getting the empty pipeline working (i.e. just the tokenizer). Then you can try the models.

The main complication you might encounter is that the v2 models use numpy, which multi-threads the matrix multiplications via OpenBLAS. I'm not sure whether you'll have trouble with this in child processes. I also don't know whether the GPU will complain in child processes or not.
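One common way to keep BLAS threads from multiplying across worker processes is to cap them through environment variables before numpy is first imported. The sketch below assumes that approach; `limit_blas_threads` is a hypothetical helper, and whether this actually avoids the problems described above has not been verified in this thread.

```python
import os

# Environment variables read by common BLAS/OpenMP backends when they load.
THREAD_ENV_VARS = ("OPENBLAS_NUM_THREADS", "OMP_NUM_THREADS", "MKL_NUM_THREADS")


def limit_blas_threads(n_threads=1):
    """Cap BLAS threads; must run before numpy is first imported."""
    for name in THREAD_ENV_VARS:
        os.environ[name] = str(n_threads)


limit_blas_threads(1)
# import numpy  # import only after the limits are set
```

With one BLAS thread per process, the number of busy cores is governed by the number of worker processes rather than by the product of the two.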

@honnibal Are we free to use joblib instead of multiprocessing?

@souravsingh Yes, I like joblib.

In case this is helpful, I've had success getting multiprocessing to work with spaCy by using the multiprocessing module from the pathos package as a drop-in replacement for the standard library's multiprocessing module. In addition to other enhancements (I assume), it uses dill for pickling.

Hey, what's the status here? Is anyone working on this?

Just out of curiosity, what is stopping you from releasing the GIL?
P.S.
Also, weirdly enough, pre-2.1.4 versions of spaCy did appear to use multiple threads if you checked in the terminal. [2.1.4 does not, yet that doesn't really seem to affect runtime.]

@souravsingh Great! Here's the method that would need to change:
https://github.com/explosion/spaCy/blob/develop/spacy/language.py#L433

@honnibal just wanna check that this was still the right method to be changing (since this link is from two years back). I'm interested in picking this up, since it seems like it hasn't been completed yet.

@teoh In case you're still thinking about this, have a look at #4371

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
