Spacy: Multiprocessing causes deadlock in Spacy v2 on OSX

Created on 14 Nov 2017 · 12 Comments · Source: explosion/spaCy

spaCy v2 deadlocks when used with multiprocessing. If I load the model once in the parent process and then use it in worker processes, the workers hang. The following code hangs at parsed_txt_1st = spacy_nlp(txt_1st).

I know that we can work around this by using the spawn start method with multiprocessing, but then every child process loads the model itself, which uses more memory and slows things down. Is there any other way around this? Note: spaCy v1.x does not have this problem.

Sample Code:

import spacy
import multiprocessing

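# Load the model once in the parent; forked workers inherit this object.
spacy_nlp = spacy.load('en')

def test(i):
    txt_1st = "this is the first sentence of {}".format(i)
    txt_2nd = "this is the second sentence of {}".format(i)

    print(txt_1st)
    # Hangs here: the call never returns in the worker process.
    parsed_txt_1st = spacy_nlp(txt_1st)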
    print(txt_2nd)
    parsed_txt_2nd = spacy_nlp(txt_2nd)
    return parsed_txt_1st, parsed_txt_2nd


pool = multiprocessing.Pool(processes=3)
data = range(3)
results = pool.map(test, data)
pool.close()
pool.join()

print(results)

My Environment

  • Operating System: MacOS, Darwin-17.2.0-x86_64-i386-64bit
  • Python Version Used: 3.6.0
  • spaCy Version Used: 2.0.2
  • Environment Information:

All 12 comments

Do you have insight about why this might be happening?

My guess is that the deadlock happens when numpy creates threads in the child process. I have a fuzzy memory of the problem being explained to me elsewhere.

In #1508 I pointed out that even with n_threads=1 all my cores fire up, so I presume the underlying OpenBLAS library is using multiple (C) threads even if spaCy is not. In the past I have noticed that multiprocessing is not compatible with OpenBLAS when the number of OpenBLAS threads is > 1; see e.g.:

https://github.com/numpy/numpy/issues/4813

The link above claims newer versions of OpenBLAS fix the issue, but I believe I have hit it more recently than 2014. Maybe try setting OPENBLAS_NUM_THREADS=1.
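
One way to try that from inside a script, as a minimal sketch: the variables have to be set before numpy (and therefore spaCy) is first imported, because BLAS backends typically read them when they load. The helper name limit_blas_threads and the BLAS_THREAD_VARS tuple are made up for illustration, and the Accelerate and OpenMP variables are extra guesses beyond the one suggested above. This is something to experiment with, not a confirmed fix.

```python
import os

# Thread-count variables read by common BLAS backends when they first load.
BLAS_THREAD_VARS = (
    "OPENBLAS_NUM_THREADS",    # OpenBLAS
    "VECLIB_MAXIMUM_THREADS",  # Apple Accelerate (vecLib)
    "OMP_NUM_THREADS",         # OpenMP-based builds
)


def limit_blas_threads(n=1):
    """Request at most n BLAS threads; call before importing numpy or spacy."""
    for name in BLAS_THREAD_VARS:
        os.environ[name] = str(n)


limit_blas_threads(1)
# Import numpy / spacy only after this point.
```

Setting the variable in the shell before launching Python (OPENBLAS_NUM_THREADS=1 python script.py) should have the same effect.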

Setting OPENBLAS_NUM_THREADS=1 does not solve the problem. The deadlock is still there.

I'm working on removing spaCy's reliance on system libraries to avoid this problem. The BLIS library offers similar single-thread speed to OpenBLAS, while being small enough to ship in a Python package: https://github.com/explosion/cython-blis

The plan is to keep the calls to linear algebra routines single-threaded, and multi-thread larger code-blocks ourselves. Currently the main problem is that BLIS doesn't compile on Windows.

Got exactly the same problem today. Are there any updates here?

@achillesliu Yes, the develop branch now uses a new version of Thinc that brings its own single-threaded matrix multiplication, by shipping optimised kernels from OpenBLAS.

However, we do still default to Accelerate on OSX, so a little more needs to be done there. If you have time for this, try compiling Thinc (https://github.com/explosion/thinc) on your machine, and then try editing the setup.py so that the OpenBLAS kernel compiles instead of linking the library to Accelerate.

Another alternative is to compile OpenBLAS yourself: you should just need to check out https://github.com/xianyi/OpenBLAS and edit the Makefile.rule to tell it not to use threads. Then you can either a) check out numpy, edit the site.cfg to point it to your OpenBLAS, and compile yourself a single-threaded numpy so you can use spaCy 2.0.10, or b) run THINC_BLAS=/path/to/your/libopenblas.so pip install thinc==6.11.1.dev11

Either way, the goal is to get yourself a copy of spaCy that delegates its matrix multiplications away from your system's Accelerate library. On spaCy v2.0.10, this means compiling numpy, and linking it to OpenBLAS. On the upcoming v2.1.0, this means linking Thinc against single-threaded OpenBLAS. This will be easier in the release version.

It's been a really long process to get this fixed, because compiling the kernel on Windows has been a huge pain. But we're nearly there :tada:

I am using Ubuntu 16.04, spacy 2.0.11 and having the same problem. Any update?

I am using OSX, spacy 1.9 and having the same problem. Is there any update?

I am using OSX, spacy 2.0.11 and got the same problem today. Are there any updates for this issue?

This is now fixed in spacy-nightly, as we no longer use numpy for matrix multiplications. Thanks for your patience on this! It was a very long road to get this resolved...

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
