spaCy nlp method running in multi-thread mode by default

Created on 23 Jul 2018 · 7 comments · Source: explosion/spaCy

How to reproduce the behaviour

import spacy

nlp = spacy.load('en_core_web_sm')
array_text = ['Hi there, this is a test' for i in range(0, 10000)]
processed = nlp(', '.join(array_text))

This causes 16 threads to spin up for spaCy on my MacBook Pro. So is spaCy running in multi-thread mode by default? How do I disable this behaviour? I was under the impression that it would only run in multi-thread mode if I used nlp.pipe.

Your Environment

  • Operating System: Mac OS High Sierra
  • Python Version Used: 3.6
  • spaCy Version Used: 2.0.3
  • Environment Information:

All 7 comments

It seems the problem is with BLAS. Once I set the OPENBLAS_NUM_THREADS environment variable to 1, I no longer see this behaviour.

Hit the same issue; setting OPENBLAS_NUM_THREADS=1 fixes it.
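For anyone else hitting this, here is a minimal sketch of the workaround. OpenBLAS reads the variable once when the library is first loaded, so it has to be set before numpy/spaCy are imported (setting it from the shell before launching Python works too):

```python
import os

# Pin OpenBLAS to a single thread. This must happen before the first
# import of numpy or spaCy, because OpenBLAS reads the variable only
# once, at library load time.
os.environ["OPENBLAS_NUM_THREADS"] = "1"

# Safe to import spaCy after the variable is set:
# import spacy
# nlp = spacy.load('en_core_web_sm')
```

If spaCy (or numpy) has already been imported anywhere in the process, setting the variable afterwards has no effect.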

Might be worth flagging this in the docs? I'm using Spacy inside a Spark job, and was seeing a load factor of ~1000 on the workers, since all the cores were trying to fan out to all the other cores :)

Nice. Is that the case when using SpacyMagic, or only with plain spaCy? It would also be great to be able to serialize the document; otherwise we have to send back arrays of dictionaries representing each token. Out of interest, what sort of throughput do you get with Spark?
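On serialization: spaCy's Doc objects do support a binary round-trip via Doc.to_bytes / Doc.from_bytes (available since v2.0), which avoids shipping arrays of token dictionaries between Spark workers. A minimal sketch, using a blank pipeline so no model download is needed:

```python
import spacy
from spacy.tokens import Doc

# Blank English pipeline: tokenization only, no model required.
nlp = spacy.blank("en")
doc = nlp("Hi there, this is a test")

# Serialize the Doc to bytes (e.g. to send from a Spark worker).
data = doc.to_bytes()

# Deserialize on the other side; the receiving process needs a
# compatible Vocab to reconstruct the Doc.
doc2 = Doc(nlp.vocab).from_bytes(data)
assert doc2.text == doc.text
```

Note the caveat in the comment: deserialization needs a compatible Vocab on the receiving side, so each worker typically loads the same pipeline once and reuses its vocab.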

@davidmcclure @eamonnmag Upcoming releases of spaCy are switching to single-threaded execution by default, as the multi-core utilisation is pretty poor (numpy is just parallelising the matrix multiplications, which is too small a unit of work).

I agree that this should be in the docs in the meantime.

Great, thanks @honnibal

Thank you!

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
