Spacy: Installing spaCy with OpenMP support on OS X

Created on 18 Feb 2016 · 18 Comments · Source: explosion/spaCy

I have successfully built and installed spaCy 0.100.7 with OpenMP on OS X, using a Homebrew-installed Python. For the impatient among us, here are instructions on how to do it yourself.

Choose your adventure:

  1. Install python:

    • brew install python

    • Other distributions such as Anaconda may work as well. Please let us know if you are successful!

  2. Install an OpenMP-capable compiler (choose one):

    • llvm-3.8

    • clang-omp

    • GCC (Warning: segfaults, see #266)

  3. Edit setup.py, then build and test spaCy
  4. Install spaCy

Note that these instructions are for a Homebrew-installed Python.

Option: LLVM-3.8

  1. Download the LLVM-3.8 binaries:
  2. Install the binaries:

    For example, to install at /opt/llvm38:

    tar xJf clang+llvm-3.8.0-x86_64-apple-darwin.tar.xz
    sudo mkdir -p /opt
    sudo mv clang+llvm-3.8.0-x86_64-apple-darwin /opt/llvm38
    
  3. Tell pip to use clang-3.8:

    export CC=/opt/llvm38/bin/clang
    export CXX=/opt/llvm38/bin/clang++
    export PATH=/opt/llvm38/bin:$PATH
    export C_INCLUDE_PATH=/opt/llvm38/include:$C_INCLUDE_PATH
    export CPLUS_INCLUDE_PATH=/opt/llvm38/include:$CPLUS_INCLUDE_PATH
    export LIBRARY_PATH=/opt/llvm38/lib:$LIBRARY_PATH
    export DYLD_LIBRARY_PATH=/opt/llvm38/lib:$DYLD_LIBRARY_PATH
    

Option: Clang-OMP

  1. Install clang-omp via Homebrew:

    brew install clang-omp
    
  2. Tell pip to use clang-omp:

    export CC=clang-omp
    export CXX=clang-omp
    export PATH=/usr/local/bin:$PATH
    export C_INCLUDE_PATH=/usr/local/include/libiomp:$C_INCLUDE_PATH
    export CPLUS_INCLUDE_PATH=/usr/local/include/libiomp:$CPLUS_INCLUDE_PATH
    export LIBRARY_PATH=/usr/local/lib:$LIBRARY_PATH
    export DYLD_LIBRARY_PATH=/usr/local/lib:$DYLD_LIBRARY_PATH
    

Option: GCC

  1. Install GCC via Homebrew:

    brew install gcc --without-multilib
    

    The --without-multilib option is required for OpenMP support.

  2. Tell pip to use GCC:

    export CC=gcc-5
    export CXX=g++-5
    

WARNING: Compiling with GCC as of spaCy 0.100.5 may result in a segfault (#266).

Required: edit setup.py and install

Follow the 'Compile from source' instructions from spaCy documentation, with the following adjustments.

git clone https://github.com/honnibal/spaCy.git
cd spaCy
git checkout 0.100.6 # or 'master' if you wish

Edit setup.py lines 88-90 to enable OpenMP:

# if not sys.platform.startswith('darwin'):
compile_options['other'].append('-fopenmp')
link_options['other'].append('-fopenmp')
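
As an alternative to commenting out the platform check by hand, the same decision could be made opt-in. The following is only a sketch: FORCE_OPENMP is a hypothetical environment variable invented for this example (spaCy's setup.py does not read it), and the two option dictionaries are stand-ins for the ones in setup.py.

```python
import os
import sys


def openmp_flags(platform=sys.platform, environ=os.environ):
    """Return the extra flags to append for an OpenMP build.

    FORCE_OPENMP is a hypothetical opt-in variable used only in this sketch.
    """
    on_darwin = platform.startswith('darwin')
    forced = environ.get('FORCE_OPENMP') == '1'
    if on_darwin and not forced:
        # Mirrors the original guard: OpenMP stays off on OS X by default.
        return []
    return ['-fopenmp']


# Stand-ins for the dictionaries that setup.py builds.
compile_options = {'other': ['-O3']}
link_options = {'other': []}
compile_options['other'].extend(openmp_flags())
link_options['other'].extend(openmp_flags())
```

With a guard like this, the default OS X build is unchanged, and exporting FORCE_OPENMP=1 alongside CC and CXX would turn the flag on.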

Now continue with the install instructions as per the documentation.

virtualenv .env && source .env/bin/activate
export PYTHONPATH=`pwd`
pip install -r requirements.txt
python setup.py clean
pip install -e .
python -m spacy.en.download
pip install pytest
py.test spacy/tests/

To install spaCy outside of virtualenv and/or outside the source directory:

  1. Deactivate virtualenv using the deactivate command.
  2. Run pip install . in the source directory.
  3. For clang/llvm, add the appropriate library path:

    • clang-omp: export DYLD_LIBRARY_PATH=/usr/local/lib:$DYLD_LIBRARY_PATH

    • llvm38: export DYLD_LIBRARY_PATH=/opt/llvm38/lib:$DYLD_LIBRARY_PATH
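
If the import fails after installing outside the virtualenv, it can help to confirm that one of the directories on DYLD_LIBRARY_PATH actually contains an OpenMP runtime. This is a rough sketch: the helper name is made up for this example, and the two library file names are assumptions about what the LLVM and clang-omp packages install, so adjust them to whatever is present in your lib directory.

```python
import os


def dirs_containing(filenames, search_path):
    """Return the directories in a path list that hold any of `filenames`."""
    hits = []
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue
        if any(os.path.isfile(os.path.join(directory, name)) for name in filenames):
            hits.append(directory)
    return hits


# Assumed runtime file names; check your own install for the real ones.
OPENMP_RUNTIMES = ('libomp.dylib', 'libiomp5.dylib')

print(dirs_containing(OPENMP_RUNTIMES, os.environ.get('DYLD_LIBRARY_PATH', '')))
```

An empty list suggests the export from step 3 is missing in the current shell session.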

Relevant resources:

Thank you for the help @honnibal @henningpeters @gushecht !

History:

  • May 9 2016: Add instructions for LLVM-3.8 and use deactivate virtualenv
    command.
  • use pip install -e . instead of python setup.py build_ext --inplace
  • add instructions for installing outside of virtualenv.
  • Update for 0.100.6

All 18 comments

Thanks for the pointers and making it work.

I am not a big fan of these silent up/downgrades depending on which dependencies you have installed. PIL comes to mind as a particularly bad example of this style. I would rather pass some args/envs to setup.py and fail if the dependencies are not available, or ship a separate package. We'll definitely look into making this more accessible.
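
To illustrate the "fail if dependencies are not available" idea, here is a minimal sketch of a probe that a build script could run before compiling. It is not what spaCy's setup.py does; both function names are hypothetical, and it assumes the compiler is taken from the CC environment variable (falling back to cc).

```python
import os
import shlex
import shutil
import subprocess
import tempfile

OPENMP_PROBE = """
#include <omp.h>
int main(void) { return omp_get_max_threads() > 0 ? 0 : 1; }
"""


def compiler_supports_openmp(cc=None):
    """Return True if the compiler can build a trivial OpenMP program."""
    command = shlex.split(cc or os.environ.get('CC', 'cc'))
    workdir = tempfile.mkdtemp()
    source = os.path.join(workdir, 'probe.c')
    binary = os.path.join(workdir, 'probe')
    try:
        with open(source, 'w') as handle:
            handle.write(OPENMP_PROBE)
        with open(os.devnull, 'w') as devnull:
            status = subprocess.call(
                command + ['-fopenmp', source, '-o', binary],
                stdout=devnull, stderr=devnull)
        return status == 0
    except OSError:
        # The compiler executable could not be started at all.
        return False
    finally:
        shutil.rmtree(workdir, ignore_errors=True)


def require_openmp(cc=None):
    """Fail loudly instead of silently building without OpenMP."""
    if not compiler_supports_openmp(cc):
        raise RuntimeError('compiler does not accept -fopenmp; check CC/CXX')
```

Calling require_openmp() early would turn a silent single-threaded build into an explicit error.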

LLVM has included native OpenMP support since version 3.7, so the best solution may just be to wait until Apple releases the next version of Xcode this summer (or thereabouts).

I can confirm that clang-omp works. We had another thread some time ago about compilation problems with gcc on OS X (#237). It's unfortunate, but setuptools offers very little support for detecting compilers, and even if you could do that reliably you might still be out of luck if the user's Python was built with non-gcc flags (AFAIK they are not visible from within setup.py). What I want to say is: supporting gcc on OS X is really tough, and not much appreciated even when it does work.

Great news on clang-omp. I've been experimenting with many native extensions for Python, R, and NodeJS. I've had so many issues with native extensions and compilers that I now just install whichever compiler works best... Once Apple updates XCode with LLVM 3.7, there will be much less need for supporting GCC on OS X.

Practically speaking, the clang-omp solution works for me and I'm okay leaving it at that for now. I'm sure you guys have plenty of other interesting improvements to make to spaCy!

Hello again @mikepb

I got to python setup.py build_ext --inplace and then it failed. Screenshot attached. Many thanks, as always.


Also, for what it's worth, I found that I had to downgrade to Python 2.7.9 from 2.7.11

I see you are using the Anaconda Python. I wrote the instructions for the homebrew Python. What is the compiler used to build Anaconda Python?

Just running the interpreter on my machine printed this message:

Python 2.7.11 (default, Feb 18 2016, 14:32:04) 
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
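
The compiler string from that banner can also be read without starting an interactive session, via the standard library's platform.python_compiler(). The small classifier below is a hypothetical helper for this example; note that Apple's clang identifies itself as "GCC 4.2.1 Compatible", so the check for clang has to come first.

```python
import platform


def compiler_family(banner):
    """Guess the compiler family from a Python build banner."""
    text = banner.lower()
    if 'clang' in text or 'llvm' in text:
        return 'clang'
    if 'gcc' in text:
        return 'gcc'
    return 'unknown'


banner = platform.python_compiler()
print(banner, '->', compiler_family(banner))
```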

If you are using a GCC-built Python, try using GCC to build spaCy instead:

brew install gcc --without-multilib
export CC=gcc-5
export CXX=g++-5

I've been told that the gcc --without-multilib option is required for the OpenMP support used by spacy.

If you are successful, please let us know!

Remember to reset your build environment as well. Running python setup.py clean then using a new terminal session should do the trick.

Update: tried with GCC and successfully got the segfault :)

So then I tried with homebrew Python. Please note that I was getting an issue trying to clone the repo after running the export statements, so I revised the order a little:

brew install python
brew install clang-omp
git clone https://github.com/honnibal/spaCy.git
cd spaCy
git checkout 0.100.5
# Made the changes you specified to setup.py
export CC=clang-omp
export CXX=clang-omp
export PATH=/usr/local/bin:$PATH
export C_INCLUDE_PATH=/usr/local/include/libiomp:$C_INCLUDE_PATH
export CPLUS_INCLUDE_PATH=/usr/local/include/libiomp:$CPLUS_INCLUDE_PATH
export LIBRARY_PATH=/usr/local/lib:$LIBRARY_PATH
export DYLD_LIBRARY_PATH=/usr/local/lib:$DYLD_LIBRARY_PATH
virtualenv .env && source .env/bin/activate
export PYTHONPATH=`pwd`
pip install -r requirements.txt
python setup.py clean
python setup.py build_ext --inplace

And then, when downloading the English pack:

(.env) Guss-MacBook-Pro:spaCy gushecht$ python -m spacy.en.download
/Users/gushecht/spaCy/.env/bin/python: dlopen(spacy/tokenizer.so, 2): Symbol not found: __ZTINSt8ios_base7failureE
  Referenced from: spacy/tokenizer.so
  Expected in: dynamic lookup

Any thoughts?

Perhaps at this point it makes more sense for me to spin up a Linux box on AWS and bypass the issue.

I actually downloaded the English pack using pip-installed spacy and symlinked the directory. I'm on a slow connection and neglected to test that command. Symlinking the data dir might work.

I've used the spot instances for cheap, but expect your instance to be killed during peak hours.

As an alternative to python -m spacy.en.download you can also run python -m sputnik --name spacy install en. But a symbol not found error looks like your installation didn't finish properly or is in a half-baked state. Can you run python setup.py clean and install again?

Btw: python setup.py build_ext --inplace is outdated. Please run pip install -e . for a development install.

Hey all, looks like pip install -e . was the magic formula!

For final reference:

brew install python

brew install clang-omp

git clone https://github.com/honnibal/spaCy.git

cd spaCy
git checkout 0.100.5
# Make the specified changes to setup.py

export CC=clang-omp
export CXX=clang-omp
export PATH=/usr/local/bin:$PATH
export C_INCLUDE_PATH=/usr/local/include/libiomp:$C_INCLUDE_PATH
export CPLUS_INCLUDE_PATH=/usr/local/include/libiomp:$CPLUS_INCLUDE_PATH
export LIBRARY_PATH=/usr/local/lib:$LIBRARY_PATH
export DYLD_LIBRARY_PATH=/usr/local/lib:$DYLD_LIBRARY_PATH

virtualenv .env && source .env/bin/activate

export PYTHONPATH=`pwd`
pip install -r requirements.txt

python setup.py clean

pip install -e .

Thanks so much to each of you.

Thanks for the update :+1: I've updated the issue description to use pip install -e .

Updated instructions with LLVM-3.8 option for native OpenMP support.

@henningpeters I managed to build the latest spacy master with LLVM-3.8 and OpenMP support as a binary wheel. The hack-patch for setup.py and the corresponding binary wheel is attached. If you have the time, please let me know if the wheel works!

spacy-macos-with-patch.zip

@mikepb : Took forever to circle back to this. Much appreciated.

I've integrated your patch into the setup.py here: https://github.com/explosion/spaCy/commit/36bcd46244f4167eca32b93e4fd43eccab6844bb

I'm running a bit blind on this, so hopefully I haven't done anything wrong. We're not fully supporting wheels at the moment, due to resource constraints.

@gushecht: Thanks a lot for your snippet. I've referenced this thread in the install docs for MacOS / OSX.

I'm closing this for now to signal that I'm not aware of further action here. Not 100% confident this is fully resolved yet, as I don't feel very across the issue. Please reopen if there's more to do.

As of today, clang-omp is deprecated and has been moved to homebrew/boneyard/clang-omp. You should use brew install llvm instead.

Environment variables I used before pip installation:

export CC=/usr/local/opt/llvm/bin/clang
export CXX=/usr/local/opt/llvm/bin/clang++
export PATH=/usr/local/opt/llvm/bin:$PATH
export C_INCLUDE_PATH=/usr/local/opt/llvm/include:$C_INCLUDE_PATH
export CPLUS_INCLUDE_PATH=/usr/local/opt/llvm/include:$CPLUS_INCLUDE_PATH
export LIBRARY_PATH=/usr/local/opt/llvm/lib:$LIBRARY_PATH
export DYLD_LIBRARY_PATH=/usr/local/opt/llvm/lib:$DYLD_LIBRARY_PATH

Got about 2X speed up after this.
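
These exports follow the same pattern as the LLVM-3.8 ones in the issue description; only the install prefix differs. As a sketch, a hypothetical helper can generate the lines for any prefix:

```python
import posixpath


def llvm_exports(prefix):
    """Build the shell export lines for an LLVM install under `prefix`."""
    bin_dir = posixpath.join(prefix, 'bin')
    include_dir = posixpath.join(prefix, 'include')
    lib_dir = posixpath.join(prefix, 'lib')
    return [
        'export CC={0}/clang'.format(bin_dir),
        'export CXX={0}/clang++'.format(bin_dir),
        'export PATH={0}:$PATH'.format(bin_dir),
        'export C_INCLUDE_PATH={0}:$C_INCLUDE_PATH'.format(include_dir),
        'export CPLUS_INCLUDE_PATH={0}:$CPLUS_INCLUDE_PATH'.format(include_dir),
        'export LIBRARY_PATH={0}:$LIBRARY_PATH'.format(lib_dir),
        'export DYLD_LIBRARY_PATH={0}:$DYLD_LIBRARY_PATH'.format(lib_dir),
    ]


print('\n'.join(llvm_exports('/usr/local/opt/llvm')))
```

Passing '/opt/llvm38' instead reproduces the LLVM-3.8 settings.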

The patch seems to be "wrong" (maybe I'm missing something, therefore the quotes).

The PACKAGES.append line makes the installation fail with this error:

Obtaining file:///private/tmp/spaCy
    Complete output from command python setup.py egg_info:
    warning: ner.pyx:131:29: Not all members given for struct 'Transition'
    warning: ner.pyx:131:29: Not all members given for struct 'Transition'
    Processing attrs.pyx
    Processing cfile.pyx
    Processing gold.pyx
    Processing lexeme.pyx
    Processing matcher.pyx
    Processing morphology.pyx
    Processing orth.pyx
    Processing parts_of_speech.pyx
    Processing pipeline.pyx
    Processing strings.pyx
    Processing symbols.pyx
    Processing tagger.pyx
    Processing tokenizer.pyx
    Processing typedefs.pyx
    Processing vocab.pyx
    Processing bits.pyx
    Processing huffman.pyx
    Processing packer.pyx
    Processing _parse_features.pyx
    Processing _state.pyx
    Processing arc_eager.pyx
    Processing beam_parser.pyx
    Processing iterators.pyx
    Processing ner.pyx
    Processing nonproj.pyx
    Processing parser.pyx
    Processing stateclass.pyx
    Processing transition_system.pyx
    Processing doc.pyx
    Processing span.pyx
    Processing token.pyx
    Cythonizing sources
    running egg_info
    writing spacy.egg-info/PKG-INFO
    writing dependency_links to spacy.egg-info/dependency_links.txt
    writing requirements to spacy.egg-info/requires.txt
    writing top-level names to spacy.egg-info/top_level.txt
    error: package directory 'spacy/platform/darwin/lib' does not exist

Removing the line, however, lets the multithreaded version of spaCy build and install. I still hit an issue when specifying -1, or a number larger than the number of cores on my system, as the number of threads, but I can live with that; to me it is far more important that the multithreaded version is stable (as in "it builds like it should").
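
Until that is sorted out, one workaround is to sanitise the thread count before handing it to spaCy. This is a sketch only: the helper is hypothetical, and treating -1 as "use every core" is an assumption about the intended meaning rather than documented behaviour.

```python
import multiprocessing


def resolve_n_threads(requested, available=None):
    """Clamp a requested thread count to the range 1..available."""
    if available is None:
        available = multiprocessing.cpu_count()
    if requested is None or requested < 1:
        # Assumption: non-positive values mean "use all cores".
        return available
    return min(requested, available)


print(resolve_n_threads(-1))
```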

I saw that directory in the egg from @mikepb, but that's about it.

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
