Caffe: malloc error on macOS Sierra 10.12

Created on 29 Sep 2016 · 29 comments · Source: BVLC/caffe

I was able to compile and run an example (mnist), but when the example starts I see the following:

caffe(51829,0x7fffc23a93c0) malloc: *** malloc_zone_unregister() failed for 0x7fffc239f000

The example proceeds to run, but when I open the Python interpreter and import caffe, I see a similar error and the interpreter crashes:
>>> import caffe
Python(14990,0x7fffc23a93c0) malloc: *** malloc_zone_unregister() failed for 0x7fffc239f000
/usr/local/lib/python2.7/site-packages/matplotlib/font_manager.py:273: UserWarning: Matplotlib is building the font cache using fc-list. This may take a moment.
  warnings.warn('Matplotlib is building the font cache using fc-list. This may take a moment.')
[1] 14990 illegal hardware instruction python

Not sure what causes this?

All 29 comments

Same question for me.

Likewise. I tried building Caffe in CPU-only mode to no avail.

I also experienced similar errors on the OpenCL branch on Sierra.

case 1)
Python 2.7.12 (default, Sep 30 2016, 14:23:15)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.38)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import caffe
Python(87258,0x7fffe2b8b3c0) malloc: *** malloc_zone_unregister() failed for 0x7fffe2b81000
/Users/cepiross/Documents/srcs/caffe.git.viennacl.build/install/python/caffe/pycaffe.py:13: RuntimeWarning: to-Python converter for std::__1::vector<int, std::__1::allocator<int> > already registered; second conversion method ignored.
  from ._caffe import \
>>> quit()

case 2)
$ make pytest
[ 1%] Built target proto
[100%] Built target caffe
[100%] Built target pycaffe
Python(85579,0x7fffe2b8b3c0) malloc: *** malloc_zone_unregister() failed for 0x7fffe2b81000
caffe/pycaffe.py:13: RuntimeWarning: to-Python converter for std::__1::vector<int, std::__1::allocator<int> > already registered; second conversion method ignored.
  from ._caffe import \
.................WARNING: Logging before InitGoogleLogging() is written to STDERR
I0930 23:27:13.613234 3803755456 net.cpp:346] The NetState phase (1) differed from the phase (0) specified by a rule in layer train_data
I0930 23:27:13.613450 3803755456 net.cpp:385] The NetState did not contain stage 'val' specified by a rule in layer val_data
I0930 23:27:13.613469 3803755456 net.cpp:346] The NetState phase (1) differed from the phase (0) specified by a rule in layer loss
I0930 23:27:13.613474 3803755456 net.cpp:385] The NetState did not contain stage 'val' specified by a rule in layer loss
... continue to proceed successfully ...

After googling other cases, I suspect the culprit is related to malloc.
According to https://github.com/jemalloc/jemalloc/issues/420, Apple seems to have changed libsystem_malloc, which affects memory allocation in some way.

As for investigating the cause, I have no idea where to start. But if anyone tries to fix this problem, please consider this first.

Same problem. It happens at from ._caffe import Net; since that module comes from C++, the issue might also lie somewhere between libsystem_malloc and Boost. Beyond that, I'm not getting any further either.

Same problem here on macOS Sierra 10.12.0 in CPU-only mode.

Python 2.7.12 (default, Oct  7 2016, 18:03:59)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.38)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import caffe
python(59689,0x7fffb83273c0) malloc: *** malloc_zone_unregister() failed for 0x7fffb831d000
[1]    59689 segmentation fault  python

Same for me:
Python(4596,0x7fffa43723c0) malloc: *** malloc_zone_unregister() failed for 0x7fffa4368000

On macOS Sierra 10.12, using Python 3.5.

@igilitschenski

Yes, I think you're right that the problem isn't exclusive to Sierra and that it has to do with linking, but I've tried the solution you posted, to no avail.

Multiple installations of Python are the root cause, I think. We're not linking to the correct libraries...

@iamlegolas I deleted my original post, as it actually addresses the problem in #2677, which seems to be a different one.

Question: do any of you ever have multiple instances of Caffe running at once? I was starting multiple processes using Python 3.5's multiprocessing 'spawn' mode and ran into an "illegal hardware instruction" crash in the second process. I switched to 'fork' mode and made sure I only ever imported Caffe after all forking was complete. (This was necessary in fork mode, since Caffe reacts badly to being forked.) The malloc error still shows up, but the crash is gone and everything seems to work despite the message.

The commit that fixed it: https://github.com/crowsonkb/style_transfer/commit/4fcc63d6cecbcb998c0bcc5063bf51fab0f4916d
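A minimal sketch of the fork-then-import pattern described above, assuming Python 3.4+ on a POSIX system (the 'fork' start method is not available on Windows). The helpers worker and run_workers are hypothetical names, and the module to import is a parameter so the pattern can be tried with any module before pointing it at caffe. The parent process never imports the extension module; each child imports it only after it has been forked.

```python
import importlib
import multiprocessing as mp


def worker(module_name):
    # Runs in a forked child: the extension module (e.g. caffe) is imported
    # here, after the fork, so the parent process never has it loaded.
    module = importlib.import_module(module_name)
    return module.__name__


def run_workers(module_name, n_workers=2):
    # Use the 'fork' start method instead of 'spawn'.
    ctx = mp.get_context("fork")
    with ctx.Pool(n_workers) as pool:
        # Each task imports the module inside whichever child runs it.
        return pool.map(worker, [module_name] * n_workers)
```

For example, run_workers("caffe", 4) would have the forked children import caffe themselves. Per the comment above, this avoided the crash for that user, but the malloc message may still appear.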

I got it to work! I still get a malloc error but python doesn't crash and Caffe's now working.
How it goes: [screenshot, 2016-10-14 12:04:14 PM]

Here's what I did:
Step 1: Check your Makefile.config file. You want to make sure the following two entries match the paths on your system:
PYTHON_INCLUDE := /usr/local/Cellar/python/2.7.12/Frameworks/Python.framework/Versions/2.7/include/python2.7 \
        /usr/local/lib/python2.7/site-packages/numpy/core/include/numpy
PYTHON_LIB := /usr/local/Cellar/python/2.7.12/Frameworks/Python.framework/Versions/2.7/lib/
Correct them if there are any discrepancies.
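To find candidate values for your own machine, one option is to ask the interpreter you build pycaffe against. The sketch below uses a hypothetical helper, python_build_paths, built on the standard sysconfig module and, if NumPy is installed, numpy.get_include(). It prints lines in the Makefile.config format so they can be compared with the entries above. It assumes you run it with the same python binary that Caffe will be linked against.

```python
import sysconfig


def python_build_paths():
    """Collect candidate PYTHON_INCLUDE / PYTHON_LIB values for the
    interpreter running this script (hypothetical helper)."""
    includes = [sysconfig.get_paths()["include"]]  # directory containing Python.h
    try:
        import numpy
        includes.append(numpy.get_include())  # NumPy C header directory
    except ImportError:
        pass  # NumPy is not installed for this interpreter
    return {
        "PYTHON_INCLUDE": includes,
        # Library directory recorded in this interpreter's build configuration
        "PYTHON_LIB": sysconfig.get_config_var("LIBDIR"),
    }


if __name__ == "__main__":
    paths = python_build_paths()
    print("PYTHON_INCLUDE := " + " \\\n\t\t".join(paths["PYTHON_INCLUDE"]))
    print("PYTHON_LIB := %s" % paths["PYTHON_LIB"])
```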

Step 2: Go to your CAFFE_ROOT/python folder and start python from that directory in the terminal. Import pyplot with the command: import matplotlib.pyplot
The first import can take a while, since Matplotlib builds its font cache, so be patient. Once it's done, quit the Python shell, start it again, and import caffe. It should hopefully work!

(lldb) c
Process 40251 resuming
Python(40251,0x7fff9ec183c0) malloc: *** malloc_zone_unregister() failed for 0x7fff9ec0e000
Process 40251 stopped
* thread #1: tid = 0xa2467, 0x00007fff96132b44 libsystem_platform.dylib`_os_unfair_lock_recursive_abort + 23, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
    frame #0: 0x00007fff96132b44 libsystem_platform.dylib`_os_unfair_lock_recursive_abort + 23
libsystem_platform.dylib`_os_unfair_lock_recursive_abort:
->  0x7fff96132b44 <+23>: ud2

libsystem_platform.dylib`_os_ulock_wait:
    0x7fff96132b46 <+0>:  pushq  %rbp
    0x7fff96132b47 <+1>:  movq   %rsp, %rbp
    0x7fff96132b4a <+4>:  pushq  %r15

@iamlegolas Importing matplotlib does help! Does Caffe work correctly after this "malloc_zone_unregister() failed" message? Doesn't it affect Caffe in any way?
I don't really understand how multiple Python installations would cause this, since we explicitly give the path of the Python installation when compiling Caffe.

@iamlegolas I could reproduce this. An interesting side note: after uninstalling Python and installing the current Python from Homebrew (2.7.12_2), the behaviour seems to change. Now I can load the caffe module only if I don't quit Python after importing matplotlib.pyplot. That is:

$ python
Python 2.7.12 (default, Oct 15 2016, 13:58:50)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.38)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import matplotlib.pyplot
/usr/local/lib/python2.7/site-packages/matplotlib/font_manager.py:273: UserWarning: Matplotlib is building the font cache using fc-list. This may take a moment.
  warnings.warn('Matplotlib is building the font cache using fc-list. This may take a moment.')
>>> quit()
$ python
Python 2.7.12 (default, Oct 15 2016, 13:58:50)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.38)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import caffe
Python(17126,0x7fffb7a843c0) malloc: *** malloc_zone_unregister() failed for 0x7fffb7a7a000
/usr/local/lib/python2.7/site-packages/matplotlib/font_manager.py:273: UserWarning: Matplotlib is building the font cache using fc-list. This may take a moment.
  warnings.warn('Matplotlib is building the font cache using fc-list. This may take a moment.')
Illegal instruction: 4

does not work, whereas the following "works":

$ python
Python 2.7.12 (default, Oct 15 2016, 13:58:50)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.38)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import matplotlib.pyplot
/usr/local/lib/python2.7/site-packages/matplotlib/font_manager.py:273: UserWarning: Matplotlib is building the font cache using fc-list. This may take a moment.
  warnings.warn('Matplotlib is building the font cache using fc-list. This may take a moment.')
>>> import caffe
Python(17154,0x7fffb7a843c0) malloc: *** malloc_zone_unregister() failed for 0x7fffb7a7a000
>>>
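Because a crash such as "Illegal instruction: 4" kills the whole interpreter, it cannot be caught with try/except. One way to compare import orders non-interactively is to run each order in a fresh interpreter via subprocess and inspect the exit code. On POSIX systems, a child killed by signal N is reported as -N (SIGILL is signal 4). The helper check_import_order is hypothetical, and the sketch assumes sys.executable is the same python that pycaffe was built for.

```python
import subprocess
import sys


def check_import_order(modules, python=sys.executable):
    """Import `modules` in order in a fresh interpreter and return its exit code.

    0 means every import succeeded; 1 usually means an uncaught Python
    exception (e.g. ImportError); a negative value -N means the child was
    killed by signal N, e.g. -4 for SIGILL ("Illegal instruction: 4").
    """
    source = "\n".join("import %s" % name for name in modules)
    return subprocess.call([python, "-c", source])


if __name__ == "__main__":
    # The two orders compared in the sessions above.
    for order in (["caffe"], ["matplotlib.pyplot", "caffe"]):
        code = check_import_order(order)
        print("%-30s exit code %d" % (" then ".join(order), code))
```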

The problem is related to LevelDB. If leveldb is installed with brew, by default it depends on Google's gperftools, which includes a library named libtcmalloc.dylib. On macOS Sierra, this causes the malloc_zone_unregister() problem.

Therefore, there are two solutions:

  1. Uncomment the line USE_LEVELDB := 0 in Makefile.config.
  2. Download leveldb (v1.9), compile it yourself, and use it to replace the original (Homebrew) one.
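For the first solution, a small Python sketch can automate the steps described in this thread: copy Makefile.config.example to Makefile.config if it doesn't exist yet, then uncomment USE_LEVELDB := 0 (the example file ships with that line commented out). The function names are hypothetical, and the sketch assumes the commented line looks like "# USE_LEVELDB := 0". Caffe still has to be rebuilt afterwards (make clean, make all, make pycaffe).

```python
import os
import re
import shutil


def uncomment_setting(config_text, setting="USE_LEVELDB := 0"):
    """Return config_text with a commented-out `setting` line enabled."""
    pattern = re.compile(r"^[ \t]*#[ \t]*" + re.escape(setting) + r"[ \t]*$",
                         re.MULTILINE)
    # A function replacement avoids backslash processing in the repl string.
    return pattern.sub(lambda match: setting, config_text)


def disable_leveldb(caffe_root):
    """Create Makefile.config from the example if needed, then enable
    USE_LEVELDB := 0 in it (hypothetical helper)."""
    config = os.path.join(caffe_root, "Makefile.config")
    if not os.path.exists(config):
        # The Makefile reads Makefile.config, not Makefile.config.example.
        shutil.copy(os.path.join(caffe_root, "Makefile.config.example"), config)
    with open(config) as f:
        text = f.read()
    with open(config, "w") as f:
        f.write(uncomment_setting(text))
```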

@dimilar Thank you so much! USE_LEVELDB := 0 fixed my crash issue.

The line USE_LEVELDB := 0 is commented out by default in Makefile.config.example, and I can't find any Makefile.config. Should I create one? I still get the "malloc_zone_unregister() failed" message. Is anyone else still stuck here? Is there any other fix for this problem? I'm using Python 2.7 installed by Homebrew on Sierra. I've also tried the matplotlib suggestion, which doesn't work: import caffe still makes Python crash.

Same for me :(

Edit: uncommenting USE_LEVELDB := 0 is right!!

Uncomment USE_LEVELDB := 0. There is an incompatibility with macOS Sierra and LevelDB and you want to disable it.

@crowsonkb Do I have to reinstall Caffe after uncommenting USE_LEVELDB := 0?

@iamlegolas @EvanYellow @crowsonkb What are your versions of CUDA, cuDNN, and the Xcode command-line tools? Please let me know, thanks!

Awesome! I'm curious how you found this out, @dimilar?

@RileyLee You need to do this: copy Makefile.config.example to Makefile.config, then uncomment that line in Makefile.config. The Makefile reads Makefile.config, not Makefile.config.example.

@dimilar Should I recompile gperftools as well?

@dimilar is right.
Uncommenting this line works:
USE_LEVELDB := 0

Then rebuild:

make clean
make all
make test
make pycaffe

Using the HEAD version of gperftools can also fix the problem.
brew reinstall gperftools --HEAD
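To check whether your LevelDB is actually linked against gperftools' libtcmalloc (before or after reinstalling), the macOS tool otool -L lists the dynamic libraries a binary depends on. The sketch below parses that output. The default path /usr/local/opt/leveldb/lib/libleveldb.dylib is an assumption about a typical Homebrew install, and tcmalloc_deps and check_library are hypothetical helper names.

```python
import subprocess


def tcmalloc_deps(otool_output):
    """Return the libtcmalloc entries from `otool -L` output."""
    deps = []
    # The first line of `otool -L` output names the inspected file itself.
    for line in otool_output.splitlines()[1:]:
        path = line.strip().split(" (")[0]  # drop "(compatibility version ...)"
        if "libtcmalloc" in path:
            deps.append(path)
    return deps


def check_library(path="/usr/local/opt/leveldb/lib/libleveldb.dylib"):
    # Assumed Homebrew location; adjust to where your leveldb is installed.
    output = subprocess.check_output(["otool", "-L", path])
    return tcmalloc_deps(output.decode("utf-8"))
```

A non-empty list from check_library() would indicate that the libtcmalloc dependency described earlier in the thread is present.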

@GlueCrow What is the difference between HEAD and non-HEAD?

@clavichord93 The HEAD version is just the newest version cloned from the GitHub repo.
It includes a commit, made after gperftools 2.5 was released, that fixes the problem.

When I try to run the "create_mnist.sh" file, I get the following error:
F0327 02:49:29.699080 3490468800 convert_mnist_data.cpp:144] This example requires LevelDB and LMDB; compile with USE_LEVELDB and USE_LMDB.

How do I cope with this on Sierra? :S @dimilar

From https://github.com/BVLC/caffe/blob/master/CONTRIBUTING.md:

Please do not post usage, installation, or modeling questions, or other requests for help to Issues.
Use the caffe-users list instead. This helps developers maintain a clear, uncluttered, and efficient view of the state of Caffe.
