jax.scipy.linalg routines segfault on Mac OS X on scipy 1.2.1 or later but not scipy 1.1.0

Created on 22 Feb 2019 · 10 Comments · Source: google/jax

I'm not exactly sure why this happens, being unfamiliar with the internal architecture, but on MacOS with Python 3.6.8, the following code segfaults if scipy 1.2.1 is installed (the version that comes by default when you pip install jax jaxlib):

import jax.random as random
import jax.scipy.linalg as linalg

key = random.PRNGKey(42)
# For some reason, matrices smaller than (50, 50) or so do not trigger segfaults
X = random.normal(key, (500, 500))
A = X @ X.T  # Drawn from standard Wishart distribution
linalg.cholesky(A)
print("Success!")

Output:

$ python -W ignore test.py      
zsh: bus error  python -W ignore test.py

If I roll back to Scipy 1.1.0, everything works:

$ python -W ignore test.py 
Success!

This is a great project by the way--thanks for working on it!

Edit: after further digging, I found the following in the Scipy 1.2 release notes:

scipy.linalg.lapack now exposes the LAPACK routines using the Rectangular Full Packed storage (RFP) for upper triangular, lower triangular, symmetric, or Hermitian matrices; the upper trapezoidal fat matrix RZ decomposition routines are now available as well.

Perhaps this has something to do with it?

Even more edits: yet more digging has revealed scipy/scipy#9751, which hints that this might be caused by a specific (old) version of XCode. I will report back once XCode is upgraded.

bug

All 10 comments

Thanks for reporting this, and for digging into it! (And for the kind words about JAX too!)

This smells like a scipy bug to me, but it's hard to be sure without tracking it down...

(I am increasingly leaning towards not using scipy's LAPACK kernels on CPU, because of issues like this one...)

Turns out upgrading XCode doesn't fix it. I agree that it seems like a scipy bug. Perhaps it happens because I'm still running MacOS High Sierra (at the request of department IT). In any case, pinning the dependency to Scipy 1.1.0 works for now, and deployment on Linux is no problem.

Since I'm the only affected user and the bug isn't showing up on the CI system, it might be reasonable to drop this until other people report the same problem.
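
For anyone who needs a stopgap while the pin is in place, here is a minimal sketch of a version guard. It assumes, based only on the reports in this thread, that the crash occurs on macOS with scipy 1.2.0 or later; `version_tuple` and `may_be_affected` are hypothetical helper names, not part of JAX or scipy.

```python
def version_tuple(text):
    """Turn a version string such as '1.2.1' into a 3-tuple of ints."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as 'rc1'
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)  # pad '1.2' to (1, 2, 0)
    return tuple(parts)


def may_be_affected(scipy_version, system):
    """Assumption from this thread: macOS plus scipy >= 1.2.0 can crash."""
    return system == "Darwin" and version_tuple(scipy_version) >= (1, 2, 0)


# Typical use (requires scipy to be installed):
#   import platform, scipy
#   if may_be_affected(scipy.__version__, platform.system()):
#       print("Consider pinning scipy==1.1.0 until a fixed jaxlib is available.")
```

The check is deliberately conservative: it only flags the combination reported here and says nothing about other platforms.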

One more data point: I tried building scipy from source at version 1.2.1, and the self-built version doesn't segfault. I'm using Mac OS Mojave and XCode 10.0 and I installed OpenBLAS from homebrew.

So I'm wondering if this is a problem with the PyPI-provided scipy packages.

Another data point:
Segfault with jax.numpy.linalg.solve and scipy>1.1 from PyPI

The same happens with scipy 1.4.1 on macOS and Python 3.6.6 for the QR decomposition of matrices with more than 128 columns:

import jax.numpy as np
import jax
import numpy as onp
import jax.config as config
config.update("jax_enable_x64", True)


q, r = np.linalg.qr(onp.random.rand(2000, 128))  # works fine
q, r = np.linalg.qr(onp.random.rand(2000, 129))  # bus error: 10

Update: the problem seems to appear first with scipy 1.2.0.

I'm seeing a similar problem with Cholesky decomposition on Mac OS 10.15.4 with Scipy 1.4.1 and Python 3.7.4 from Anaconda. It works for a 63 x 63 matrix but gives a bus error for a 64 x 64 matrix.
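
Size thresholds like these can be located without crashing the interpreter you are working in. The sketch below runs each attempt in a child process and binary-searches for the smallest failing size. It assumes that failures are monotonic in the matrix size, which the reports here suggest but do not prove; `survives` and `smallest_failing_size` are hypothetical helper names.

```python
import subprocess
import sys

# Child program: one Cholesky factorization of an n x n matrix.
CHILD = """
import sys
import jax.random as random
import jax.scipy.linalg as linalg
n = int(sys.argv[1])
X = random.normal(random.PRNGKey(0), (n, n))
linalg.cholesky(X @ X.T).block_until_ready()
"""


def survives(n):
    """Return True if the child process exits cleanly for size n."""
    proc = subprocess.run([sys.executable, "-c", CHILD, str(n)])
    return proc.returncode == 0


def smallest_failing_size(probe, lo, hi):
    """Binary search for the smallest n in (lo, hi] where probe(n) is False.

    Assumes probe(lo) is True, probe(hi) is False, and that every size
    above the first failure also fails.
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if probe(mid):
            lo = mid
        else:
            hi = mid
    return hi


# Typical use on an affected machine (requires jax and jaxlib):
#   print(smallest_failing_size(survives, 1, 500))
```

Passing the probe as a parameter keeps the search testable with a stand-in function on machines where the crash does not occur.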

I think I've figured out what's going wrong here, and why it's Mac OS specific.

The problem is that we run out of stack space and crash due to a stack overflow. Mac OS thread stacks default to 512KiB, whereas Linux defaults to 8MiB stacks. (Since these threads are part of a thread pool, you cannot work around this by changing ulimit, it requires code changes.)
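
To illustrate the general point that a thread's stack size is fixed when the thread is created, here is a minimal sketch using Python's standard `threading.stack_size`. It is an illustration only and not a workaround for this bug: as noted above, the affected threads belong to a native thread pool, so Python-level settings do not reach them. `run_with_stack` is a hypothetical helper name.

```python
import threading


def run_with_stack(fn, stack_bytes):
    """Run fn() in a new thread with a stack of stack_bytes; return its result."""
    outcome = {}

    def worker():
        outcome["value"] = fn()

    # The new size applies only to threads created after this call.
    previous = threading.stack_size(stack_bytes)
    try:
        thread = threading.Thread(target=worker)
        thread.start()
    finally:
        threading.stack_size(previous)  # restore the earlier setting
    thread.join()
    return outcome.get("value")


# Example: request a 16 MiB stack, larger than either default quoted above.
#   run_with_stack(lambda: sum(range(10)), 16 * 1024 * 1024)
```

The equivalent change for a native thread pool has to be made where the pool creates its threads, which is why the fix requires code changes rather than a shell setting.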

I'm not quite sure what the best way to fix this is at the moment but I'll figure something out.

Thanks for all the hard work! Much appreciated

#2927 should fix this bug. It requires a new jaxlib, so you can either build from source or wait for a release, which we will most likely make next week.
