If I try to sample with Metropolis in parallel (njobs>1), Theano throws an error, indicating that the numpy array is not aligned.
import numpy as np
import pymc3 as pm
import theano.tensor as tt
#create data
n=1000000
X = np.concatenate((np.ones((n, 1)), np.random.rand(n, 1)), axis = 1)
y = np.random.normal(loc = X.dot(np.array([5, -2])), scale = 1)
model = pm.Model()
with model:
beta = pm.Normal('beta', mu = 0, sd = 1e3, shape = X.shape[1])
mu = tt.dot(X, beta)
y_est = pm.Normal('y_est', mu = mu, observed = y)
step = pm.Metropolis()
trace = pm.sample(1, tune = 0, step = step, njobs = 2)
The same code runs fine, if njobs=1 or n=1000 instead of 1000000.
Process ForkPoolWorker-1:
Traceback (most recent call last):
File "../lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
self.run()
File "../lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "../3.6/lib/python3.6/multiprocessing/pool.py", line 108, in worker
task = get()
File ".../lib/python3.6/site-packages/joblib/pool.py", line 362, in get
return recv()
File "../lib/python3.6/multiprocessing/connection.py", line 251, in recv
return _ForkingPickler.loads(buf.getbuffer())
File ".../lib/python3.6/site-packages/theano/compile/function_module.py", line 1049, in _constructor_Function
f = maker.create(input_storage, trustme=True)
File ".../lib/python3.6/site-packages/theano/compile/function_module.py", line 1661, in create
input_storage=input_storage_lists, storage_map=storage_map)
File ".../lib/python3.6/site-packages/theano/gof/link.py", line 699, in make_thunk
storage_map=storage_map)[:3]
File ".../lib/python3.6/site-packages/theano/gof/vm.py", line 1047, in make_all
impl=impl))
File ".../lib/python3.6/site-packages/theano/gof/op.py", line 935, in make_thunk
no_recycling)
File "...lib/python3.6/site-packages/theano/gof/op.py", line 830, in make_c_thunk
e = FunctionGraph(node.inputs, node.outputs)
File ".../lib/python3.6/site-packages/theano/gof/fg.py", line 142, in __init__
inputs, outputs = graph.clone(inputs, outputs)
File ".../lib/python3.6/site-packages/theano/gof/graph.py", line 818, in clone
equiv = clone_get_equiv(i, o, copy_inputs)
File "...lib/python3.6/site-packages/theano/gof/graph.py", line 851, in clone_get_equiv
cpy = input.clone()
File ".../lib/python3.6/site-packages/theano/gof/graph.py", line 577, in clone
cp = self.__class__(self.type, self.data, self.name)
File ".../lib/python3.6/site-packages/theano/tensor/var.py", line 927, in __init__
Constant.__init__(self, type, data, name)
File ".../lib/python3.6/site-packages/theano/gof/graph.py", line 549, in __init__
self.data = type.filter(data)
File ".../lib/python3.6/site-packages/theano/tensor/type.py", line 189, in filter
"object dtype", data.dtype)
TypeError: ('The following error happened while compiling the node', CGemv{no_inplace}(TensorConstant{[ 2.174979...3341453 ]}, TensorConstant{-1.0}, TensorConstant{[[ 1. ..37875534]]}, Subtensor{int64:int64:}.0, TensorConstant{1.0}), '\n', 'The numpy.ndarray object is not aligned. Theano C code does not support that.', 'object buffer<memory at 0x126d3cdc8>', 'object shape', (1000000,), 'object strides', (8,), 'object dtype', dtype('float64'))
Process ForkPoolWorker-2:
Traceback (most recent call last):
...
After that message, the process seems dead and I terminate it manually.
The error occurs on 2 different computers with different OS.
The error also occurs for the nuts sampler. Maybe the error occurs because of the way joblib handles numpy arrays and feeds them to theano?
Oh, actually it is related to the size of your data. Maybe it doesn't fit into the memory? The problem disappears setting n=1000 for example.
The cutoff is at n=125000, all smaller values work. At exactly this point y is 1e6 bytes, which is the value at which joblib starts to use memory mapping to transfer arrays to the other processes. I don't think this is an accident. :-)
So it looks like this problem appears when an array is larger than a million bytes and the first theano op that uses this array requires an aligned array. I guess the easiest workaround would be to just disable the memory mapping in joblib, by passing mmap_mode=None to Parallel. But this would mean that we make copies of arbitrarily large arrays.
For me it is around n=65500... I need a better computer...
Not sure what is the best solution here, cc @aseyboldt @nouiz @ferrine
That's a bit strange, but I don't think this is about how good your computer is...
A short script to test the alignment:
import joblib
import numpy as np
def func(x):
print(x.flags)
print(type(x))
if __name__ == '__main__':
n = 1024 ** 2 // 8 + 1
x = np.random.randn(n)
jobs = joblib.Parallel(2)
print(x.nbytes)
res = jobs([joblib.delayed(func)(x), joblib.delayed(func)(x)])
Could you use this script to figure out at which point you get a numpy.core.memmap.memmap with ALIGNED: False on your machine?
This whole thing kind of sounds like a joblib bug to me. I don't see a good reason why the memmap shouldn't be aligned...
1048584
C_CONTIGUOUS : True
F_CONTIGUOUS : True
OWNDATA : False
WRITEABLE : False
ALIGNED : False
UPDATEIFCOPY : False C_CONTIGUOUS : True
F_CONTIGUOUS : True
OWNDATA : False
WRITEABLE : False
ALIGNED : False
UPDATEIFCOPY : False
<class 'numpy.core.memmap.memmap'>
<class 'numpy.core.memmap.memmap'>
My output.
I reported this to joblib.
@lorentzenchr If you need this quickly, applying this patch should avoid the issue until this is fixed properly:
```patch
diff --git a/pymc3/sampling.py b/pymc3/sampling.py
index 338a054c..0fb9f678 100644
--- a/pymc3/sampling.py
+++ b/pymc3/sampling.py
@@ -546,11 +546,13 @@ def _mp_sample(**kwargs):
chains = list(range(chain, chain + njobs))
pbars = [kwargs.pop('progressbar')] + [False] * (njobs - 1)
@aseyboldt Thank you very much for the issue on joblib and your quick fix, in case I need it.
Link to issue by @aseyboldt in joblib: https://github.com/joblib/joblib/issues/563
Hope they resolve it quickly...
Also experiencing this issue. Analyzing multi-level model with a ton of production data (millions of data points?). Just letting you know someone else appreciates y'all looking at this! :)
Looks like the joblib devs have a heroic pull request open that got mired in compatibility issues between pickle protocols, in case anyone wants to help that out: https://github.com/joblib/joblib/pull/570
@AustinRochford should we close this?
Most helpful comment
I reported this to joblib.
@lorentzenchr If you need this quickly, applying this patch should avoid the issue until this is fixed properly:
```patch
diff --git a/pymc3/sampling.py b/pymc3/sampling.py
index 338a054c..0fb9f678 100644
--- a/pymc3/sampling.py
+++ b/pymc3/sampling.py
@@ -546,11 +546,13 @@ def _mp_sample(**kwargs):
return merge_traces(traces)
```