pymc3 🚀 - ADVI, NUTS and Metropolis produce significantly different results

This is entirely possible. Metropolis can mix really poorly sometimes, and ADVI can give really bad approximations. You would have a better idea if something is wrong or not if you looked at some convergence diagnostics. If the ELBO in ADVI has not converged to a stationary value and the MCMC samplers have not converged, then it is premature to compare them.

Also, in general, you should not compare NUTS and Metropolis based on sampling speed. NUTS is vastly more efficient in terms of effective sample size in the resulting trace than Metropolis is.

BTW, you appear to be overwriting your data (y) with the likelihood itself (also y) in the last line of the model. You should probably avoid that.

fonnesbeck on 7 Jun 2016

👍2

The ELBO in ADVI stabilizes very quickly:

%time means, sds, elbos = pm.variational.advi( \
    model=mixedEffect_model, n=5000, learning_rate=1e-1)


Iteration 0 [0%]: ELBO = -55115425.64
Iteration 500 [10%]: ELBO = -20841.0
Iteration 1000 [20%]: ELBO = -19739.33
Iteration 1500 [30%]: ELBO = -28763.83
Iteration 2000 [40%]: ELBO = -55537.09
Iteration 2500 [50%]: ELBO = -20212.82
Iteration 3000 [60%]: ELBO = -20057.14
Iteration 3500 [70%]: ELBO = -20062.19
Iteration 4000 [80%]: ELBO = -20067.08
Iteration 4500 [90%]: ELBO = -20674.95
Finished [100%]: ELBO = -20232.54
CPU times: user 44min 19s, sys: 47.7 s, total: 45min 7s
Wall time: 1min 18s

Metropolis seems to suffer from bad initialization. h2 stays at zero and does not move at all:

with mixedEffect_model:
    start = pm.find_MAP()
    step = pm.Metropolis()
    trace_mp = pm.sample(5000, step, start=start)

Is there anyway to use the mean of VB to initialize the metropolis?

kayhan-batmanghelich on 7 Jun 2016

What do Metropolis and NUTS do when you do not initialize it with the MAP? Any better? I don't usually have much success with using it to initialize. If you _are_ going to use it, you might pass scaling=start to NUTS as well.

You may also get better mileage out of specifying sigma directly rather than tau, unless you are truly interested in the precision. See what you get out of a HalfCauchy(5) on sigma.

fonnesbeck on 7 Jun 2016

Not using start didn't improve the results for Metropolis. Meanwhile I realized that I had intercept in the X, so I removed beta_0:

This is a new code (I followed your advice ):

import numpy as np, pymc3 as pm, theano.tensor as T


mixedEffect_model = pm.Model()

with pm.Model() as mixedEffect_model:
    ### hyperpriors
    sigma2 = pm.InverseGamma('sigma2', 1.e-3, 1.e-3)
    #tau = pm.Gamma('tau', 1.e-3, 1.e-3)
    #sigma2 = 1.0/tau
    h2 = pm.Uniform('h2')
    #beta_0 = pm.Uniform('beta_0', lower=-1000, upper=1000)   # a replacement for improper prior
    w = pm.Uniform('w', lower=-1000, upper=1000, shape=(M,1))   # a replacement for improper prior

    z = pm.Normal('z', mu=0, sd= (h2*sigma2)**0.5 , shape=(N,1))
    g = T.dot(L,z)

    #y = pm.Normal('y', mu = beta_0+g + T.dot(X,w), sd= ((1-h2)*sigma2)**0.5 , shape=(N,1) , observed=Pheno )
    y = pm.Normal('y', mu = g + T.dot(X,w), sd= ((1-h2)*sigma2)**0.5 , shape=(N,1) , observed=Pheno )

NUTs is on the right path (the value matches ML estimation) but judging based on the traceplot, it seems that it is not quite converged yet. Perhaps more iteration helps but it is quite slow (5000 iteration takes more an hour):

with mixedEffect_model:
    start = pm.find_MAP()
    step = pm.NUTS()
    trace_nuts = pm.sample(5000, step, start=start)


figure()
_ = hist(trace_nuts['h2'],100)

pm.traceplot(trace_nuts, varnames=['h2'])

Metropolis still does not move:

with mixedEffect_model:
    #start = pm.find_MAP()
    step = pm.Metropolis()
    #trace_mp = pm.sample(5000, step, start=start)
    trace_mp = pm.sample(5000, step)

figure()
_ = hist(trace_mp['h2'],100)

pm.traceplot(trace_mp, varnames=['h2'])

ADVI is pretty off too:

v_params = pm.variational.advi( \
    model=mixedEffect_model, n=5000, learning_rate=1e-1)

with mixedEffect_model:
    trace = pm.variational.sample_vp(v_params, draws=5000)

_ = hist(trace['h2'],100)
print mean(trace['h2'])

kayhan-batmanghelich on 8 Jun 2016

@kayhan-batmanghelich
I am also currently experimenting with different ways to fit a mixed-effect model in PyMC3. In my case, I found that NUTS is very sensitive to starting value, and using the value return from find_MAP() doesn't seem to work. Moreover, for my model I need around 50000 samples and the trace start to convert around 20000 samples in.
You might find https://github.com/paul-buerkner/brms helpful, it's a R package for mixed-effect model using STAN. One of the tricks it used is to center the design matrix X, remove the intercept from X, and model the population-level intercept as an additional term:
y = pm.Normal('y', mu = intercept + g + T.dot(X_centered,w), sd= ((1-h2)*sigma2)**0.5 , shape=(N,1) , observed=Pheno )

junpenglao on 27 Jun 2016

@junpenglao
Thank you for your note. I will try that idea (removing the intercept and adding independently). Theoretically, the results should be the same so I will be surprised if it is different. I will also try replacing sigma2 ~ inv-gamma with something like sigma ~ half-cauchy as suggest by Andrew Gelman (what is half cauchy in PyMC3 by the way?).

I noticed that if I run NUTS for about 20,000 iterations, the posterior value is close the maximum likelihood estimation but that takes very very long. I couldn't get ADVI to produce a reasonable result. Metropolis also does not produces a good results; I tried initializing it with ADVI values, increasing iteration, no success.

Overall it is very surprising. This is a basic mixed effect model perhaps a bit big but really nothing special about it. My final goal is not to solve this specific mixed effect model but rather using it as stepping stone to train more complicated model. I am very interested to get PyMC3 working since it helps me to focus more on the model rather than inference algorithm.

kayhan-batmanghelich on 28 Jun 2016

@kayhan-batmanghelich

Theoretically, the results should be the same so I will be surprised if it is different.

Actually, it makes a difference at least in the way brm models it, and in PyMC the model estimation is much closer to a classical mixed model as return from statsmodels. However, I still having difficulty to hack it to produce the same result. On the other hand the brm estimation is nearly identical to lme4...

And for me, only Metropolis works but not NUTS and ADVI...

My final goal is not to solve this specific mixed effect model but rather using it as stepping stone to train more complicated model. I am very interested to get PyMC3 working since it helps me to focus more on the model rather than inference algorithm.

I am trying to do the same thing - if you want we can share the codes and try to solve this together.

junpenglao on 28 Jun 2016

@junpenglao
I would be happy to discuss it, shoot me email and we can skype.

I found STAN very slow for this problem. This is a basic model, there are not much to change in the model. This is my model:

model

Other than numerical issue, I don't have any other explanation why it doesn't work.

kayhan-batmanghelich on 29 Jun 2016

How did STAN's result compare?

datnamer on 29 Jun 2016

@datnamer For STAN, I tried both NUTS and ADVI. NUTS stopped at the warmup iteration (iteration 1) for 8 hours, so I stopped it. Full-rank ADVI is relatively fast (couple of hours) but does not converge to a correct answer.

kayhan-batmanghelich on 29 Jun 2016

👍1

@kayhan-batmanghelich
Your model actually runs fine using some generated data, I tried with the following code and it works with both Metropolis and NUTS:

import numpy as np, pymc3 as pm, theano.tensor as T, matplotlib.pyplot as plt

M    = 6  # number of columns in X - fixed effect
N    = 10 # number of columns in L - random effect
nobs = 10

# generate design matrix using patsy
from patsy import dmatrices
import pandas as pd
predictors = []
for s1 in range(N):
    for c1 in range(2):
        for c2 in range(3):
            for i in range(nobs):
                predictors.append(np.asarray([c1+1,c2+1,s1+1]))
tbltest             = pd.DataFrame(predictors, columns=['Condi1', 'Condi2', 'subj'])
tbltest['Condi1']   = tbltest['Condi1'].astype('category')
tbltest['Condi2']   = tbltest['Condi2'].astype('category')
tbltest['subj']     = tbltest['subj'].astype('category')
tbltest['tempresp'] = np.random.normal(size=(nobs*M*N,1))

Y, X    = dmatrices("tempresp ~ Condi1*Condi2", data=tbltest, return_type='matrix')
Terms   = X.design_info.column_names
_, L    = dmatrices('tempresp ~ -1+subj', data=tbltest, return_type='matrix')
X       = np.asarray(X) # fixed effect
L       = np.asarray(L) # mixed effect
Y       = np.asarray(Y) 
# generate data
w0 = [5,1,2,3,1,1]
z0 = np.random.normal(size=(N,))
Pheno   = np.dot(X,w0) + np.dot(L,z0) + Y.flatten()
#%%
with pm.Model() as mixedEffect_model:
    ### hyperpriors
    h2     = pm.Uniform('h2')
    sigma2 = pm.HalfCauchy('eps', 5)
    #beta_0 = pm.Uniform('beta_0', lower=-1000, upper=1000)   # a replacement for improper prior
    w = pm.Normal('w', mu = 0, sd = 100, shape=M)
    z = pm.Normal('z', mu = 0, sd= (h2*sigma2)**0.5 , shape=N)
    g = T.dot(L,z)
    y = pm.Normal('y', mu = g + T.dot(X,w), 
                  sd= ((1-h2)*sigma2)**0.5 , observed=Pheno )

with mixedEffect_model:
    # means, sds, elbos = pm.variational.advi(n=10000)
    # trace = pm.sample(50000,step=pm.Metropolis())
    trace = pm.sample(3000)

I think you model might be ill-condition for your data, you should try to reparameterize the model or perform some preprocessing.
In any case, @kayhan-batmanghelich i will send you an email.

junpenglao on 29 Jun 2016

Glad we are atleast keeping up with stan.

datnamer on 29 Jun 2016

I would be interested how atmcmc performs with your model. As it is the perfect case with the multiply peaked distribution. Would you be interested to try it?
You would need to add a likelihood variable to your model that contains the model likelihood and pass it to the input. Also you would need an output folder where to store the traces. You can follow the example script ATMIP_2gaussians in the example folder in how to call it:

import pymc3 as pm
import numpy as np
from pymc3.step_methods import ATMCMC as atmcmc
import theano.tensor as tt
from matplotlib import pylab as plt

test_folder = ('ATMIP_TEST')

n_chains = 500
n_steps = 100
tune_interval = 25
njobs = 1

n = 4

mu1 = np.ones(n) * (1. / 2)
mu2 = -mu1

stdev = 0.1
sigma = np.power(stdev, 2) * np.eye(n)
isigma = np.linalg.inv(sigma)
dsigma = np.linalg.det(sigma)

w1 = stdev
w2 = (1 - stdev)


def two_gaussians(x):
    log_like1 = - 0.5 * n * tt.log(2 * np.pi) \
                - 0.5 * tt.log(dsigma) \
                - 0.5 * (x - mu1).tt.dot(isigma).dot(x - mu1)
    log_like2 = - 0.5 * n * tt.log(2 * np.pi) \
                - 0.5 * tt.log(dsigma) \
                - 0.5 * (x - mu2).T.dot(isigma).dot(x - mu2)
    return tt.log(w1 * tt.exp(log_like1) + w2 * tt.exp(log_like2))

with pm.Model() as ATMIP_test:
    X = pm.Uniform('X',
                   shape=n,
                   lower=-2. * np.ones_like(mu1),
                   upper=2. * np.ones_like(mu1),
                   testval=-1. * np.ones_like(mu1),
                   transform=None)
    like = pm.Deterministic('like', two_gaussians(X))
    llk = pm.Potential('like', like)

with ATMIP_test:
    step = atmcmc.ATMCMC(n_chains=n_chains, tune_interval=tune_interval,
                         likelihood_name=ATMIP_test.deterministics[0].name)

trcs = atmcmc.ATMIP_sample(
                        n_steps=n_steps,
                        step=step,
                        njobs=njobs,
                        progressbar=True,
                        trace=test_folder,
                        model=ATMIP_test)

pm.summary(trcs)
Pltr = pm.traceplot(trcs, combined=True)
plt.show(Pltr[0][0])

hvasbath on 7 Jul 2016

@hvasbath Sure I will give it a go. However I think there is some mistakes in the ATMIP_2gaussians:

def two_gaussians(x):
    log_like1 = - 0.5 * n * tt.log(2 * np.pi) \
                - 0.5 * tt.log(dsigma) \
                - 0.5 * (x - mu1).tt.dot(isigma).dot(x - mu1)
    log_like2 = - 0.5 * n * tt.log(2 * np.pi) \
                - 0.5 * tt.log(dsigma) \
                - 0.5 * (x - mu2).T.dot(isigma).dot(x - mu2)
    return tt.log(w1 * tt.exp(log_like1) + w2 * tt.exp(log_like2))

As 0.5 * (x - mu1).tt.dot(isigma).dot(x - mu1) is not allowed by Theano

junpenglao on 7 Jul 2016

That must have accidentally got in there when they changed automatically the way to import theano.tensor from T to tt. So originally it was only a capital T.
Anyway that is part of the model which you can ignore. The part on how to call the sample and produce the step is a little different than the standard thats why I referred you to that. Not because of the model.

hvasbath on 7 Jul 2016

@hvasbath I see. But I "...would need to add a likelihood variable to your model that contains the model likelihood and pass it to the input." Is the likelihood variable being pass to the following need to be the one with observedRV?

with ATMIP_test:
    step = atmcmc.ATMCMC(n_chains=n_chains, tune_interval=tune_interval,
                         likelihood_name=ATMIP_test.deterministics[0].name)

or can I also do llk = pm.Potential('like', like) ?

junpenglao on 7 Jul 2016

It needs to be a deterministic variable. A potential is not being traced in the traces. Thats why I have the deterministic variable up there. For your model it would be fine if I understand it correctly, to put 'y' in the likelihood_name. But as you have Normal priors you cannot ignore them. So creating a deterministic variable like this:

with pm.Model() as mixedEffect_model:
    ### hyperpriors
    h2     = pm.Uniform('h2', transform=None)
    sigma2 = pm.HalfCauchy('eps', 5, transform=None)
    #beta_0 = pm.Uniform('beta_0', lower=-1000, upper=1000)   # a replacement for improper prior
    w = pm.Normal('w', mu = 0, sd = 100, shape=M)
    z = pm.Normal('z', mu = 0, sd= (h2*sigma2)**0.5 , shape=N)
    g = T.dot(L,z)
    y = pm.Normal('y', mu = g + T.dot(X,w), 
                  sd= ((1-h2)*sigma2)**0.5 , observed=Pheno )
    like = pm.Deterministic('like', h2.logpt + sigma2.logpt + w.logpt + z.logpt + y.logpt)

Transforms have to be switched off, otherwise it wont work, but anyways for Metropolis sampling I couldnt see a major improvement using them so far.

Then you can call it like this:

with mixedEffect_model:
    step=ATMCMC.ATMCMC(n_chains=500, tune_interval=20, likelihood_name=mixedEffect_model.deterministics[0].name)

trace = ATMCMC.ATMIP_sample(n_steps=200, step=step, njobs=1, progressbar=True, model=mixedEffect_model, trace='Test')

It will save all the traces in the current directory 'Test' subdirectory.
I just started it on my computer and will keep you updated.

hvasbath on 8 Jul 2016

It took 60 stages to finish. 30 seconds one stage makes it half an hour on one core you can try it with several cores and more chains if the result is bad. I am posting here the result of the 30s stage
stage_30
and the final stage:
stage_final
You should be able to tell what the parameters mean.
You can load the respective stage with:

trcs = pm.backends.text.load('Test/stage_30',model=mixedEffect_model)
trcs = pm.backends.text.load('Test/stage_final',model=mixedEffect_model)

and plot it with:

Pltr = pm.traceplot(trcs, combined=True)
plt.show(Pltr[0][0])

hvasbath on 8 Jul 2016

Let me know what you think!
Of course this was a first run and you can tune the algorithm mostly by changing the n_chains, n_steps and the tune_interval.
Most effectively is increasing n_chains= here I took 500 with each 200 steps. These are basically Metropolis chains with MultivariateNormal proposal covariance from all the traces.
Cheers!

hvasbath on 8 Jul 2016

Thanks a lot @hvasbath !!! I will try right away.

It needs to be a deterministic variable. A potential is not being traced in the traces. Thats why I have the deterministic variable up there.

For me this was the hard part to understand - if you could write it down somewhere as documentation that would be great!

junpenglao on 8 Jul 2016

Actually there is a description in the class definition of the ATMCMC object. But I will add an example to the description. Also a flag for deleting the intermediate stages will be in the updated version of the code. These can fill the harddisk quite fast- but I usually keep all of the stages in order to see how it samples.
Thanks for your response!

hvasbath on 8 Jul 2016

@hvasbath very cool, thanks for chiming in. this might be a good real-world example of ATMCMC usage.

twiecki on 8 Jul 2016

@hvasbath

Of course this was a first run and you can tune the algorithm mostly by changing the n_chains, n_steps and the tune_interval.

Could you provide some more information on how to choose the optimal parameters for the sampler? If I run it as it is, atmcmc return very similar estimation to MCMC and NUTS except w[0] (you can also see it in the trace plot above: its posterior distribution is much wider)

junpenglao on 8 Jul 2016

@hvasbath , CC: @junpenglao , @twiecki

Thanks @hvasbath for your suggestion. I am trying ATMCMC on my model. My first try failed and I got this error:

('Beta: 0', ' Stage: 0')
Sample initial stage: ...
 [-----------------100%-----------------] 501 of 500 complete in 142.5 sec
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-12-09c8faa34f05> in <module>()
     27                             progressbar=True,
     28                             model=mixedEffect_model,
---> 29                             trace=test_folder)
     30 

/home/batmanghelich/anaconda2/lib/python2.7/site-packages/pymc3/step_methods/ATMCMC.pyc in ATMIP_sample(n_steps, step, start, trace, chain, stage, njobs, tune, progressbar, model, random_seed)
    429                                             step.select_end_points(mtrace)
    430                     step.beta, step.old_beta, step.weights = step.calc_beta()
--> 431                     step.covariance = step.calc_covariance()
    432                     step.res_indx = step.resample()
    433                     step.stage += 1

/home/batmanghelich/anaconda2/lib/python2.7/site-packages/pymc3/step_methods/ATMCMC.pyc in calc_covariance(self)
    208         Ndarray of weighted covariances (NumPy > 1.10. required)
    209         """
--> 210         return np.cov(self.array_population, aweights=self.weights, rowvar=0)
    211 
    212     def select_end_points(self, mtrace):

TypeError: cov() got an unexpected keyword argument 'aweights'

It seems that it is numpy issue. Not sure I can install that version of numpy for python 2.7 so I switch to 3.5. @twiecki is numpy requirement enforced during installation?

Thanks,

kayhan-batmanghelich on 8 Jul 2016

You probably can just update numpy to 1.11 and stick with 2.7. However, in general I recommend updating to 3.5.

twiecki on 8 Jul 2016

Apparently we don't enforce the right version, good point.

twiecki on 8 Jul 2016

@kayhan-batmanghelich I am using python2.7 with the latest numpy version that is no problem. And yes you need a newer numpy version!
@junpenglao As you can see from the early stage result post it indeed sampled the whole space once, but then it converges towards the highest likelihood values.
What values do you expect for h etc? If all the samplers converge there- maybe something in your model is wrong?
Basically increasing the number of metropolis chains makes initially the most sence. If your search space is very large- which it is maybe you use like n_chains 2000? n_steps=200 and tune_interval=20. A description of the algorithm is given in its class and the sampling function doc. I wont be able to answer another time befor monday.

hvasbath on 8 Jul 2016

@twiecki you are right if we get this to run properly and the results are like expected maybe we can use this example for the docs?

hvasbath on 8 Jul 2016

@hvasbath definitely.

twiecki on 8 Jul 2016

I installed numpy 1.1 and it ran for a few hours and didn't finish because I got disconnected from the server. In the second try, I used more cores and I got this error. I will re-run with less cores again:

('Beta: 0', ' Stage: 0')
Sample initial stage: ...
 [-----------------100%-----------------] 501 of 500 complete in 141.7 sec
/home/batmanghelich/anaconda2/lib/python2.7/site-packages/numpy/lib/function_base.py:2487: RuntimeWarning: Degrees of freedom <= 0 for slice
  warnings.warn("Degrees of freedom <= 0 for slice", RuntimeWarning)
('Beta: 9.53674316406e-07', ' Stage: 1')
Initialising chain traces ...
Sampling ...

Joblib's exception reporting continues...

---------------------------------------------------------------------------
JoblibValueError                          Traceback (most recent call last)
<ipython-input-10-26007b72a78e> in <module>()
     27                             progressbar=True,
     28                             model=mixedEffect_model,
---> 29                             trace=test_folder)
     30 

/home/batmanghelich/anaconda2/lib/python2.7/site-packages/pymc3/step_methods/ATMCMC.pyc in ATMIP_sample(***failed resolving arguments***)
    446                             'progressbar': progressbar,
    447                             'model': model}
--> 448                     mtrace = _iter_parallel_chains(parallel, **sample_args)
    449 
    450                     step.population, step.array_population, step.likelihoods = \

/home/batmanghelich/anaconda2/lib/python2.7/site-packages/pymc3/step_methods/ATMCMC.pyc in _iter_parallel_chains(parallel, **kwargs)
    551                         trace=trace_list[chain],
    552                         start=step.population[step.res_indx[chain]],
--> 553                         **kwargs) for chain in chains)
    554     return pm.sampling.merge_traces(traces)
    555 

/home/batmanghelich/anaconda2/lib/python2.7/site-packages/joblib/parallel.pyc in __call__(self, iterable)
    808                 # consumption.
    809                 self._iterating = False
--> 810             self.retrieve()
    811             # Make sure that we get a last message telling us we are done
    812             elapsed_time = time.time() - self._start_time

/home/batmanghelich/anaconda2/lib/python2.7/site-packages/joblib/parallel.pyc in retrieve(self)
    755                     # a working pool as they expect.
    756                     self._initialize_pool()
--> 757                 raise exception
    758 
    759     def __call__(self, iterable):

JoblibValueError: JoblibValueError
___________________________________________________________________________
Multiprocessing exception:
...........................................................................
/home/batmanghelich/anaconda2/lib/python2.7/runpy.py in _run_module_as_main(mod_name='ipykernel.__main__', alter_argv=1)
    169     pkg_name = mod_name.rpartition('.')[0]
    170     main_globals = sys.modules["__main__"].__dict__
    171     if alter_argv:
    172         sys.argv[0] = fname
    173     return _run_code(code, main_globals, None,
--> 174                      "__main__", fname, loader, pkg_name)
        fname = '/home/batmanghelich/anaconda2/lib/python2.7/site-packages/ipykernel/__main__.py'
        loader = <pkgutil.ImpLoader instance>
        pkg_name = 'ipykernel'
    175 
    176 def run_module(mod_name, init_globals=None,
    177                run_name=None, alter_sys=False):
    178     """Execute a module's code without importing it

...........................................................................
/home/batmanghelich/anaconda2/lib/python2.7/runpy.py in _run_code(code=<code object <module> at 0x2aaaac07b0b0, file "/...2.7/site-packages/ipykernel/__main__.py", line 1>, run_globals={'__builtins__': <module '__builtin__' (built-in)>, '__doc__': None, '__file__': '/home/batmanghelich/anaconda2/lib/python2.7/site-packages/ipykernel/__main__.py', '__loader__': <pkgutil.ImpLoader instance>, '__name__': '__main__', '__package__': 'ipykernel', 'app': <module 'ipykernel.kernelapp' from '/home/batman...python2.7/site-packages/ipykernel/kernelapp.pyc'>}, init_globals=None, mod_name='__main__', mod_fname='/home/batmanghelich/anaconda2/lib/python2.7/site-packages/ipykernel/__main__.py', mod_loader=<pkgutil.ImpLoader instance>, pkg_name='ipykernel')
     67         run_globals.update(init_globals)
     68     run_globals.update(__name__ = mod_name,
     69                        __file__ = mod_fname,
     70                        __loader__ = mod_loader,
     71                        __package__ = pkg_name)
---> 72     exec code in run_globals
        code = <code object <module> at 0x2aaaac07b0b0, file "/...2.7/site-packages/ipykernel/__main__.py", line 1>
        run_globals = {'__builtins__': <module '__builtin__' (built-in)>, '__doc__': None, '__file__': '/home/batmanghelich/anaconda2/lib/python2.7/site-packages/ipykernel/__main__.py', '__loader__': <pkgutil.ImpLoader instance>, '__name__': '__main__', '__package__': 'ipykernel', 'app': <module 'ipykernel.kernelapp' from '/home/batman...python2.7/site-packages/ipykernel/kernelapp.pyc'>}
     73     return run_globals
     74 
     75 def _run_module_code(code, init_globals=None,
     76                     mod_name=None, mod_fname=None,

...........................................................................
/home/batmanghelich/anaconda2/lib/python2.7/site-packages/ipykernel/__main__.py in <module>()
      1 
      2 
----> 3 
      4 if __name__ == '__main__':
      5     from ipykernel import kernelapp as app
      6     app.launch_new_instance()
      7 
      8 
      9 
     10 

...........................................................................
/home/batmanghelich/anaconda2/lib/python2.7/site-packages/traitlets/config/application.py in launch_instance(cls=<class 'ipykernel.kernelapp.IPKernelApp'>, argv=None, **kwargs={})
    591         
    592         If a global instance already exists, this reinitializes and starts it
    593         """
    594         app = cls.instance(**kwargs)
    595         app.initialize(argv)
--> 596         app.start()
        app.start = <bound method IPKernelApp.start of <ipykernel.kernelapp.IPKernelApp object>>
    597 
    598 #-----------------------------------------------------------------------------
    599 # utility functions, for convenience
    600 #-----------------------------------------------------------------------------

...........................................................................
/home/batmanghelich/anaconda2/lib/python2.7/site-packages/ipykernel/kernelapp.py in start(self=<ipykernel.kernelapp.IPKernelApp object>)
    437         
    438         if self.poller is not None:
    439             self.poller.start()
    440         self.kernel.start()
    441         try:
--> 442             ioloop.IOLoop.instance().start()
    443         except KeyboardInterrupt:
    444             pass
    445 
    446 launch_new_instance = IPKernelApp.launch_instance

...........................................................................
/home/batmanghelich/anaconda2/lib/python2.7/site-packages/zmq/eventloop/ioloop.py in start(self=<zmq.eventloop.ioloop.ZMQIOLoop object>)
    157             PollIOLoop.configure(ZMQIOLoop)
    158         return PollIOLoop.current(*args, **kwargs)
    159     
    160     def start(self):
    161         try:
--> 162             super(ZMQIOLoop, self).start()
        self.start = <bound method ZMQIOLoop.start of <zmq.eventloop.ioloop.ZMQIOLoop object>>
    163         except ZMQError as e:
    164             if e.errno == ETERM:
    165                 # quietly return on ETERM
    166                 pass

...........................................................................
/home/batmanghelich/anaconda2/lib/python2.7/site-packages/tornado/ioloop.py in start(self=<zmq.eventloop.ioloop.ZMQIOLoop object>)
    878                 self._events.update(event_pairs)
    879                 while self._events:
    880                     fd, events = self._events.popitem()
    881                     try:
    882                         fd_obj, handler_func = self._handlers[fd]
--> 883                         handler_func(fd_obj, events)
        handler_func = <function null_wrapper>
        fd_obj = <zmq.sugar.socket.Socket object>
        events = 1
    884                     except (OSError, IOError) as e:
    885                         if errno_from_exception(e) == errno.EPIPE:
    886                             # Happens when the client closes the connection
    887                             pass

...........................................................................
/home/batmanghelich/anaconda2/lib/python2.7/site-packages/tornado/stack_context.py in null_wrapper(*args=(<zmq.sugar.socket.Socket object>, 1), **kwargs={})
    270         # Fast path when there are no active contexts.
    271         def null_wrapper(*args, **kwargs):
    272             try:
    273                 current_state = _state.contexts
    274                 _state.contexts = cap_contexts[0]
--> 275                 return fn(*args, **kwargs)
        args = (<zmq.sugar.socket.Socket object>, 1)
        kwargs = {}
    276             finally:
    277                 _state.contexts = current_state
    278         null_wrapper._wrapped = True
    279         return null_wrapper

...........................................................................
/home/batmanghelich/anaconda2/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py in _handle_events(self=<zmq.eventloop.zmqstream.ZMQStream object>, fd=<zmq.sugar.socket.Socket object>, events=1)
    435             # dispatch events:
    436             if events & IOLoop.ERROR:
    437                 gen_log.error("got POLLERR event on ZMQStream, which doesn't make sense")
    438                 return
    439             if events & IOLoop.READ:
--> 440                 self._handle_recv()
        self._handle_recv = <bound method ZMQStream._handle_recv of <zmq.eventloop.zmqstream.ZMQStream object>>
    441                 if not self.socket:
    442                     return
    443             if events & IOLoop.WRITE:
    444                 self._handle_send()

...........................................................................
/home/batmanghelich/anaconda2/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py in _handle_recv(self=<zmq.eventloop.zmqstream.ZMQStream object>)
    467                 gen_log.error("RECV Error: %s"%zmq.strerror(e.errno))
    468         else:
    469             if self._recv_callback:
    470                 callback = self._recv_callback
    471                 # self._recv_callback = None
--> 472                 self._run_callback(callback, msg)
        self._run_callback = <bound method ZMQStream._run_callback of <zmq.eventloop.zmqstream.ZMQStream object>>
        callback = <function null_wrapper>
        msg = [<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>]
    473                 
    474         # self.update_state()
    475         
    476 

...........................................................................
/home/batmanghelich/anaconda2/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py in _run_callback(self=<zmq.eventloop.zmqstream.ZMQStream object>, callback=<function null_wrapper>, *args=([<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>],), **kwargs={})
    409         close our socket."""
    410         try:
    411             # Use a NullContext to ensure that all StackContexts are run
    412             # inside our blanket exception handler rather than outside.
    413             with stack_context.NullContext():
--> 414                 callback(*args, **kwargs)
        callback = <function null_wrapper>
        args = ([<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>],)
        kwargs = {}
    415         except:
    416             gen_log.error("Uncaught exception, closing connection.",
    417                           exc_info=True)
    418             # Close the socket on an uncaught exception from a user callback

...........................................................................
/home/batmanghelich/anaconda2/lib/python2.7/site-packages/tornado/stack_context.py in null_wrapper(*args=([<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>],), **kwargs={})
    270         # Fast path when there are no active contexts.
    271         def null_wrapper(*args, **kwargs):
    272             try:
    273                 current_state = _state.contexts
    274                 _state.contexts = cap_contexts[0]
--> 275                 return fn(*args, **kwargs)
        args = ([<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>],)
        kwargs = {}
    276             finally:
    277                 _state.contexts = current_state
    278         null_wrapper._wrapped = True
    279         return null_wrapper

...........................................................................
/home/batmanghelich/anaconda2/lib/python2.7/site-packages/ipykernel/kernelbase.py in dispatcher(msg=[<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>])
    271         if self.control_stream:
    272             self.control_stream.on_recv(self.dispatch_control, copy=False)
    273 
    274         def make_dispatcher(stream):
    275             def dispatcher(msg):
--> 276                 return self.dispatch_shell(stream, msg)
        msg = [<zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>, <zmq.sugar.frame.Frame object>]
    277             return dispatcher
    278 
    279         for s in self.shell_streams:
    280             s.on_recv(make_dispatcher(s), copy=False)

...........................................................................
/home/batmanghelich/anaconda2/lib/python2.7/site-packages/ipykernel/kernelbase.py in dispatch_shell(self=<ipykernel.ipkernel.IPythonKernel object>, stream=<zmq.eventloop.zmqstream.ZMQStream object>, msg={'buffers': [], 'content': {u'allow_stdin': True, u'code': u"from pymc3.step_methods import ATMCMC as atmcm...                        trace=test_folder)\n    ", u'silent': False, u'stop_on_error': True, u'store_history': True, u'user_expressions': {}}, 'header': {'date': '2016-07-08T21:44:16.183295', u'msg_id': u'E4A8E476F2C24AA78398A80114AF222F', u'msg_type': u'execute_request', u'session': u'64843115F877465A9D9E12EE6649D3C9', u'username': u'username', u'version': u'5.0'}, 'metadata': {}, 'msg_id': u'E4A8E476F2C24AA78398A80114AF222F', 'msg_type': u'execute_request', 'parent_header': {}})
    223             self.log.error("UNKNOWN MESSAGE TYPE: %r", msg_type)
    224         else:
    225             self.log.debug("%s: %s", msg_type, msg)
    226             self.pre_handler_hook()
    227             try:
--> 228                 handler(stream, idents, msg)
        handler = <bound method IPythonKernel.execute_request of <ipykernel.ipkernel.IPythonKernel object>>
        stream = <zmq.eventloop.zmqstream.ZMQStream object>
        idents = ['64843115F877465A9D9E12EE6649D3C9']
        msg = {'buffers': [], 'content': {u'allow_stdin': True, u'code': u"from pymc3.step_methods import ATMCMC as atmcm...                        trace=test_folder)\n    ", u'silent': False, u'stop_on_error': True, u'store_history': True, u'user_expressions': {}}, 'header': {'date': '2016-07-08T21:44:16.183295', u'msg_id': u'E4A8E476F2C24AA78398A80114AF222F', u'msg_type': u'execute_request', u'session': u'64843115F877465A9D9E12EE6649D3C9', u'username': u'username', u'version': u'5.0'}, 'metadata': {}, 'msg_id': u'E4A8E476F2C24AA78398A80114AF222F', 'msg_type': u'execute_request', 'parent_header': {}}
    229             except Exception:
    230                 self.log.error("Exception in message handler:", exc_info=True)
    231             finally:
    232                 self.post_handler_hook()

...........................................................................
/home/batmanghelich/anaconda2/lib/python2.7/site-packages/ipykernel/kernelbase.py in execute_request(self=<ipykernel.ipkernel.IPythonKernel object>, stream=<zmq.eventloop.zmqstream.ZMQStream object>, ident=['64843115F877465A9D9E12EE6649D3C9'], parent={'buffers': [], 'content': {u'allow_stdin': True, u'code': u"from pymc3.step_methods import ATMCMC as atmcm...                        trace=test_folder)\n    ", u'silent': False, u'stop_on_error': True, u'store_history': True, u'user_expressions': {}}, 'header': {'date': '2016-07-08T21:44:16.183295', u'msg_id': u'E4A8E476F2C24AA78398A80114AF222F', u'msg_type': u'execute_request', u'session': u'64843115F877465A9D9E12EE6649D3C9', u'username': u'username', u'version': u'5.0'}, 'metadata': {}, 'msg_id': u'E4A8E476F2C24AA78398A80114AF222F', 'msg_type': u'execute_request', 'parent_header': {}})
    386         if not silent:
    387             self.execution_count += 1
    388             self._publish_execute_input(code, parent, self.execution_count)
    389 
    390         reply_content = self.do_execute(code, silent, store_history,
--> 391                                         user_expressions, allow_stdin)
        user_expressions = {}
        allow_stdin = True
    392 
    393         # Flush output before sending the reply.
    394         sys.stdout.flush()
    395         sys.stderr.flush()

...........................................................................
/home/batmanghelich/anaconda2/lib/python2.7/site-packages/ipykernel/ipkernel.py in do_execute(self=<ipykernel.ipkernel.IPythonKernel object>, code=u"from pymc3.step_methods import ATMCMC as atmcm...                        trace=test_folder)\n    ", silent=False, store_history=True, user_expressions={}, allow_stdin=True)
    194 
    195         reply_content = {}
    196         # FIXME: the shell calls the exception handler itself.
    197         shell._reply_content = None
    198         try:
--> 199             shell.run_cell(code, store_history=store_history, silent=silent)
        shell.run_cell = <bound method ZMQInteractiveShell.run_cell of <ipykernel.zmqshell.ZMQInteractiveShell object>>
        code = u"from pymc3.step_methods import ATMCMC as atmcm...                        trace=test_folder)\n    "
        store_history = True
        silent = False
    200         except:
    201             status = u'error'
    202             # FIXME: this code right now isn't being used yet by default,
    203             # because the run_cell() call above directly fires off exception

...........................................................................
/home/batmanghelich/anaconda2/lib/python2.7/site-packages/IPython/core/interactiveshell.py in run_cell(self=<ipykernel.zmqshell.ZMQInteractiveShell object>, raw_cell=u"from pymc3.step_methods import ATMCMC as atmcm...                        trace=test_folder)\n    ", store_history=True, silent=False, shell_futures=True)
   2718                 self.displayhook.exec_result = result
   2719 
   2720                 # Execute the user code
   2721                 interactivity = "none" if silent else self.ast_node_interactivity
   2722                 self.run_ast_nodes(code_ast.body, cell_name,
-> 2723                    interactivity=interactivity, compiler=compiler, result=result)
        interactivity = 'last_expr'
        compiler = <IPython.core.compilerop.CachingCompiler instance>
   2724 
   2725                 # Reset this so later displayed values do not modify the
   2726                 # ExecutionResult
   2727                 self.displayhook.exec_result = None

...........................................................................
/home/batmanghelich/anaconda2/lib/python2.7/site-packages/IPython/core/interactiveshell.py in run_ast_nodes(self=<ipykernel.zmqshell.ZMQInteractiveShell object>, nodelist=[<_ast.ImportFrom object>, <_ast.Assign object>, <_ast.With object>, <_ast.With object>, <_ast.Assign object>], cell_name='<ipython-input-10-26007b72a78e>', interactivity='none', compiler=<IPython.core.compilerop.CachingCompiler instance>, result=<IPython.core.interactiveshell.ExecutionResult object>)
   2820 
   2821         try:
   2822             for i, node in enumerate(to_run_exec):
   2823                 mod = ast.Module([node])
   2824                 code = compiler(mod, cell_name, "exec")
-> 2825                 if self.run_code(code, result):
        self.run_code = <bound method ZMQInteractiveShell.run_code of <ipykernel.zmqshell.ZMQInteractiveShell object>>
        code = <code object <module> at 0x2aac95b53c30, file "<ipython-input-10-26007b72a78e>", line 24>
        result = <IPython.core.interactiveshell.ExecutionResult object>
   2826                     return True
   2827 
   2828             for i, node in enumerate(to_run_interactive):
   2829                 mod = ast.Interactive([node])

...........................................................................
/home/batmanghelich/anaconda2/lib/python2.7/site-packages/IPython/core/interactiveshell.py in run_code(self=<ipykernel.zmqshell.ZMQInteractiveShell object>, code_obj=<code object <module> at 0x2aac95b53c30, file "<ipython-input-10-26007b72a78e>", line 24>, result=<IPython.core.interactiveshell.ExecutionResult object>)
   2880         outflag = 1  # happens in more places, so it's easier as default
   2881         try:
   2882             try:
   2883                 self.hooks.pre_run_code_hook()
   2884                 #rprint('Running code', repr(code_obj)) # dbg
-> 2885                 exec(code_obj, self.user_global_ns, self.user_ns)
        code_obj = <code object <module> at 0x2aac95b53c30, file "<ipython-input-10-26007b72a78e>", line 24>
        self.user_global_ns = {'ALLOW_THREADS': 1, 'Annotation': <class 'matplotlib.text.Annotation'>, 'Arrow': <class 'matplotlib.patches.Arrow'>, 'Artist': <class 'matplotlib.artist.Artist'>, 'AutoLocator': <class 'matplotlib.ticker.AutoLocator'>, 'Axes': <class 'matplotlib.axes._axes.Axes'>, 'BUFSIZE': 8192, 'Button': <class 'matplotlib.widgets.Button'>, 'CLIP': 0, 'Circle': <class 'matplotlib.patches.Circle'>, ...}
        self.user_ns = {'ALLOW_THREADS': 1, 'Annotation': <class 'matplotlib.text.Annotation'>, 'Arrow': <class 'matplotlib.patches.Arrow'>, 'Artist': <class 'matplotlib.artist.Artist'>, 'AutoLocator': <class 'matplotlib.ticker.AutoLocator'>, 'Axes': <class 'matplotlib.axes._axes.Axes'>, 'BUFSIZE': 8192, 'Button': <class 'matplotlib.widgets.Button'>, 'CLIP': 0, 'Circle': <class 'matplotlib.patches.Circle'>, ...}
   2886             finally:
   2887                 # Reset our crash handler in place
   2888                 sys.excepthook = old_excepthook
   2889         except SystemExit as e:

...........................................................................
/home/batmanghelich/Projects/CgGi/src/notebooks/<ipython-input-10-26007b72a78e> in <module>()
     24 trace = atmcmc.ATMIP_sample(n_steps=200, 
     25                             step=step, 
     26                             njobs=16, 
     27                             progressbar=True, 
     28                             model=mixedEffect_model, 
---> 29                             trace=test_folder)
     30     
     31 
     32 
     33 

...........................................................................
/home/batmanghelich/anaconda2/lib/python2.7/site-packages/pymc3/step_methods/ATMCMC.py in ATMIP_sample(***failed resolving arguments***)
    443                             'draws': n_steps,
    444                             'step': step,
    445                             'stage_path': stage_path,
    446                             'progressbar': progressbar,
    447                             'model': model}
--> 448                     mtrace = _iter_parallel_chains(parallel, **sample_args)
        mtrace = undefined
        parallel = Parallel(n_jobs=16)
        sample_args = {'draws': 200, 'model': <pymc3.model.Model object>, 'progressbar': False, 'stage_path': 'ATMIP_TEST/stage_1', 'step': <pymc3.step_methods.ATMCMC.ATMCMC object>}
    449 
    450                     step.population, step.array_population, step.likelihoods = \
    451                                             step.select_end_points(mtrace)
    452                     step.beta, step.old_beta, step.weights = step.calc_beta()

...........................................................................
/home/batmanghelich/anaconda2/lib/python2.7/site-packages/pymc3/step_methods/ATMCMC.py in _iter_parallel_chains(parallel=Parallel(n_jobs=16), **kwargs={'draws': 200, 'model': <pymc3.model.Model object>, 'progressbar': False, 'step': <pymc3.step_methods.ATMCMC.ATMCMC object>})
    548     traces = parallel(delayed(
    549                     pm.sampling._sample)(
    550                         chain=chain,
    551                         trace=trace_list[chain],
    552                         start=step.population[step.res_indx[chain]],
--> 553                         **kwargs) for chain in chains)
        kwargs = {'draws': 200, 'model': <pymc3.model.Model object>, 'progressbar': False, 'step': <pymc3.step_methods.ATMCMC.ATMCMC object>}
        chain = 499
        chains = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ...]
    554     return pm.sampling.merge_traces(traces)
    555 
    556 
    557 def tune(acc_rate):

...........................................................................
/home/batmanghelich/anaconda2/lib/python2.7/site-packages/joblib/parallel.py in __call__(self=Parallel(n_jobs=16), iterable=<generator object <genexpr>>)
    805             if pre_dispatch == "all" or n_jobs == 1:
    806                 # The iterable was consumed all at once by the above for loop.
    807                 # No need to wait for async callbacks to trigger to
    808                 # consumption.
    809                 self._iterating = False
--> 810             self.retrieve()
        self.retrieve = <bound method Parallel.retrieve of Parallel(n_jobs=16)>
    811             # Make sure that we get a last message telling us we are done
    812             elapsed_time = time.time() - self._start_time
    813             self._print('Done %3i out of %3i | elapsed: %s finished',
    814                         (len(self._output), len(self._output),

---------------------------------------------------------------------------
Sub-process traceback:
---------------------------------------------------------------------------
ValueError                                         Fri Jul  8 21:50:24 2016
PID: 19019          Python 2.7.12: /home/batmanghelich/anaconda2/bin/python
...........................................................................
/home/batmanghelich/anaconda2/lib/python2.7/site-packages/joblib/parallel.py in __call__(self=<joblib.parallel.BatchedCalls object>)
     67     def __init__(self, iterator_slice):
     68         self.items = list(iterator_slice)
     69         self._size = len(self.items)
     70 
     71     def __call__(self):
---> 72         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        func = <function _sample>
        args = ()
        kwargs = {'chain': 0, 'draws': 200, 'model': <pymc3.model.Model object>, 'progressbar': False, 'start': {'eps': array(48.335692387600005), 'h2': array(0.6229504190540001), 'w': array([  1.52518419e+02,  -1.11782607e-01,  -7.6...e+02,
         5.21590715e+01,  -3.00595201e+01]), 'z': array([-6.61825809, -0.19668247,  3.89993292, ..., -2.76709995,
       -6.11325849,  8.52374983])}, 'step': <pymc3.step_methods.ATMCMC.ATMCMC object>, 'trace': <pymc3.backends.text.Text object>}
        self.items = [(<function _sample>, (), {'chain': 0, 'draws': 200, 'model': <pymc3.model.Model object>, 'progressbar': False, 'start': {'eps': array(48.335692387600005), 'h2': array(0.6229504190540001), 'w': array([  1.52518419e+02,  -1.11782607e-01,  -7.6...e+02,
         5.21590715e+01,  -3.00595201e+01]), 'z': array([-6.61825809, -0.19668247,  3.89993292, ..., -2.76709995,
       -6.11325849,  8.52374983])}, 'step': <pymc3.step_methods.ATMCMC.ATMCMC object>, 'trace': <pymc3.backends.text.Text object>})]
     73 
     74     def __len__(self):
     75         return self._size
     76 

...........................................................................
/home/batmanghelich/anaconda2/lib/python2.7/site-packages/pymc3/sampling.py in _sample(draws=200, step=<pymc3.step_methods.ATMCMC.ATMCMC object>, start={'eps': array(48.335692387600005), 'h2': array(0.6229504190540001), 'w': array([  1.52518419e+02,  -1.11782607e-01,  -7.6...e+02,
         5.21590715e+01,  -3.00595201e+01]), 'z': array([-6.61825809, -0.19668247,  3.89993292, ..., -2.76709995,
       -6.11325849,  8.52374983])}, trace=<pymc3.backends.text.Text object>, chain=0, tune=None, progressbar=False, model=<pymc3.model.Model object>, random_seed=None)
    154             progressbar=True, model=None, random_seed=None):
    155     sampling = _iter_sample(draws, step, start, trace, chain,
    156                             tune, model, random_seed)
    157     progress = progress_bar(draws)
    158     try:
--> 159         for i, strace in enumerate(sampling):
        i = undefined
        strace = undefined
        sampling = <generator object _iter_sample>
    160             if progressbar:
    161                 progress.update(i)
    162     except KeyboardInterrupt:
    163         strace.close()

...........................................................................
/home/batmanghelich/anaconda2/lib/python2.7/site-packages/pymc3/sampling.py in _iter_sample(draws=200, step=<pymc3.step_methods.ATMCMC.ATMCMC object>, start={'eps': array(48.335692387600005), 'h2': array(0.6229504190540001), 'w': array([  1.52518419e+02,  -1.11782607e-01,  -7.6...e+02,
         5.21590715e+01,  -3.00595201e+01]), 'z': array([-6.61825809, -0.19668247,  3.89993292, ..., -2.76709995,
       -6.11325849,  8.52374983])}, trace=<pymc3.backends.text.Text object>, chain=0, tune=None, model=<pymc3.model.Model object>, random_seed=None)
    236 
    237     strace.setup(draws, chain)
    238     for i in range(draws):
    239         if i == tune:
    240             step = stop_tuning(step)
--> 241         point = step.step(point)
        point = {'eps': array(48.335692387600005), 'h2': array(0.6229504190540001), 'w': array([  1.52518419e+02,  -1.11782607e-01,  -7.6...e+02,
         5.21590715e+01,  -3.00595201e+01]), 'z': array([-6.61825809, -0.19668247,  3.89993292, ..., -2.76709995,
       -6.11325849,  8.52374983])}
        step.step = <bound method ATMCMC.step of <pymc3.step_methods.ATMCMC.ATMCMC object>>
    242         strace.record(point)
    243         yield strace
    244     else:
    245         strace.close()

...........................................................................
/home/batmanghelich/anaconda2/lib/python2.7/site-packages/pymc3/step_methods/arraystep.py in step(self=<pymc3.step_methods.ATMCMC.ATMCMC object>, point={'eps': array(48.335692387600005), 'h2': array(0.6229504190540001), 'w': array([  1.52518419e+02,  -1.11782607e-01,  -7.6...e+02,
         5.21590715e+01,  -3.00595201e+01]), 'z': array([-6.61825809, -0.19668247,  3.89993292, ..., -2.76709995,
       -6.11325849,  8.52374983])})
    122         for var, share in self.shared.items():
    123             share.container.storage[0] = point[var]
    124 
    125         bij = DictToArrayBijection(self.ordering, point)
    126 
--> 127         apoint = self.astep(bij.map(point))
        apoint = undefined
        self.astep = <bound method ATMCMC.astep of <pymc3.step_methods.ATMCMC.ATMCMC object>>
        bij.map = <bound method DictToArrayBijection.map of <pymc3.blocking.DictToArrayBijection object>>
        point = {'eps': array(48.335692387600005), 'h2': array(0.6229504190540001), 'w': array([  1.52518419e+02,  -1.11782607e-01,  -7.6...e+02,
         5.21590715e+01,  -3.00595201e+01]), 'z': array([-6.61825809, -0.19668247,  3.89993292, ..., -2.76709995,
       -6.11325849,  8.52374983])}
    128         return bij.rmap(apoint)
    129 
    130 def metrop_select(mr, q, q0):
    131     # Perform rejection/acceptance step for Metropolis class samplers

...........................................................................
/home/batmanghelich/anaconda2/lib/python2.7/site-packages/pymc3/step_methods/ATMCMC.py in astep(self=<pymc3.step_methods.ATMCMC.ATMCMC object>, q0=array([ -6.61825809,  -0.19668247,   3.89993292,...-30.05952014,
        48.33569239,   0.62295042]))
    117         if self.stage == 0:
    118             self.likelihoods.append(self.logp_forw(q0))
    119             q_new = q0
    120         else:
    121             if not self.stage_sample:
--> 122                 self.proposal_samples_array = self.proposal_dist(self.n_steps)
        self.proposal_samples_array = memmap([[-0.71184519,  2.1856736 ,  1.30567908, ...  0.21084136,
        -0.11862057, -0.22837902]])
        self.proposal_dist = <pymc3.step_methods.metropolis.MultivariateNormalProposal object>
        self.n_steps = 200
    123 
    124             if not self.steps_until_tune and self.tune:
    125                 # Tune scaling parameter
    126                 self.scaling = tune(self.accepted /

...........................................................................
/home/batmanghelich/anaconda2/lib/python2.7/site-packages/pymc3/step_methods/metropolis.py in __call__(self=<pymc3.step_methods.metropolis.MultivariateNormalProposal object>, num_draws=200)
     44         return poisson(lam=self.s, size=np.size(self.s)) - self.s
     45 
     46 
     47 class MultivariateNormalProposal(Proposal):
     48     def __call__(self, num_draws=None):
---> 49         return np.random.multivariate_normal(mean=np.zeros(self.s.shape[0]), cov=self.s, size=num_draws)
        self.s.shape = (4285, 4285)
        self.s = memmap([[ inf,  inf, -inf, ...,  inf,  inf, -inf...      [-inf, -inf,  inf, ..., -inf, -inf,  inf]])
        num_draws = 200
     50 
     51 
     52 class Metropolis(ArrayStepShared):
     53     """

...........................................................................
/home/batmanghelich/anaconda2/lib/python2.7/site-packages/numpy/random/mtrand.so in mtrand.RandomState.multivariate_normal (numpy/random/mtrand/mtrand.c:32452)()
   4711 
   4712 
   4713 
   4714 
   4715 
-> 4716 
   4717 
   4718 
   4719 
   4720 

...........................................................................
/home/batmanghelich/anaconda2/lib/python2.7/site-packages/scipy/linalg/decomp_svd.py in svd(a=array([[ inf,  inf, -inf, ...,  inf,  inf, -inf]...      [-inf, -inf,  inf, ..., -inf, -inf,  inf]]), full_matrices=True, compute_uv=True, overwrite_a=False, check_finite=True)
     83     >>> s2 = linalg.svd(a, compute_uv=False)
     84     >>> np.allclose(s, s2)
     85     True
     86 
     87     """
---> 88     a1 = _asarray_validated(a, check_finite=check_finite)
        a1 = undefined
        a = array([[ inf,  inf, -inf, ...,  inf,  inf, -inf]...      [-inf, -inf,  inf, ..., -inf, -inf,  inf]])
        check_finite = True
     89     if len(a1.shape) != 2:
     90         raise ValueError('expected matrix')
     91     m,n = a1.shape
     92     overwrite_a = overwrite_a or (_datacopied(a1, a))

...........................................................................
/home/batmanghelich/anaconda2/lib/python2.7/site-packages/scipy/_lib/_util.py in _asarray_validated(a=array([[ inf,  inf, -inf, ...,  inf,  inf, -inf]...      [-inf, -inf,  inf, ..., -inf, -inf,  inf]]), check_finite=True, sparse_ok=False, objects_ok=False, mask_ok=False, as_inexact=False)
    182             raise ValueError(msg)
    183     if not mask_ok:
    184         if np.ma.isMaskedArray(a):
    185             raise ValueError('masked arrays are not supported')
    186     toarray = np.asarray_chkfinite if check_finite else np.asarray
--> 187     a = toarray(a)
        a = array([[ inf,  inf, -inf, ...,  inf,  inf, -inf]...      [-inf, -inf,  inf, ..., -inf, -inf,  inf]])
        toarray = <function asarray_chkfinite>
    188     if not objects_ok:
    189         if a.dtype is np.dtype('O'):
    190             raise ValueError('object arrays are not supported')
    191     if as_inexact:

...........................................................................
/home/batmanghelich/anaconda2/lib/python2.7/site-packages/numpy/lib/function_base.py in asarray_chkfinite(a=array([[ inf,  inf, -inf, ...,  inf,  inf, -inf]...      [-inf, -inf,  inf, ..., -inf, -inf,  inf]]), dtype=None, order=None)
   1028 
   1029     """
   1030     a = asarray(a, dtype=dtype, order=order)
   1031     if a.dtype.char in typecodes['AllFloat'] and not np.isfinite(a).all():
   1032         raise ValueError(
-> 1033             "array must not contain infs or NaNs")
   1034     return a
   1035 
   1036 
   1037 def piecewise(x, condlist, funclist, *args, **kw):

ValueError: array must not contain infs or NaNs
___________________________________________________________________________

In [11]:

atmcmc.

kayhan-batmanghelich on 9 Jul 2016

I reduced the number of cores and it still produces the same error. This seems like a numerical issue. It can also be an initialization issue because out of three trails, ones go through and ran for few hours (didn't finish because I got disconnected from our server). So far NUTS (after lots of 20,000 iteration) produce good results.

kayhan-batmanghelich on 9 Jul 2016

The initial start points for your traces are being created by calling the .random() method for each variable.
So I guess sometimes your likelihoods are always NaN and it wont be able to evaluate the distributions for this stage reasonably. Can you go to your output folder and open one of the output traces? There you see the values for each parameter and the likelihood. If your likelihoods are all inf/NaN it wont be able to do any valuable operation and thats why you get the error.
That has nothing to do with the cores. You can try increasing n_chains in order to have at least a few valuable starting points or you decrease your solution space to a more reasonable one. Which model again are you running?
For solving the disconnection issue I suggest you use for example "screen" or similar tools to be able to run your calculations without needing to be connected...

hvasbath on 10 Jul 2016

@hvasbath yes you are right; it has nothing to do with cores. Sorry for misunderstanding. I meant I am going to replicate the experiment....
Regarding screen, yes, that is exactly what I am doing but some of these experiments takes more than a day and the my reservation on the node ends before the experiment finishes....
I will re-run the experiment and post the output as soon as I get a reservation.

kayhan-batmanghelich on 10 Jul 2016

OK, this time I got "lucky" and it didn't take 10 hours to fail, it fail within 30. This is the code:

from pymc3.step_methods import ATMCMC as atmcmc

test_folder = ('/scratch/users/batmanghelich/COPDGene/tmp/ATMIP_TEST')


with pm.Model() as mixedEffect_model:
    ### hyperpriors
    h2     = pm.Uniform('h2', transform=None)
    sigma2 = pm.HalfCauchy('eps', 5, transform=None)
    #beta_0 = pm.Uniform('beta_0', lower=-1000, upper=1000)   # a replacement for improper prior
    w = pm.Normal('w', mu = 0, sd = 100, shape=M)
    z = pm.Normal('z', mu = 0, sd= (h2*sigma2)**0.5 , shape=N)
    g = T.dot(L,z)
    y = pm.Normal('y', mu = g + T.dot(X,w), 
                  sd= ((1-h2)*sigma2)**0.5 , observed=Pheno )
    like = pm.Deterministic('like', h2.logpt + sigma2.logpt + w.logpt + z.logpt + y.logpt)


with mixedEffect_model:
    step=atmcmc.ATMCMC(n_chains=500, 
                       tune_interval=20, 
                       likelihood_name=mixedEffect_model.deterministics[0].name)

trace = atmcmc.ATMIP_sample(n_steps=200, 
                            step=step, 
                            njobs=12, 
                            progressbar=True, 
                            model=mixedEffect_model, 
                            trace=test_folder)

This is the error:

---> 49         return np.random.multivariate_normal(mean=np.zeros(self.s.shape[0]), cov=self.s, size=num_draws)
        self.s.shape = (4285, 4285)
        self.s = memmap([[ inf,  inf, -inf, ...,  inf,  inf,  inf...      [ inf,  inf, -inf, ...,  inf,  inf,  inf]])
        num_draws = 200
     50
     51
     52 class Metropolis(ArrayStepShared):
     53     """

...........................................................................
/home/batmanghelich/anaconda2/lib/python2.7/site-packages/numpy/random/mtrand.so in mtrand.RandomState.multivariate_normal (numpy/random/mtrand/mtrand.c:32452)()
   4711
   4712
   4713
   4714
   4715
-> 4716
   4717
   4718
   4719
   4720

...........................................................................
/home/batmanghelich/anaconda2/lib/python2.7/site-packages/scipy/linalg/decomp_svd.py in svd(a=array([[ inf,  inf, -inf, ...,  inf,  inf,  inf]...      [ inf,  inf, -inf, ...,  inf,  inf,  inf]]), full_matrices=True, compute_uv=True, overwrite_a=False, check_finite=True)
     83     >>> s2 = linalg.svd(a, compute_uv=False)
     84     >>> np.allclose(s, s2)
     85     True
     86
     87     """
---> 88     a1 = _asarray_validated(a, check_finite=check_finite)
        a1 = undefined
        a = array([[ inf,  inf, -inf, ...,  inf,  inf,  inf]...      [ inf,  inf, -inf, ...,  inf,  inf,  inf]])
        check_finite = True
     89     if len(a1.shape) != 2:
     90         raise ValueError('expected matrix')
     91     m,n = a1.shape
     92     overwrite_a = overwrite_a or (_datacopied(a1, a))

...........................................................................
/home/batmanghelich/anaconda2/lib/python2.7/site-packages/scipy/_lib/_util.py in _asarray_validated(a=array([[ inf,  inf, -inf, ...,  inf,  inf,  inf]...      [ inf,  inf, -inf, ...,  inf,  inf,  inf]]), check_finite=True, sparse_ok=False, objects_ok=False, mask_ok=False, as_inexact=False)
    182             raise ValueError(msg)
    183     if not mask_ok:
    184         if np.ma.isMaskedArray(a):
    185             raise ValueError('masked arrays are not supported')
    186     toarray = np.asarray_chkfinite if check_finite else np.asarray
--> 187     a = toarray(a)
        a = array([[ inf,  inf, -inf, ...,  inf,  inf,  inf]...      [ inf,  inf, -inf, ...,  inf,  inf,  inf]])
        toarray = <function asarray_chkfinite>
    188     if not objects_ok:
    189         if a.dtype is np.dtype('O'):
    190             raise ValueError('object arrays are not supported')
    191     if as_inexact:

...........................................................................
/home/batmanghelich/anaconda2/lib/python2.7/site-packages/numpy/lib/function_base.py in asarray_chkfinite(a=array([[ inf,  inf, -inf, ...,  inf,  inf,  inf]...      [ inf,  inf, -inf, ...,  inf,  inf,  inf]]), dtype=None, order=None)
   1028
   1029     """
   1030     a = asarray(a, dtype=dtype, order=order)
   1031     if a.dtype.char in typecodes['AllFloat'] and not np.isfinite(a).all():
   1032         raise ValueError(
-> 1033             "array must not contain infs or NaNs")
   1034     return a
   1035
   1036
   1037 def piecewise(x, condlist, funclist, *args, **kw):

ValueError: array must not contain infs or NaNs

stage_0 is finished and I don't see and Nan/Inf there. stage_1 is empty.

kayhan-batmanghelich on 10 Jul 2016

But exactly that is the problem. The point where it crashes is when it reads the initial trace population and calculates an proposal covariance from this initial population. It is enough to have one NaN or inf in one trace there and it will be all over the whole matrix. What theano version do you have? There was a major fix there between 8.2 and 9.0 which fixed the alloc function to return NaNs, wich resulted in this problem. I always recommend the developers version for theano. Lets hope thats it ;) .

hvasbath on 10 Jul 2016

@hvasbath OK, but at least in the log I couldn't find any inf/nan. Perhaps there is a nan/inf right before writing to the stage files but there is no way to check (other than debugging line by line).
I have to say it is hard to replicate this error because i ran exactly the same code now I am in stage_1 chain 227 and it has not crashed it. So I presume it depends on the initialization.
Regarding Theano: this is my version:

In [2]: theano.__version__
Out[2]: '0.9.0dev1.dev-a668c6c5b6d055b233aa5bc50b22800d996ffce1'

I can request up to 5 days on a node but it is hard to predict when it crashes or how long it takes. Based on your previous reply, it may take up to 30 stages. If one stage takes about 8 hours (using 12 cores), 30 stages takes about 360hrs=15 days. Is there anyway to start from where I left off in case I got kicked off by our admin :)
Is there anyway to handle NaN/inf as an exception and discard that chain?

kayhan-batmanghelich on 11 Jul 2016

The operation where it crashes comes after sampling the stage is finished. So when it crashes, there has to be going sth wrong in the transitional stage. Could you please send me such a trace file next time it crashes?

Such a way to start at a given stage is under construction. But simply I had no time to work on it further. But I guess next week I could find the time to work on that. It should not be too difficult.

To implement such an exception obviously would be good to have- would need to be done as well.

You might want to consider to downscale your problem first, until you are sure that the model is stable and not crashing. As was visible from your earlier post, your matrix is in the order of 4000 by 4000. Which is also why it takes so long to run one forward model. Do you have a specifically compiled ALTAR/BLAS version for your cluster? If not that would be important to do. Then try using only one core for sampling and enable the theano internal parallelisation on the matrix operations- how to do that is in the theano docs. This parallelisation there can be much more effective and could speed up your model significantly.

hvasbath on 11 Jul 2016

Any update on this? @kayhan-batmanghelich

twiecki on 27 Jul 2016

Hi @twiecki,

No Update on that. So far this is my conclusion:

Metropolis and ADVI didn't produce correct results. NUTS produces correct results after 20,000 iteration but that took about a day. ATMCMC was also slow for this scale. It took three days to reach to stage 3 and I guess I needed 10-30 more stages which renders ATMCMC not practical for this problem.

I know the model is correct; as the I mentioned NUTS results seems right. The goal was to achieve a faster inference since I would like to make the model more complicated and slow inference makes that almost impossible.

Anyway, pymc3 is great and I am sure I will use for different problem in my research but for this specific one, I probably need to write a custom inference algorithm.

At the end these are some suggestions. Of course I understand people are busy so feel free to ignore it :)

-- it would be great if ATMCMC can resume in case job crashes, etc. It also seems a bit unstable, more informative message would be helpful.
-- ADVI documentation does not specify which version of ADVI this one is (STAN has two versions).

Thanks everyone for inputs,

kayhan-batmanghelich on 28 Jul 2016

@kayhan-batmanghelich I just recently found out why it is so slow. That is because of how the backends work. See this issue: https://github.com/pymc-devs/pymc3/issues/1264
I just finished a new backend especially for ATMCMC but it is so far not compatible anymore with the main pymc3 structure, which is definitely not what we want.

I am also working on continueing the sampling from a later stage...
But if your modelsetup allows to calculate gradients it is way more efficient to use NUTS- in my humble oppinion.
ATMCMC is really taking the machine gun and shooting into the solution space- which is computationally really expensive. But if you dont have gradients -or they are expensive to calculate it is one way to go rather than restarting the Metropolis 100 times until you found a good starting point ;) .

In which way it is instable? Please let me know, otherwise I cant improve it ;) . The sampler is rather young and it is basically my fault if something doesnt work well. ;)

hvasbath on 28 Jul 2016

@hvasbath Thanks for your reply.

Please let me know whenever you merge the changes to the main repository and I will try it again.

Regarding instability, it produces the nan/inf values at different stages of the sampling (please see above). It is hard to reproduce it. For example, last time I couldn't reproduce it because my job got killed by scheduler after three days and it was still running.

It is not your fault my friend, thank you for sharing your code :)

kayhan-batmanghelich on 29 Jul 2016

With the example data and the new auto-init https://github.com/pymc-devs/pymc3/pull/1523 this converges immediately.

twiecki on 18 Nov 2016

@twiecki Would you please post the code as well. We have already talked about different ways of doing it here. Also, would please let me know which example data you used?
Thanks,

kayhan-batmanghelich on 18 Nov 2016

I used this:

import numpy as np, pymc3 as pm, theano.tensor as T, matplotlib.pyplot as plt

M    = 6  # number of columns in X - fixed effect
N    = 10 # number of columns in L - random effect
nobs = 10

# generate design matrix using patsy
from patsy import dmatrices
import pandas as pd
predictors = []
for s1 in range(N):
    for c1 in range(2):
        for c2 in range(3):
            for i in range(nobs):
                predictors.append(np.asarray([c1+1,c2+1,s1+1]))
tbltest             = pd.DataFrame(predictors, columns=['Condi1', 'Condi2', 'subj'])
tbltest['Condi1']   = tbltest['Condi1'].astype('category')
tbltest['Condi2']   = tbltest['Condi2'].astype('category')
tbltest['subj']     = tbltest['subj'].astype('category')
tbltest['tempresp'] = np.random.normal(size=(nobs*M*N,1))

Y, X    = dmatrices("tempresp ~ Condi1*Condi2", data=tbltest, return_type='matrix')
Terms   = X.design_info.column_names
_, L    = dmatrices('tempresp ~ -1+subj', data=tbltest, return_type='matrix')
X       = np.asarray(X) # fixed effect
L       = np.asarray(L) # mixed effect
Y       = np.asarray(Y) 
# generate data
w0 = [5,1,2,3,1,1]
z0 = np.random.normal(size=(N,))
Pheno   = np.dot(X,w0) + np.dot(L,z0) + Y.flatten()
#%%
with pm.Model() as mixedEffect_model:
    ### hyperpriors
    h2     = pm.Uniform('h2')
    sigma2 = pm.HalfCauchy('eps', 5)
    #beta_0 = pm.Uniform('beta_0', lower=-1000, upper=1000)   # a replacement for improper prior
    w = pm.Normal('w', mu = 0, sd = 100, shape=M)
    z = pm.Normal('z', mu = 0, sd= (h2*sigma2)**0.5 , shape=N)
    g = T.dot(L,z)
    y = pm.Normal('y', mu = g + T.dot(X,w), 
                  sd= ((1-h2)*sigma2)**0.5 , observed=Pheno )
    trace = pm.sample(5000)

twiecki on 18 Nov 2016

@twiecki Thanks, so the Metropolis converges quickly. I will try it on my data and will report back. I presume I need to update the pymc since there has been significant changes. Thanks.

kayhan-batmanghelich on 18 Nov 2016

@kayhan-batmanghelich Great, that would be helpful. Make sure to not submit a step-method object and update to master.

twiecki on 18 Nov 2016

Pymc3: ADVI, NUTS and Metropolis produce significantly different results

Most helpful comment

All 46 comments

Related issues