The following code gives an error when using PyMC3 development branch.
Please provide a minimal, self-contained, and reproducible example.
import numpy as np
import arviz as az
import pymc3 as pm
import time
start_time = time.time()
np.random.seed(42)
true_m = 0.5
true_b = -1.3
true_logs = np.log(0.3)
x = np.sort(np.random.uniform(0, 5, 20000))
y = true_b + true_m * x + np.exp(true_logs) * np.random.randn(len(x))
with pm.Model() as model:
m = pm.Uniform("m", lower=-5, upper=5)
b = pm.Uniform("b", lower=-5, upper=5)
logs = pm.Uniform("logs", lower=-5, upper=5)
pm.Normal("obs", mu=m*x+b, sd=pm.math.exp(logs), observed=y)
print("Getting trace...")
trace = pm.sample(draws=1000, tune=1000, chains=2)
print("Getting posterior_predictive...")
posterior_predictive = pm.sample_posterior_predictive(trace)
print("Getting idata...")
idata = az.from_pymc3(
trace=trace, posterior_predictive=posterior_predictive
)
print("--- %s seconds ---" % (time.time() - start_time))
Please provide the full traceback.
Traceback (most recent call last):
File "/home/nitish/env2/lib/python3.6/site-packages/pymc3/parallel_sampling.py", line 128, in run
self._start_loop()
File "/home/nitish/env2/lib/python3.6/site-packages/pymc3/parallel_sampling.py", line 182, in _start_loop
point, stats = self._compute_point()
File "/home/nitish/env2/lib/python3.6/site-packages/pymc3/parallel_sampling.py", line 209, in _compute_point
point, stats = self._step_method.step(self._point)
File "/home/nitish/env2/lib/python3.6/site-packages/pymc3/step_methods/arraystep.py", line 263, in step
apoint, stats = self.astep(array)
File "/home/nitish/env2/lib/python3.6/site-packages/pymc3/step_methods/hmc/base_hmc.py", line 144, in astep
self.potential.raise_ok(self._logp_dlogp_func._ordering.vmap)
File "/home/nitish/env2/lib/python3.6/site-packages/pymc3/step_methods/hmc/quadpotential.py", line 255, in raise_ok
raise ValueError('\n'.join(errmsg))
ValueError: Mass matrix contains zeros on the diagonal.
The derivative of RV `logs_interval__`.ravel()[0] is zero.
The derivative of RV `b_interval__`.ravel()[0] is zero.
The derivative of RV `m_interval__`.ravel()[0] is zero.
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/nitish/env2/lib/python3.6/site-packages/pymc3/parallel_sampling.py", line 128, in run
self._start_loop()
File "/home/nitish/env2/lib/python3.6/site-packages/pymc3/parallel_sampling.py", line 182, in _start_loop
point, stats = self._compute_point()
File "/home/nitish/env2/lib/python3.6/site-packages/pymc3/parallel_sampling.py", line 209, in _compute_point
point, stats = self._step_method.step(self._point)
File "/home/nitish/env2/lib/python3.6/site-packages/pymc3/step_methods/arraystep.py", line 263, in step
apoint, stats = self.astep(array)
File "/home/nitish/env2/lib/python3.6/site-packages/pymc3/step_methods/hmc/base_hmc.py", line 144, in astep
self.potential.raise_ok(self._logp_dlogp_func._ordering.vmap)
File "/home/nitish/env2/lib/python3.6/site-packages/pymc3/step_methods/hmc/quadpotential.py", line 255, in raise_ok
raise ValueError('\n'.join(errmsg))
ValueError: Mass matrix contains zeros on the diagonal.
The derivative of RV `logs_interval__`.ravel()[0] is zero.
The derivative of RV `b_interval__`.ravel()[0] is zero.
The derivative of RV `m_interval__`.ravel()[0] is zero.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "logp_benchmarks.py", line 21, in <module>
trace = pm.sample(draws=1000, tune=1000, chains=2)
File "/home/nitish/env2/lib/python3.6/site-packages/pymc3/sampling.py", line 498, in sample
trace = _mp_sample(**sample_args)
File "/home/nitish/env2/lib/python3.6/site-packages/pymc3/sampling.py", line 1384, in _mp_sample
for draw in sampler:
File "/home/nitish/env2/lib/python3.6/site-packages/pymc3/parallel_sampling.py", line 412, in __iter__
draw = ProcessAdapter.recv_draw(self._active)
File "/home/nitish/env2/lib/python3.6/site-packages/pymc3/parallel_sampling.py", line 314, in recv_draw
raise error from old_error
RuntimeError: Chain 1 failed.
Please provide any additional information below.
Note that this code originally worked for @OriolAbril on an older version of PyMC3 and still works for me on PyMC3 3.8 but using the development branch causes this error.
Hi Nitish,
I think this is a modeling error more than a bug: the priors are not regularizing enough. Uniform priors are usually not a good practice, especially when there is an exponential somewhere (numbers get really big, really fast and this messes up the sampler).
Knowing that, this model runs with PyMC3 master branch:
with pm.Model() as model:
m = pm.Normal("m", 0., 1.)
b = pm.Normal("b", 0., 1.5)
sigma = pm.Exponential("sigma", 1.)
pm.Normal("obs", mu=m * x + b, sigma=sigma observed=y)
trace = pm.sample(chains=2)
Does this work for you too?
Hi Nitish,
I think this is a modeling error more than a bug: the priors are not regularizing enough. Uniform priors are usually not a good practice, especially when there is an exponential somewhere (numbers get really big, really fast and this messes up the sampler).Knowing that, this model runs with PyMC3 master branch:
with pm.Model() as model: m = pm.Normal("m", 0., 1.) b = pm.Normal("b", 0., 1.5) sigma = pm.Exponential("sigma", 1.) pm.Normal("obs", mu=m * x + b, sigma=sigma observed=y) trace = pm.sample(chains=2)Does this work for you too? If yes, then we can close this issue.
Hi @AlexAndorra, thanks for the reply!
I understand now and yes this works for me but why was the original model still working in v3.8?
Could the occurence of ValueError: Mass matrix contains zeros on the diagonal. have something to do with changes related to new QuadPotentialFullAdapt features? @eigenfoo might know more about it
I can confirm that the original model runs with 3.8, so it looks like something happened on the dev branch, as Michael pointed out.
EDIT: There indeed seems to be a problem: this model from the documentation doesn't run anymore (same error as above), even with more regularizing priors centered on the true values.
After exploring further, I noticed the following simple model (which comes from this doc NB) only runs when testval _and_ init="adapt_diag" are supplied:
N = 10000
μ_actual = np.array([1, -2])
Σ_actual = np.array([[0.5, -0.3], [-0.3, 1.]])
x = np.random.multivariate_normal(μ_actual, Σ_actual, size=N)
with pm.Model() as m:
packed_L = pm.LKJCholeskyCov('packed_L', n=2, eta=2., sd_dist=pm.Exponential.dist(1.0))
L = pm.expand_packed_triangular(2, packed_L)
Σ = pm.Deterministic('Σ', L.dot(L.T))
μ = pm.Normal('μ', 0., 10., shape=2, testval=x.mean(axis=0))
obs = pm.MvNormal('obs', μ, chol=L, observed=x)
trace = pm.sample(init="adapt_diag")
ValueError: Mass matrix contains zeros on the diagonal -- an error that seems to be more frequent on the master branch. Could that be related to recent changes on master?I have seen this many times, as well, to the point where I routinely supply 'adapt_diag' to overwrite the default 'jitter+adapt_diag'. I _assume_ that this happens because PyMC3 can jitter the initial point to a place in the space that's infeasible?
Yeah I think that's why. What's puzzling to me though, is that this same model runs smoothly in the publicly released 3.8, without having to change the initialization and supply testval 🤔
I cannot reproduce your example @AlexAndorra
Also I dont think we are using QuadPotentialFullAdapt by default right now? it should still be adapt_diag (with full adapt as a new feature)
@junpenglao, do you mean that for you the model above samples on the master branch without having to specify testval=x.mean(axis=0) and init="adapt_diag"?
For me, if I don't specify those, I get ValueError: Mass matrix contains zeros on the diagonal on the master branch. But it runs on 3.8.
I wonder -- is there some way to take an existing run that has this problem and capture the random seed, so that we can replicate the behavior we are seeeing?
It's frustrating to me that I see this problem all the time, but it can't be replicated, and so cannot be fixed.
If there is, it would be great to add how to do it here or somewhere else and add the link in https://github.com/arviz-devs/arviz/issues/1083 too
This code fails for me every time I run it.
Chain 2 fails after between 600 and 1400 total samples. I guess the number of completed samples from chains 0, 1 & 3 vary each time I run before the chain 2 failure.
I updated from PyMC3 master yesterday
N = 1000
μ_actual = np.array([1, -2])
Σ_actual = np.array([[0.5, -0.3], [-0.3, 1.]])
x = np.random.multivariate_normal(μ_actual, Σ_actual, size=N)
with pm.Model() as m:
packed_L = pm.LKJCholeskyCov('packed_L', n=2, eta=2., sd_dist=pm.Exponential.dist(1.0))
L = pm.expand_packed_triangular(2, packed_L)
Σ = pm.Deterministic('Σ', L.dot(L.T))
μ = pm.Normal('μ', 0., 10., shape=2)
obs = pm.MvNormal('obs', μ, chol=L, observed=x)
trace = pm.sample(random_seed=3)
ValueError: Mass matrix contains zeros on the diagonal.
The derivative of RV `μ`.ravel()[0] is zero.
The derivative of RV `μ`.ravel()[1] is zero.
The derivative of RV `packed_L_cholesky-cov-packed__`.ravel()[0] is zero.
The derivative of RV `packed_L_cholesky-cov-packed__`.ravel()[1] is zero.
The derivative of RV `packed_L_cholesky-cov-packed__`.ravel()[2] is zero.
"""
The above exception was the direct cause of the following exception:
ValueError Traceback (most recent call last)
ValueError: Mass matrix contains zeros on the diagonal.
The derivative of RV `μ`.ravel()[0] is zero.
The derivative of RV `μ`.ravel()[1] is zero.
The derivative of RV `packed_L_cholesky-cov-packed__`.ravel()[0] is zero.
The derivative of RV `packed_L_cholesky-cov-packed__`.ravel()[1] is zero.
The derivative of RV `packed_L_cholesky-cov-packed__`.ravel()[2] is zero.
The above exception was the direct cause of the following exception:
I'm closing this for now, because, after some discussions, this seems to be a random error (i.e false positive), but let's be attentive and deal with this if problems arise in the coming 3.9 release.