When a model contains nan observations, or observations that are outside of the support of the prior, the models logp evaluates to -inf.
Both find_MAP and sample don't check against this and fail in different ways that don't make the underlying problem obvious:
pm.find_MAP does two iterations, showing -inf for the logp, but it doesn't raise errorspm.sample typically fails with Bad initial energypm.sample on Windows with multiprocessing crashes the child processes with uninformative error escalation (output similar as shown in #3403). If run with Jupyter notebooks, it even crashes the entire notebook serverRelated issues:
enhancement #4107 would use the same information, but NOT raise
At the beginning of find_MAP/ sample and BEFORE creating child processes we should evaluate the model and raise an informative error.
Possibly even telling the user which variable was nan/inf?
Hi, I'm happy to take this on as long as someone can give me pointers on how to implement the proposed solution?
Both sampling and find_MAP use a start kwarg/variable to initialize their iterations.
For example, it is assigned to model.test_point here: https://github.com/pymc-devs/pymc3/blob/9eb69fc6f06eb1f507a29eff070cfc47b07275a4/pymc3/tuning/starting.py#L88
The Model has properties for the log-posterior and gradient. I always forget what they are called - model.logp and model.dlogp or so.
You can stick the start value (defaulting to model.test_point) into that and check that it evaluates to a number (not inf or nan).
The General API Quickstart notebook has some cells using model.logp.
Note that the start kwarg to pm.sample can be of a few different types.
So am I right in thinking that this is as simple as updating the start of find_MAP to be something like this?
model = modelcontext(model)
if start is None:
start = model.test_point
else:
update_start_vals(start, model.test_point, model)
if not set(start.keys()).issubset(model.named_vars.keys()):
extra_keys = ", ".join(set(start.keys()) - set(model.named_vars.keys()))
valid_keys = ", ".join(model.named_vars.keys())
raise KeyError("Some start parameters do not appear in the model!\n"
"Valid keys are: {}, but {} was supplied".format(valid_keys, extra_keys))
initial_eval = model.logp(start)
if (~np.isfinite(initial_eval)).any():
raise ValueError("Initial evaluation of model at starting point {} failed:\n"
"Variables appear to have NaN or Inf values".format(start))
Additionally, how do I get it to spit out which variables have the problem?
Something like that, yes. I wouldn't be surprised if some of the tests break.
Your if does not check against NaN yet. Technically it may be possible to tell which variables are responsible, but I don't know if there's an easy solution.
Something like that, yes. I wouldn't be surprised if some of the tests break.
Your
ifdoes not check againstNaNyet. Technically it may be possible to tell which variables are responsible, but I don't know if there's an easy solution.
Thanks! Actually, NaN checking is accomplished with np.isfinite - here's the relevant doc: https://numpy.org/doc/stable/reference/generated/numpy.isfinite.html
As far as the working out which variable is responsible thing, I guess I will drop that for now then.
You should look at how check_test_point does this.
Thanks, unless there's anything outright wrong with me doing so I'll probably just inspect the output of that method.
I'm having a lot of trouble getting the tests, to work, unfortunately. As it stands, this is the code in sampling.sample():
if start is None:
start = [model.test_point] * (chains if chains else 1)
else:
if isinstance(start, dict):
start = [start] * (chains if chains else 1)
for chain_start_vals in start:
update_start_vals(chain_start_vals, model.test_point, model)
if not set(chain_start_vals.keys()).issubset(model.named_vars.keys()):
extra_keys = ", ".join(set(chain_start_vals.keys()) - set(model.named_vars.keys()))
valid_keys = ", ".join(model.named_vars.keys())
raise KeyError("Some start parameters do not appear in the model!\n"
"Valid keys are: {}, but {} was supplied".format(valid_keys, extra_keys))
initial_eval = model.check_test_point(test_point=chain_start_vals)
if not np.all(np.isfinite(initial_eval)):
raise ValueError("Initial evaluation of model at starting point {} failed:\n"
"Variables appear to have NaN or Inf values: {}".format(
chain_start_vals, str(initial_eval.index[np.isfinite(initial_eval)].to_list())))
This generates errors that look like the following:
______________________________________________________________________________________________________________________________ TestSample.test_sample_init ______________________________________________________________________________________________________________________________
self = <pymc3.tests.test_sampling.TestSample object at 0x1155c12d0>
def test_sample_init(self):
with self.model:
for init in ("advi", "advi_map", "map"):
pm.sample(
init=init,
tune=0,
n_init=1000,
draws=50,
> random_seed=self.random_seed,
)
pymc3/tests/test_sampling.py:95:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pymc3/sampling.py:539: in sample
trace = _mp_sample(**sample_args, **parallel_args)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
draws = 50, tune = 0, step = <pymc3.step_methods.hmc.nuts.NUTS object at 0x139b3a6d0>, chains = 4, cores = 4, chain = 0, random_seed = [163206115, 507505786, 368981262, 154707469], start = [{'x': array([0.1, 0.1])}], progressbar = True, trace = None
model = <pymc3.model.Model object at 0x138690950>, callback = None, discard_tuned_samples = True, mp_ctx = None, pickle_backend = 'pickle', kwargs = {}, ps = <module 'pymc3.parallel_sampling' from '/Users/shogg/git/pymc3/pymc3/parallel_sampling.py'>
traces = [<pymc3.backends.ndarray.NDArray object at 0x139c48650>], idx = 1, strace = <pymc3.backends.ndarray.NDArray object at 0x139c48610>
def _mp_sample(
draws: int,
tune: int,
step,
chains: int,
cores: int,
chain: int,
random_seed: list,
start: list,
progressbar=True,
trace=None,
model=None,
callback=None,
discard_tuned_samples=True,
mp_ctx=None,
pickle_backend="pickle",
**kwargs,
):
"""Main iteration for multiprocess sampling.
Parameters
----------
draws : int
The number of samples to draw
tune : int, optional
Number of iterations to tune, if applicable (defaults to None)
step : function
Step function
chains : int
The number of chains to sample.
cores : int
The number of chains to run in parallel.
chain : int
Number of the first chain.
random_seed : list of ints
Random seeds for each chain.
start : list
Starting points for each chain.
progressbar : bool
Whether or not to display a progress bar in the command line.
trace : backend, list, MultiTrace or None
This should be a backend instance, a list of variables to track, or a MultiTrace object
with past values. If a MultiTrace object is given, it must contain samples for the chain
number ``chain``. If None or a list of variables, the NDArray backend is used.
model : Model (optional if in ``with`` context)
callback : Callable
A function which gets called for every sample from the trace of a chain. The function is
called with the trace and the current draw and will contain all samples for a single trace.
the ``draw.chain`` argument can be used to determine which of the active chains the sample
is drawn from.
Sampling can be interrupted by throwing a ``KeyboardInterrupt`` in the callback.
Returns
-------
trace : pymc3.backends.base.MultiTrace
A ``MultiTrace`` object that contains the samples for all chains.
"""
import pymc3.parallel_sampling as ps
# We did draws += tune in pm.sample
draws -= tune
traces = []
for idx in range(chain, chain + chains):
if trace is not None:
strace = _choose_backend(copy(trace), idx, model=model)
else:
strace = _choose_backend(None, idx, model=model)
# for user supply start value, fill-in missing value if the supplied
# dict does not contain all parameters
> update_start_vals(start[idx - chain], model.test_point, model)
E IndexError: list index out of range
pymc3/sampling.py:1426: IndexError
Any ideas on how to proceed would be gratefully received.
The indexing of start[idx - chain] seems to be the problem. I assume you already changed something?
It may help to insert a print(start) in the line before. The print will be captured by the test logger.
Ok, have got that problem fixed. I now have two tests that have errors like this:
__________________________________________________________________ TestSamplePriorPredictive.test_shared __________________________________________________________________
self = <pymc3.tests.test_sampling.TestSamplePriorPredictive object at 0x13e31b4d0>
def test_shared(self):
n1 = 10
obs = shared(np.random.rand(n1) < 0.5)
draws = 50
with pm.Model() as m:
p = pm.Beta("p", 1.0, 1.0)
> y = pm.Bernoulli("y", p, observed=obs)
pymc3/tests/test_sampling.py:860:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pymc3/distributions/distribution.py:98: in __new__
return model.Var(name, dist, data, total_size, dims=dims)
pymc3/model.py:1166: in Var
model=self,
pymc3/model.py:1800: in __init__
self.tag.test_value = theano.compile.view_op(data).tag.test_value
../../.pyenv/versions/3.7.5/envs/pymc3/lib/python3.7/site-packages/theano/gof/utils.py:279: in __setattr__
obj = self.attr_filter(obj)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = TensorType(bool, vector), data = array([1, 1, 1, 0, 0, 0, 1, 1, 1, 0]), strict = False, allow_downcast = None
def filter(self, data, strict=False, allow_downcast=None):
"""
Convert `data` to something which can be associated to a
`TensorVariable`.
This function is not meant to be called in user code. It is for
`Linker` instances to use when running a compiled graph.
"""
# Explicit error message when one accidentally uses a Variable as
# input (typical mistake, especially with shared variables).
if isinstance(data, Variable):
raise TypeError(
"Expected an array-like object, but found a Variable: "
"maybe you are trying to call a function on a (possibly "
"shared) variable instead of a numeric array?"
)
if (type(data) is np.ndarray) and (data.dtype == self.numpy_dtype):
if data.dtype.num != self.numpy_dtype.num:
data = theano._asarray(data, dtype=self.dtype)
# -- now fall through to ndim check
elif (type(data) is np.memmap) and (data.dtype == self.numpy_dtype):
# numpy.memmap is a "safe" subclass of ndarray,
# so we can use it wherever we expect a base ndarray.
# however, casting it would defeat the purpose of not
# loading the whole data into memory
pass
elif strict:
# If any of the two conditions above was not met,
# we raise a meaningful TypeError.
if not (type(data) is np.ndarray):
raise TypeError(
"%s expected a ndarray object." % self, data, type(data)
)
if data.dtype != self.numpy_dtype:
raise TypeError(
("%s expected a ndarray object with " "dtype = %s (got %s).")
% (self, self.numpy_dtype, data.dtype)
)
raise AssertionError("This point should never be reached.")
else:
if allow_downcast:
# Convert to self.dtype, regardless of the type of data
data = theano._asarray(data, dtype=self.dtype)
# TODO: consider to pad shape with ones to make it consistent
# with self.broadcastable... like vector->row type thing
else:
if isinstance(data, np.ndarray):
# Check if self.dtype can accurately represent data
# (do not try to convert the data)
up_dtype = scal.upcast(self.dtype, data.dtype)
if up_dtype == self.dtype:
# Bug in the following line when data is a
# scalar array, see
# http://projects.scipy.org/numpy/ticket/1611
# data = data.astype(self.dtype)
data = theano._asarray(data, dtype=self.dtype)
if up_dtype != self.dtype:
err_msg = (
"%s cannot store a value of dtype %s without "
"risking loss of precision. If you do not mind "
"this loss, you can: "
"1) explicitly cast your data to %s, or "
'2) set "allow_input_downcast=True" when calling '
'"function". Value: "%s"'
% (self, data.dtype, self.dtype, repr(data))
)
> raise TypeError(err_msg)
E TypeError: TensorType(bool, vector) cannot store a value of dtype int64 without risking loss of precision. If you do not mind this loss, you can: 1) explicitly cast your data to bool, or 2) set "allow_input_downcast=True" when calling "function". Value: "array([1, 1, 1, 0, 0, 0, 1, 1, 1, 0])"
Any ideas as to what this one is about? It seems like some sort of environment variable problem, but I'm not sure how it's arising in this context
I see no connection, but it's hard to tell without the diff. Can you open a Draft PR?
Sure - PR open
@StephenHogg you can now merge the latest changes from master into your branch (4116) to fix that unrelated (locally) failing test:
git remote add pymcdevs https://github.com/pymc-devs/pymc3
git fetch pymcdevs master
git checkout 4116
git merge master
Suggest closing the issue now that #4211 is merged
@StephenHogg Small tip: if in your PR you would have written "Closes #4116" this would have happened automatically.