Pymc3: Test model logp before starting any MCMC chains

Created on 18 Sep 2020 · 14Comments · Source: pymc-devs/pymc3

Description of your problem

When a model contains nan observations, or observations that are outside of the support of the prior, the models logp evaluates to -inf.

Both find_MAP and sample don't check against this and fail in different ways that don't make the underlying problem obvious:

pm.find_MAP does two iterations, showing -inf for the logp, but it doesn't raise errors
pm.sample typically fails with Bad initial energy
pm.sample on Windows with multiprocessing crashes the child processes with uninformative error escalation (output similar as shown in #3403). If run with Jupyter notebooks, it even crashes the entire notebook server

Related issues:
enhancement #4107 would use the same information, but NOT raise

Proposed solution

At the beginning of find_MAP/ sample and BEFORE creating child processes we should evaluate the model and raise an informative error.

Possibly even telling the user which variable was nan/inf?

beginner friendly enhancements hackathon

Source

michaelosthege

👍3

All 14 comments

Hi, I'm happy to take this on as long as someone can give me pointers on how to implement the proposed solution?

StephenHogg on 5 Nov 2020

👍1

Both sampling and find_MAP use a start kwarg/variable to initialize their iterations.

For example, it is assigned to model.test_point here: https://github.com/pymc-devs/pymc3/blob/9eb69fc6f06eb1f507a29eff070cfc47b07275a4/pymc3/tuning/starting.py#L88

The Model has properties for the log-posterior and gradient. I always forget what they are called - model.logp and model.dlogp or so.
You can stick the start value (defaulting to model.test_point) into that and check that it evaluates to a number (not inf or nan).

The General API Quickstart notebook has some cells using model.logp.

Note that the start kwarg to pm.sample can be of a few different types.

michaelosthege on 5 Nov 2020

So am I right in thinking that this is as simple as updating the start of find_MAP to be something like this?

    model = modelcontext(model)
    if start is None:
        start = model.test_point
    else:
        update_start_vals(start, model.test_point, model)

    if not set(start.keys()).issubset(model.named_vars.keys()):
        extra_keys = ", ".join(set(start.keys()) - set(model.named_vars.keys()))
        valid_keys = ", ".join(model.named_vars.keys())
        raise KeyError("Some start parameters do not appear in the model!\n"
                       "Valid keys are: {}, but {} was supplied".format(valid_keys, extra_keys))

    initial_eval = model.logp(start)
    if (~np.isfinite(initial_eval)).any():
        raise ValueError("Initial evaluation of model at starting point {} failed:\n"
                         "Variables appear to have NaN or Inf values".format(start))

Additionally, how do I get it to spit out which variables have the problem?

StephenHogg on 6 Nov 2020

Something like that, yes. I wouldn't be surprised if some of the tests break.

Your if does not check against NaN yet. Technically it may be possible to tell which variables are responsible, but I don't know if there's an easy solution.

michaelosthege on 6 Nov 2020

Something like that, yes. I wouldn't be surprised if some of the tests break.

Your if does not check against NaN yet. Technically it may be possible to tell which variables are responsible, but I don't know if there's an easy solution.

Thanks! Actually, NaN checking is accomplished with np.isfinite - here's the relevant doc: https://numpy.org/doc/stable/reference/generated/numpy.isfinite.html

As far as the working out which variable is responsible thing, I guess I will drop that for now then.

StephenHogg on 7 Nov 2020

You should look at how check_test_point does this.

lucianopaz on 7 Nov 2020

👍1

Thanks, unless there's anything outright wrong with me doing so I'll probably just inspect the output of that method.

I'm having a lot of trouble getting the tests, to work, unfortunately. As it stands, this is the code in sampling.sample():

    if start is None:
        start = [model.test_point] * (chains if chains else 1)
    else:
        if isinstance(start, dict):
            start = [start] * (chains if chains else 1)

    for chain_start_vals in start:
        update_start_vals(chain_start_vals, model.test_point, model)

        if not set(chain_start_vals.keys()).issubset(model.named_vars.keys()):
            extra_keys = ", ".join(set(chain_start_vals.keys()) - set(model.named_vars.keys()))
            valid_keys = ", ".join(model.named_vars.keys())
            raise KeyError("Some start parameters do not appear in the model!\n"
                           "Valid keys are: {}, but {} was supplied".format(valid_keys, extra_keys))

        initial_eval = model.check_test_point(test_point=chain_start_vals)

        if not np.all(np.isfinite(initial_eval)):
            raise ValueError("Initial evaluation of model at starting point {} failed:\n"
                             "Variables appear to have NaN or Inf values: {}".format(
                                 chain_start_vals, str(initial_eval.index[np.isfinite(initial_eval)].to_list())))

This generates errors that look like the following:

______________________________________________________________________________________________________________________________ TestSample.test_sample_init ______________________________________________________________________________________________________________________________

self = <pymc3.tests.test_sampling.TestSample object at 0x1155c12d0>

    def test_sample_init(self):
        with self.model:
            for init in ("advi", "advi_map", "map"):
                pm.sample(
                    init=init,
                    tune=0,
                    n_init=1000,
                    draws=50,
>                   random_seed=self.random_seed,
                )

pymc3/tests/test_sampling.py:95: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pymc3/sampling.py:539: in sample
    trace = _mp_sample(**sample_args, **parallel_args)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

draws = 50, tune = 0, step = <pymc3.step_methods.hmc.nuts.NUTS object at 0x139b3a6d0>, chains = 4, cores = 4, chain = 0, random_seed = [163206115, 507505786, 368981262, 154707469], start = [{'x': array([0.1, 0.1])}], progressbar = True, trace = None
model = <pymc3.model.Model object at 0x138690950>, callback = None, discard_tuned_samples = True, mp_ctx = None, pickle_backend = 'pickle', kwargs = {}, ps = <module 'pymc3.parallel_sampling' from '/Users/shogg/git/pymc3/pymc3/parallel_sampling.py'>
traces = [<pymc3.backends.ndarray.NDArray object at 0x139c48650>], idx = 1, strace = <pymc3.backends.ndarray.NDArray object at 0x139c48610>

    def _mp_sample(
        draws: int,
        tune: int,
        step,
        chains: int,
        cores: int,
        chain: int,
        random_seed: list,
        start: list,
        progressbar=True,
        trace=None,
        model=None,
        callback=None,
        discard_tuned_samples=True,
        mp_ctx=None,
        pickle_backend="pickle",
        **kwargs,
    ):
        """Main iteration for multiprocess sampling.

        Parameters
        ----------
        draws : int
            The number of samples to draw
        tune : int, optional
            Number of iterations to tune, if applicable (defaults to None)
        step : function
            Step function
        chains : int
            The number of chains to sample.
        cores : int
            The number of chains to run in parallel.
        chain : int
            Number of the first chain.
        random_seed : list of ints
            Random seeds for each chain.
        start : list
            Starting points for each chain.
        progressbar : bool
            Whether or not to display a progress bar in the command line.
        trace : backend, list, MultiTrace or None
            This should be a backend instance, a list of variables to track, or a MultiTrace object
            with past values. If a MultiTrace object is given, it must contain samples for the chain
            number ``chain``. If None or a list of variables, the NDArray backend is used.
        model : Model (optional if in ``with`` context)
        callback : Callable
            A function which gets called for every sample from the trace of a chain. The function is
            called with the trace and the current draw and will contain all samples for a single trace.
            the ``draw.chain`` argument can be used to determine which of the active chains the sample
            is drawn from.
            Sampling can be interrupted by throwing a ``KeyboardInterrupt`` in the callback.

        Returns
        -------
        trace : pymc3.backends.base.MultiTrace
            A ``MultiTrace`` object that contains the samples for all chains.
        """
        import pymc3.parallel_sampling as ps

        # We did draws += tune in pm.sample
        draws -= tune

        traces = []
        for idx in range(chain, chain + chains):
            if trace is not None:
                strace = _choose_backend(copy(trace), idx, model=model)
            else:
                strace = _choose_backend(None, idx, model=model)
            # for user supply start value, fill-in missing value if the supplied
            # dict does not contain all parameters
>           update_start_vals(start[idx - chain], model.test_point, model)
E           IndexError: list index out of range

pymc3/sampling.py:1426: IndexError

Any ideas on how to proceed would be gratefully received.

StephenHogg on 9 Nov 2020

The indexing of start[idx - chain] seems to be the problem. I assume you already changed something?

It may help to insert a print(start) in the line before. The print will be captured by the test logger.

michaelosthege on 9 Nov 2020

Ok, have got that problem fixed. I now have two tests that have errors like this:

__________________________________________________________________ TestSamplePriorPredictive.test_shared __________________________________________________________________

self = <pymc3.tests.test_sampling.TestSamplePriorPredictive object at 0x13e31b4d0>

    def test_shared(self):
        n1 = 10
        obs = shared(np.random.rand(n1) < 0.5)
        draws = 50

        with pm.Model() as m:
            p = pm.Beta("p", 1.0, 1.0)
>           y = pm.Bernoulli("y", p, observed=obs)

pymc3/tests/test_sampling.py:860: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pymc3/distributions/distribution.py:98: in __new__
    return model.Var(name, dist, data, total_size, dims=dims)
pymc3/model.py:1166: in Var
    model=self,
pymc3/model.py:1800: in __init__
    self.tag.test_value = theano.compile.view_op(data).tag.test_value
../../.pyenv/versions/3.7.5/envs/pymc3/lib/python3.7/site-packages/theano/gof/utils.py:279: in __setattr__
    obj = self.attr_filter(obj)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = TensorType(bool, vector), data = array([1, 1, 1, 0, 0, 0, 1, 1, 1, 0]), strict = False, allow_downcast = None

    def filter(self, data, strict=False, allow_downcast=None):
        """
        Convert `data` to something which can be associated to a
        `TensorVariable`.

        This function is not meant to be called in user code. It is for
        `Linker` instances to use when running a compiled graph.

        """
        # Explicit error message when one accidentally uses a Variable as
        # input (typical mistake, especially with shared variables).
        if isinstance(data, Variable):
            raise TypeError(
                "Expected an array-like object, but found a Variable: "
                "maybe you are trying to call a function on a (possibly "
                "shared) variable instead of a numeric array?"
            )

        if (type(data) is np.ndarray) and (data.dtype == self.numpy_dtype):
            if data.dtype.num != self.numpy_dtype.num:
                data = theano._asarray(data, dtype=self.dtype)
            # -- now fall through to ndim check
        elif (type(data) is np.memmap) and (data.dtype == self.numpy_dtype):
            # numpy.memmap is a "safe" subclass of ndarray,
            # so we can use it wherever we expect a base ndarray.
            # however, casting it would defeat the purpose of not
            # loading the whole data into memory
            pass
        elif strict:
            # If any of the two conditions above was not met,
            # we raise a meaningful TypeError.
            if not (type(data) is np.ndarray):
                raise TypeError(
                    "%s expected a ndarray object." % self, data, type(data)
                )
            if data.dtype != self.numpy_dtype:
                raise TypeError(
                    ("%s expected a ndarray object with " "dtype = %s (got %s).")
                    % (self, self.numpy_dtype, data.dtype)
                )
            raise AssertionError("This point should never be reached.")
        else:
            if allow_downcast:
                # Convert to self.dtype, regardless of the type of data
                data = theano._asarray(data, dtype=self.dtype)
                # TODO: consider to pad shape with ones to make it consistent
                # with self.broadcastable... like vector->row type thing
            else:
                if isinstance(data, np.ndarray):
                    # Check if self.dtype can accurately represent data
                    # (do not try to convert the data)
                    up_dtype = scal.upcast(self.dtype, data.dtype)
                    if up_dtype == self.dtype:
                        # Bug in the following line when data is a
                        # scalar array, see
                        # http://projects.scipy.org/numpy/ticket/1611
                        # data = data.astype(self.dtype)
                        data = theano._asarray(data, dtype=self.dtype)
                    if up_dtype != self.dtype:
                        err_msg = (
                            "%s cannot store a value of dtype %s without "
                            "risking loss of precision. If you do not mind "
                            "this loss, you can: "
                            "1) explicitly cast your data to %s, or "
                            '2) set "allow_input_downcast=True" when calling '
                            '"function". Value: "%s"'
                            % (self, data.dtype, self.dtype, repr(data))
                        )
>                       raise TypeError(err_msg)
E                       TypeError: TensorType(bool, vector) cannot store a value of dtype int64 without risking loss of precision. If you do not mind this loss, you can: 1) explicitly cast your data to bool, or 2) set "allow_input_downcast=True" when calling "function". Value: "array([1, 1, 1, 0, 0, 0, 1, 1, 1, 0])"

Any ideas as to what this one is about? It seems like some sort of environment variable problem, but I'm not sure how it's arising in this context

StephenHogg on 9 Nov 2020

I see no connection, but it's hard to tell without the diff. Can you open a Draft PR?

michaelosthege on 9 Nov 2020

Sure - PR open

StephenHogg on 9 Nov 2020

@StephenHogg you can now merge the latest changes from master into your branch (4116) to fix that unrelated (locally) failing test:

git remote add pymcdevs https://github.com/pymc-devs/pymc3
git fetch pymcdevs master
git checkout 4116
git merge master

michaelosthege on 11 Nov 2020

Suggest closing the issue now that #4211 is merged

StephenHogg on 27 Nov 2020

👍1

@StephenHogg Small tip: if in your PR you would have written "Closes #4116" this would have happened automatically.

twiecki on 27 Nov 2020

👍1

Was this page helpful?

0 / 5 - 0 ratings

Related issues

error on import pymc3: AttributeError: module 'xarray.core.formatting_html' has no attribute 'CSS_STYLE'

alessandro-gentilini · 5Comments

Failing tests with generators pass

jonathanhfriedman · 6Comments

Attributes of trace.report are None when using sample_smc—Converting to arviz fails

astoeriko · 3Comments

Bug with pm.Forestplot

springcoil · 3Comments

DLL load error on windows 10

Abraxas2071 · 4Comments