Pymc3: Port BayesFactor example to pymc3

Created on 2 Sep 2015 · 19Comments · Source: pymc-devs/pymc3

The BayesFactor example in pymc3 is written in pymc2: https://github.com/pymc-devs/pymc3/wiki/BayesFactor. I could really use the pymc3 version since I can't figure out how to make an observed deterministic in pymc3. Just guessing but couldn't the switch replace the function? In pseudo code: pymc3.switch(true_model, poisson_likelihood, geometric_likelihood)

Also, the example imports a lot of stuff which is unused. It could simply start with

from pymc import Uniform, Lambda, observed, Bernoulli, geometric_like, poisson_like

# Data
Y = [0,1,2,3,8]

Source

lindeloev

Most helpful comment

Just for anyone else returning here: to calculate the bayes factor do the following in the end of the model:

trace = sample(10000, step=Metropolis())
p_pois = trace[5000:]['true_model'].mean()  # mean value (i.e. the rate of poisson samples to all samples)
BF = ((1-p_pois)/p_pois) * (pi[1]/pi[0])  # BayesFactor in favor of poisson model, taking prior probability into account

lindeloev on 3 Sep 2015

👍3

All 19 comments

Fixed the wiki example -- thanks.

Something like the following appears to work with PyMC3:

from pymc3 import (Model, Uniform, Bernoulli, Poisson, 
                Geometric, switch, DensityDist, sample,
                Metropolis, Slice, summary)

with Model() as model:

    # Data
    Y = [0,1,2,3,8]

    # Prior model probabilities
    pi = (0.1, 0.9)

    # Index to true model
    true_model = Bernoulli('true_model', pi[1])

    # Poisson mean
    mu = Uniform('mu', 0, 1000)

    # Geometric probability
    p = 1/(1+mu)

    Ylike = DensityDist('Ylike',
            lambda value: switch(true_model, 
                Poisson.dist(mu).logp(value), 
                Geometric.dist(p).logp(value+1)
            ),
            observed=Y)

    trace = sample(10000, step=Metropolis())

    summary(trace[5000:])

fonnesbeck on 2 Sep 2015

👍2

Indeed it does! I just ran it for a couple of different datasets, comparing the pymc and pymc3 and they give similar results. Thank you!

lindeloev on 2 Sep 2015

Just for anyone else returning here: to calculate the bayes factor do the following in the end of the model:

trace = sample(10000, step=Metropolis())
p_pois = trace[5000:]['true_model'].mean()  # mean value (i.e. the rate of poisson samples to all samples)
BF = ((1-p_pois)/p_pois) * (pi[1]/pi[0])  # BayesFactor in favor of poisson model, taking prior probability into account

lindeloev on 3 Sep 2015

👍3

Thanks for updating this. Was really struggling with this problem too.

markdregan on 3 Sep 2015

@fonnesbeck & @lindeloev - Do you know how to implement the above example when the likelihood functions within switch are of different dimensions?

eg. the below excerpt aims to compare a hierarchal poisson model vs a non-hierarchal poisson model.

Ylike = DensityDist('Ylike',
            lambda value: switch(true_model, 
                Poisson.dist(mu_a[group_index]).logp(value), 
                Poisson.dist(ma_b).logp(value)
            ),
            observed=Y)

The doc string indicates that switch only works with inputs of the same dimension:
"All the inputs must have the same number of dimensions. When the
Op is performed, for each dimension, each input's size for that
dimension must be the same."

markdregan on 4 Sep 2015

The inputs to switch do have the same dimension in your example. If you are just using indicators to switch parameters on or off, that does not change the dimensionality. For example:

with Model() as model:

    z = Bernoulli('z', 0.5, shape=k)

    beta = Normal('beta', 0, 1, shape=k)
    mu = tt.dot(X, switch(z, beta, np.zeros(k)))

    tau = Gamma('tau', 1, 0.1)
    y = Normal('y', mu, tau, observed=Y)

does not change dimension as far as the model graph is concerned; z, beta, mu and y are always the same size, but you are essentially turning off covariates.

fonnesbeck on 4 Sep 2015

Thank you Chris!

markdregan on 6 Sep 2015

@lindeloev I think

BF = ((1-p_pois)/p_pois) * (pi[1]/pi[0])

Should be the BayesFactor in favor of geometric model, not the Poisson model.

aloctavodia on 17 Dec 2015

I am not sure if this is the right place to post this comment, while is not about the particular example I think fits the general discussion of how to compute Bayes factor in PyMC3.

What about computing the BF using something like:

lm0 = np.exp(np.array([model0.logp(pt) for pt in trace0[5000:]])).mean()
lm1 = np.exp(np.array([model1.logp(pt) for pt in trace1[5000:]])).mean()

BF = lm0/lm1

i.e. you fit the model as individual models (not using a model index in a hierarchical structure). And then computes the BF from the model and trace objects. What could be the problem with this way of computing the BF?
I think one advantage of using this approach is that for some problems I have found that sampling could be difficult in a hierarchical model with a model index, specially when the data favors one model much more than the other, in that cases if you want to estimate the parameters of one (or both models) you end-up building separate model anyway. Also, I think this way of computing the BF, looks closer to its definition.

aloctavodia on 17 Dec 2015

@aloctavodia There's a blog post by Radford Neal that I think looks at this (using the geometric mean though): https://radfordneal.wordpress.com/2008/08/17/the-harmonic-mean-of-the-likelihood-worst-monte-carlo-method-ever/ Let me know what you think.

twiecki on 17 Dec 2015

@twiecki, thanks for the link. I suspected it was too good to be true! but I was getting confident since all the examples I tried with both approach gave essentially the same results.
So at least for now the solutions in PyMC3 seems to be to use a switch-model approach or learn about specific methods to compute the evidence and implement them. I will need to learn more about computing the evidence efficiently for a problem that I am starting to work on, so thanks again for the pointer.

aloctavodia on 17 Dec 2015

@aloctavodia Radford is certainly a purist :).

I recently heard a talk where they used Thermodynamic Integration to compute the evidence. Don't mean to nerd-snipe you as it seems overkill but I thought I'd mention it: http://users.wpi.edu/~balnan/MarLik-BF-0.pdf

twiecki on 17 Dec 2015

@aloctavodia They just added Thermodynamic Integration to emcee https://github.com/dfm/emcee/pull/161/files wonder how hard it'd be to port.

twiecki on 22 Dec 2015

@twiecki Nice! I skim through the article about Thermodynamic Integration (TI) and I will read it more carefully ASAP.
If I get it right, I thing the most demanding task is to implement the replica exchange (AKA parallel tempering) method and not TI itself. Implementing replica exchange is something I definitely would like to do. In fact is something that I have planed to start a couple of months ago, but unfortunately I did not find the time, yet. Hopefully I will do it next year, when I come back from my vacations, unless some else wants to do it. I guess reading the code from emcee will be really helpful. What do you have in mind?

aloctavodia on 22 Dec 2015

Hi guys, I have a question about Bayes factor. I am comparing a gamma dist and an exponential dist (both with uniform distribution priors). Between these two I get wildly varying results - sometimes in favour of one, sometimes in favour of the other, sometimes zero. Does this mean that for the data I am modelling both the gamma and exponential are equally valid? I tried varying the prior model Bernoulli probability from 0.5 (I saw this as a suggestion to vary the coverage) but this did not result in more consistent results. If I try comparing either the gamma or the exp to say, a uniform dist, then I do get more consistent results. Is there anything I can try to get more definitive results between the gamma and the exponential? Apologies if I left out any important info - this is the first time I am trying Bayesian inference so I am a newbie. Thanks for any help you can provide! Cheers.

Ksub64 on 23 Dec 2015

Hi @Ksub64, I think the best place to ask this type of question is stackoverflow (please anyone correct me if I am wrong). Posting your complete model could also help.

In general (and based on my limited experience on the subject), I think the following things help when computing Bayes factor using a hierarchical model:

use ElemwiseCategoricalStep to sample the model index
change the prior probability to favor the less probable model (ideally you should get half samples from each model)
use pseudo-priors

Bayes factor is a very hot topic among Bayesian, some people defend them some people recommend avoiding them as much as possible. For example Gelman recommend using WAIC instead of Bayes Factors or just do parameter estimation and compare model in a more qualitative way. Kruschke also warn the Bayesian practitioner on the "perils of Bayes factors" and favors the parameter estimation approach.

aloctavodia on 24 Dec 2015

Hi @aloctavodia thanks for your quick response. I will think your suggestions over and use stackoverflow next time. FYI I had already tried your second suggestion (changing the prior probability) as I thought this would have some effect, but it did not have any effect in my case. Will have a look at the other two. Cheers.

Ksub64 on 28 Dec 2015

Closing

fonnesbeck on 5 Sep 2016