The BayesFactor example in pymc3 is written in pymc2: https://github.com/pymc-devs/pymc3/wiki/BayesFactor. I could really use the pymc3 version since I can't figure out how to make an observed deterministic in pymc3. Just guessing but couldn't the switch replace the function? In pseudo code: pymc3.switch(true_model, poisson_likelihood, geometric_likelihood)
Also, the example imports a lot of stuff which is unused. It could simply start with
from pymc import Uniform, Lambda, observed, Bernoulli, geometric_like, poisson_like
# Data
Y = [0,1,2,3,8]
Fixed the wiki example -- thanks.
Something like the following appears to work with PyMC3:
from pymc3 import (Model, Uniform, Bernoulli, Poisson,
Geometric, switch, DensityDist, sample,
Metropolis, Slice, summary)
with Model() as model:
# Data
Y = [0,1,2,3,8]
# Prior model probabilities
pi = (0.1, 0.9)
# Index to true model
true_model = Bernoulli('true_model', pi[1])
# Poisson mean
mu = Uniform('mu', 0, 1000)
# Geometric probability
p = 1/(1+mu)
Ylike = DensityDist('Ylike',
lambda value: switch(true_model,
Poisson.dist(mu).logp(value),
Geometric.dist(p).logp(value+1)
),
observed=Y)
trace = sample(10000, step=Metropolis())
summary(trace[5000:])
Indeed it does! I just ran it for a couple of different datasets, comparing the pymc and pymc3 and they give similar results. Thank you!
Just for anyone else returning here: to calculate the bayes factor do the following in the end of the model:
trace = sample(10000, step=Metropolis())
p_pois = trace[5000:]['true_model'].mean() # mean value (i.e. the rate of poisson samples to all samples)
BF = ((1-p_pois)/p_pois) * (pi[1]/pi[0]) # BayesFactor in favor of poisson model, taking prior probability into account
Thanks for updating this. Was really struggling with this problem too.
@fonnesbeck & @lindeloev - Do you know how to implement the above example when the likelihood functions within switch are of different dimensions?
eg. the below excerpt aims to compare a hierarchal poisson model vs a non-hierarchal poisson model.
Ylike = DensityDist('Ylike',
lambda value: switch(true_model,
Poisson.dist(mu_a[group_index]).logp(value),
Poisson.dist(ma_b).logp(value)
),
observed=Y)
The doc string indicates that switch only works with inputs of the same dimension:
"All the inputs must have the same number of dimensions. When the
Op is performed, for each dimension, each input's size for that
dimension must be the same."
The inputs to switch do have the same dimension in your example. If you are just using indicators to switch parameters on or off, that does not change the dimensionality. For example:
with Model() as model:
z = Bernoulli('z', 0.5, shape=k)
beta = Normal('beta', 0, 1, shape=k)
mu = tt.dot(X, switch(z, beta, np.zeros(k)))
tau = Gamma('tau', 1, 0.1)
y = Normal('y', mu, tau, observed=Y)
does not change dimension as far as the model graph is concerned; z, beta, mu and y are always the same size, but you are essentially turning off covariates.
Thank you Chris!
@lindeloev I think
BF = ((1-p_pois)/p_pois) * (pi[1]/pi[0])
Should be the BayesFactor in favor of geometric model, not the Poisson model.
I am not sure if this is the right place to post this comment, while is not about the particular example I think fits the general discussion of how to compute Bayes factor in PyMC3.
What about computing the BF using something like:
lm0 = np.exp(np.array([model0.logp(pt) for pt in trace0[5000:]])).mean()
lm1 = np.exp(np.array([model1.logp(pt) for pt in trace1[5000:]])).mean()
BF = lm0/lm1
i.e. you fit the model as individual models (not using a model index in a hierarchical structure). And then computes the BF from the model and trace objects. What could be the problem with this way of computing the BF?
I think one advantage of using this approach is that for some problems I have found that sampling could be difficult in a hierarchical model with a model index, specially when the data favors one model much more than the other, in that cases if you want to estimate the parameters of one (or both models) you end-up building separate model anyway. Also, I think this way of computing the BF, looks closer to its definition.
@aloctavodia There's a blog post by Radford Neal that I think looks at this (using the geometric mean though): https://radfordneal.wordpress.com/2008/08/17/the-harmonic-mean-of-the-likelihood-worst-monte-carlo-method-ever/ Let me know what you think.
@twiecki, thanks for the link. I suspected it was too good to be true! but I was getting confident since all the examples I tried with both approach gave essentially the same results.
So at least for now the solutions in PyMC3 seems to be to use a switch-model approach or learn about specific methods to compute the evidence and implement them. I will need to learn more about computing the evidence efficiently for a problem that I am starting to work on, so thanks again for the pointer.
@aloctavodia Radford is certainly a purist :).
I recently heard a talk where they used Thermodynamic Integration to compute the evidence. Don't mean to nerd-snipe you as it seems overkill but I thought I'd mention it: http://users.wpi.edu/~balnan/MarLik-BF-0.pdf
@aloctavodia They just added Thermodynamic Integration to emcee https://github.com/dfm/emcee/pull/161/files wonder how hard it'd be to port.
@twiecki Nice! I skim through the article about Thermodynamic Integration (TI) and I will read it more carefully ASAP.
If I get it right, I thing the most demanding task is to implement the replica exchange (AKA parallel tempering) method and not TI itself. Implementing replica exchange is something I definitely would like to do. In fact is something that I have planed to start a couple of months ago, but unfortunately I did not find the time, yet. Hopefully I will do it next year, when I come back from my vacations, unless some else wants to do it. I guess reading the code from emcee will be really helpful. What do you have in mind?
Hi guys, I have a question about Bayes factor. I am comparing a gamma dist and an exponential dist (both with uniform distribution priors). Between these two I get wildly varying results - sometimes in favour of one, sometimes in favour of the other, sometimes zero. Does this mean that for the data I am modelling both the gamma and exponential are equally valid? I tried varying the prior model Bernoulli probability from 0.5 (I saw this as a suggestion to vary the coverage) but this did not result in more consistent results. If I try comparing either the gamma or the exp to say, a uniform dist, then I do get more consistent results. Is there anything I can try to get more definitive results between the gamma and the exponential? Apologies if I left out any important info - this is the first time I am trying Bayesian inference so I am a newbie. Thanks for any help you can provide! Cheers.
Hi @Ksub64, I think the best place to ask this type of question is stackoverflow (please anyone correct me if I am wrong). Posting your complete model could also help.
In general (and based on my limited experience on the subject), I think the following things help when computing Bayes factor using a hierarchical model:
Bayes factor is a very hot topic among Bayesian, some people defend them some people recommend avoiding them as much as possible. For example Gelman recommend using WAIC instead of Bayes Factors or just do parameter estimation and compare model in a more qualitative way. Kruschke also warn the Bayesian practitioner on the "perils of Bayes factors" and favors the parameter estimation approach.
Hi @aloctavodia thanks for your quick response. I will think your suggestions over and use stackoverflow next time. FYI I had already tried your second suggestion (changing the prior probability) as I thought this would have some effect, but it did not have any effect in my case. Will have a look at the other two. Cheers.
Closing
Most helpful comment
Just for anyone else returning here: to calculate the bayes factor do the following in the end of the model: