I quite often find myself working with models where the likelihood of the data given the model parameters is "custom" in some sense (e.g. machine learning models where the data likelihood is given by a loss function that doesn't reduce to a standard distribution provided by pymc3). While I think I can get pymc3 to give me correct solutions in these situations I feel that there is some syntactic shoehorning going on and so I wanted to check that I'm approaching these situations in the right way.
Here's some simple example code implementing a multiple regression model, where we want to get estimates for the beta ("B") regression parameters.
import numpy as np
import pymc3 as pm
if __name__ == "__main__":
# Generate some data.
X = np.random.randn(100, 4)
X[:, 0] = 1. # Intercept column.
Y = np.dot(X, [0.5, 0.1, 0.25, 1.]) + 0.1 * np.random.randn(X.shape[0])
# Pymc3 model.
model = pm.Model()
with model:
# Define beta priors.
B = pm.Normal("B", mu=0.0, sd=1.0, shape=X.shape[1])
# Model variables.
Ymu = pm.dot(X, B.T)
Ysd = pm.Uniform("Ysd", 0., 10.)
# Data likelihood.
required_argument = 123.45
logp = -pm.log(Ysd * pm.sqrt(2.0 * np.pi)) - (pm.sqr(Y - Ymu) / (2.0 * Ysd * Ysd))
def logp_func(required_argument):
return logp.sum()
logpvar = pm.DensityDist("logpvar", logp_func, observed={"required_argument": required_argument})
# Sample.
start = pm.find_MAP(model=model)
step = pm.NUTS(scaling=start)
trace = pm.sample(100, start=start, step=step)
print(pm.summary(trace))
Sidenote: Instead of all the "Data likelihood" stuff involving pm.DensityDist I know I could just use pymc3's GLM functionality in this case, or a simple line line
Data = pm.Normal("Data", mu=Ymu, sd=Ysd, observed=Y)
but let's assume for the sake of the example that the actual likelihood I'm writing down is one that can't be expressed simply in terms of the distributions pymc3 makes available and that I need to use some custom theano expression (maybe logp is actually some expression involving a t-distribution or a log-logistic or cross-entropy or something).
When I execute the code above it (a) runs and (b) correctly recovers the parameters used to generate the random data. However, I find it weird / awkward firstly that in order to get the code to run the logp_func is required to have a dummy argument that the computation of logp doesn't actually depend on (I get an error if I try to make the logp_func have no arguments). And secondly that I have to specify that the required_argument is observed in the pm.DensityDist expression.
This suggests to me that there might be a better way of handling cases like this in pymc3 that I am missing, or alternatively that there might be a way of improving the way pymc3 handles cases like this. Any thoughts?
P.S. I'd also like to thank everyone involved in developing pymc3. The idea of using theano as the substrate for pymc3's random variables was a masterstroke, and it's allowing me to apply MCMC methods to much larger-scaled problems than previously. And the added bonus of getting gradient derivations for "free" and hence things like NUTS is fantastic. No more reading Neal Redford papers in order to implement my own Hamiltonian sampler! Thank you all!
OK, after reading around some of the other issues on GitHub I discovered pymc3 implements a Potential class much as pymc2 did. I have not seen this feature documented anywhere, so just in case anyone else is running into the same issue as me here's the appropriate way of handling custom data likelihoods in pymc3 without having to resort to DensityDiff awkwardness:
# Data likelihood.
logp = -pm.log(Ysd * pm.sqrt(2.0 * np.pi)) - (pm.sqr(Y - Ymu) / (2.0 * Ysd * Ysd))
logpvar = pm.Potential("logpvar", logp.sum())
Yes, perhaps we should add this to the tutorial.
In your specific case, would it make sense to make your own customer distribution? https://github.com/pymc-devs/pymc3/blob/master/pymc3/distributions/continuous.py#L132 Or does that not fit?
It might be possible to model situations like this as custom distributions, although I don't think the result would/could be simpler than just using pymc3.Potential.
I suppose you'd be trying to define a joint distribution over all the variables in your model where the distribution combines the likelihood of both the priors and the data, but I'm not sure if it'd be possible to have this represented by a single "value" of the distribution (pymc3 seems to require this) when you could possibly have many discrete prior model variables that you were trying to draw together. The Potential solution seems like the better approach to me overall.
A Potential isn't quite the same as a custom stochastic because it is not a node in the DAG that gets its value updated, but it can be used to modify the joint likelihood. The observed argument should always be data, so if there is no associated data, its a clear sign that you should be using a Potential.
@yarlett Sorry if it is a lame question. But I am new to Pymc3 and I am trying to figure out why did you give required argument in
observed={"required_argument": required_argument}
Because I what I saw from a few examples is that we give our observed data or input for the observed parameter. In this case shouldn't it be Y?
Most helpful comment
A Potential isn't quite the same as a custom stochastic because it is not a node in the DAG that gets its value updated, but it can be used to modify the joint likelihood. The
observedargument should always be data, so if there is no associated data, its a clear sign that you should be using a Potential.