At the moment there doesn't seem to be a _generalized_ way to compose categorical variables, i.e. have a categorical node inherit from one or more parent categorical nodes.
I've re-implemented a Bayesian Network example from the PMML documentation tutorial (gist notebook here, and original source here).
Currently the only way I can figure out to allow inheritance is to do a nasty theano.tensor.switch() chain call in a kind of binary boolean tree. The old PyMC2 way is to write an @pymc3.deterministic decorator in front of a long pre-defined python if, elif, else block (commented out on the gist, after I attempted unsuccessfully to use @theano.compile.ops.as_op() to bring that way into PyMC3.
There needs to be a way to _generally_ take some set of N parent nodes, with categorical distributions, and have their categorical child "inherit" the probabilities when supplied with an (N+1)-dimensional array of category probabilities.
Preferably this should be done using theano, which lets one use the nicer samplers when available.
Some of this (for example, C_1 | D_1) can be achieved with mixture models as follows:
p1 = pm.Dirichlet('p1', np.array([0.3, 0.7]))
c1 = pm.NormalMixture('c1', p1, np.array([10, 14]), sd=np.sqrt(np.array([2., 2.])))
I will give a little thought to how to express some of the more complicated relationships more conveniently.
@AustinRochford That's an excellent idea. So would a bayesian categorical mixture model be applicable here?
The hope is that I can eventually automate the reading and writing of PMML-style xml Bayes Net schema, to and from PyMC3.
Meanwhile, I did update the original notebook to use slices on a shared numpy probability array instead of those awful theano switches. That seems to be quite a bit faster than the original (I'm getting around 10x faster sampling now).
The only remaining issue is the need for the switch statements in splitting up the C3 cotinuous node into 3 categories by boolean arguments (which theano doesn't really support). numpy.select() doesn't function with theano comparison statements as condition input.
It seems like a variant of that mixture could do the trick.
Yes, the dependence of D3 on C3 is a bit awkward. One way to approach this is by not modeling C3 directly, but placing priors on its mean and variance. You could use those to model the probability that C3 falls in each of those buckets directly. For instance if mu is its mean and sigma is its standard deviation, something like
from pymc3.distributions.dist_math import std_cdf
p_c3 = pm.Deterministic('p_c3', tt.stack([std_cdf((9 - mu) / std),
std_cdf((11 - mu) / std) - std_cdf((9 - mu) / std),
1 - std_cdf((11 - mu) / std)]))
might work well.
You could of course also parametrize the cutoff points, although you would need to jump through some hoops to make the model identified.
This could also be vectorized with theano.tensor.extra_ops.diff.
Hang on, isn't extra_ops.diff for first order fwd finite difference? As in, a derivative approximation? How would that vectorize the cutoffs?
So I tested the mixture-model:
#--------------Define Prior Information---------------#
d1_prob = np.array([0.3, 0.7]) # 2 choices
d2_prob = np.array([0.6, 0.3, 0.1]) # 3 choices
d3_prob = np.array([[[0.1, 0.9], # (2x3)x2 choices
[0.3, 0.7],
[0.4, 0.6]],
[[0.6, 0.4],
[0.8, 0.2],
[0.9, 0.1]]])
d4_prob = np.array([[[0.4, 0.6], # (2x3)x2 choices
[0.6, 0.4],
[0.3, 0.7]],
[[0.4, 0.6],
[0.3, 0.7],
[0.1, 0.9]]])
c1_mu, c1_sd = np.array([[10, 14], # 2 choices inherit
[2 , 2 ]])
c2_mu, c2_sd = np.array([[6, 8, 14], # 3 choices inherit
[2, 1, 1 ]])
#------------------------------------------------------#
with pm.Model() as model:
D1 = pm.Categorical('D1', p=d1_prob)
D2 = pm.Categorical('D2', p=d2_prob)
# C1 = pm.Normal('C1',mu = 10 + 4*D1, tau = (1./2)**2)
p1 = pm.Dirichlet('p1', d1_prob)
C1 = pm.NormalMixture('C1', p1, mu=c1_mu, sd=c1_sd)
# C2 = pm.Normal('C2',mu=6+2*(D2**2), tau=1)
p2 = pm.Dirichlet('p2', d2_prob)
C2 = pm.NormalMixture('C2', p2, mu=c2_mu, sd=c2_sd)
D3_prob = theano.shared(d3_prob)
D3_0 = D3_prob[D1, D2]
D3 = pm.Categorical('D3', p=D3_0)
C3 = pm.Normal('C3', mu=(0.15 * (C2 ** 2) * (1 - D3) + 1.5 * C2 * D3), tau=(1. / (2 - D3)) ** 2)
C4 = pm.Normal('C4', mu=0.1 * C2 ** 2 + 0.6 * C2 + 1, tau=0.25, observed=[7])
# C3_0 = np.select([T.lt(C3,9), T.gt(C3,9) & T.lt(C3,11), T.gt(C3,11)], [0,1,2]) # doesnt work
C3_0 = T.switch(T.lt(C3, 9), 0,
T.switch(T.gt(C3, 9) & T.lt(C3, 11), 1, 2))
D4_prob = theano.shared(d4_prob)
D4_0 = D4_prob[D3, C3_0]
D4 = pm.Categorical('D4', p=D4_0, observed=[0])
# Create MCMC object
with model:
trace = pm.sample(10000)
And it causes in inexplicable python.exe has stopped working error. The commented out version works fine for 10k samples (w/NUTS for the C nodes)
I might try on a linux VM and report back.
Don't worry about the diff stuff too much, all i mean was that you could apply diff to [0, std_cdf((9 - mu) / std), std_cdf((11 - mu) / std), 1] to vectorize that probability calculation (and actually you could use std_cdf applied to a vector). In any case, that's not important right now.
@AustinRochford So I found that on Linux I'm running into #1476 almost to the T, which is strange since the original poster solved it by updated glibc ... my libc6 is at the newest version (2.23 in ldd). And that shouldn't have any bearing on the windows problem right?
I've verified that it's NormalMixture causing the issue.
@tbsexton And you are on master of pymc3 and theano?
@twiecki I've done two, one from the stable conda for both packages (as seen in notebook gist), and one I've been using to test the network plotting from@stevenjkern (stevenjkern:enh/model-to-graph pull request #1683)
I think that second one is inheriting directly from master, with some added labels in the model definitions.
@twiecki So this error causing python.exe to crashed seems to have been fixed after I updated PyMC3 to master. Potentially #1905 was the solution? Pip seems to have updated Theano to 0.9 as well, so maybe that had something to do with it. Either way.
@tbsexton so this can be closed?
@twiecki I suppose it can be. As @AustinRochford mentioned the NormalMixture() method is a much more elegant way to compose these distributions, although as he also says, the more complex relationships are not possible to model this way (for example, one cannot, as far as I can see, use a Dirichlet prior for D3 --> C3, since the probabilities for D3 are now a theano shared variable and that's not allowed for instantiating the Dirichlet RV.
But, I think this is becoming more of a theano and general Bayesian problem than a PyMC3 one.
Most helpful comment
@twiecki I suppose it can be. As @AustinRochford mentioned the
NormalMixture()method is a much more elegant way to compose these distributions, although as he also says, the more complex relationships are not possible to model this way (for example, one cannot, as far as I can see, use a Dirichlet prior for D3 --> C3, since the probabilities for D3 are now a theano shared variable and that's not allowed for instantiating the Dirichlet RV.But, I think this is becoming more of a theano and general Bayesian problem than a PyMC3 one.