Pyro: Observing deterministically transformed output

Created on 12 Nov 2017 · 15 Comments · Source: pyro-ppl/pyro

Thanks for a great library!

I have the following model (which samples alright):

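import pyro
import pyro.distributions as dist

def add_one_or_two(guess):
    init = 2
    # choice is True or False, with probabilities given by guess
    choice = pyro.sample("choice", dist.categorical, ps=guess, vs=[False, True])
    if choice:
        outcome = init + 1
    else:
        outcome = init + 2
    return outcome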

Now I would like to get a marginal for "choice" after having observed 4 as the output of add_one_or_two. It is not 100% clear to me from the Conditioning on Models intro what this would look like in this case. Conditioning seems to be tied to the outputs of sample statements, but in my case the output of the model is just a deterministic transformation of the choice sample. How should I go about this?

question

All 15 comments

Pyro only allows conditioning on sample sites, not on arbitrary deterministic functions of sample sites. This is because conditioning is implemented as a transformation from pyro.sample to pyro.observe, and observations can only be at sample sites. Pyro has some support for invertible transformations in TransformedDistribution.

You may be able to work around this using a Delta distribution:

def add_one_or_two(guess):
    ...
    outcome = pyro.sample("outcome", dist.delta, outcome)
    return outcome

(but I haven't tried this!)
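For a model this small, the target of the Delta workaround can be checked by hand. The following is a minimal, library-free sketch (it does not use Pyro, and the helper names outcome_of and posterior_over_choice are made up for illustration). It enumerates both values of choice and scores each with a Delta-style likelihood that is 1 when the deterministic outcome matches the observation and 0 otherwise.

```python
# Library-free sketch: exact posterior over `choice` by enumeration.
# The helper names below are hypothetical and not part of Pyro.

def outcome_of(choice, init=2):
    # Same deterministic transformation as in add_one_or_two.
    return init + 1 if choice else init + 2

def posterior_over_choice(prior, observed):
    # prior maps each value of `choice` to its prior probability.
    # A Delta likelihood scores 1 when the outcome matches, 0 otherwise.
    unnormalized = {
        c: p * (1.0 if outcome_of(c) == observed else 0.0)
        for c, p in prior.items()
    }
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {c: w / total for c, w in unnormalized.items()}

posterior = posterior_over_choice({False: 0.5, True: 0.5}, observed=4)
print(posterior)
```

Observing 4 is only compatible with choice being False, so all posterior mass ends up there. An observation that no branch can produce (5, say) leaves nothing to normalize, which is the kind of degenerate case a hard constraint can run into.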

That's an interesting limitation. Most non-trivial implicit models, including physical simulators and GANs, are non-invertible function outputs. How might poutines support this natively? (It would also be nice to support on the algorithm side; doesn't it refute the "universal" claim?)

Great, thanks! Will give this a go. I don't know how I missed the delta. I tried a categorical with one element in the support, but that didn't fly.

@riedelcastro I'd suggest using a Bernoulli with a very small positive probability of your constraint being false to avoid infinities.

@dustinvtran you can use this Delta/Bernoulli pattern to express conditioning on any Boolean proposition being true, so conditioning in Pyro is in principle as expressive as conditioning in Church. Of course this isn't very efficient, because there's no extra information that inference algorithms can exploit to improve their chances of satisfying the constraint. You're right that many interesting models don't have tractable densities, so implementing less naive versions of this pattern with ABC likelihoods or discriminators is high on our roadmap.
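As a rough illustration of the Bernoulli variant, here is a library-free sketch (plain Python rather than Pyro; soft_condition_weight, estimate_choice_posterior, and the value of eps are assumptions made for this example). Each prior sample is weighted by the probability that a Bernoulli "constraint holds" variable comes out True, where that probability is 1 - eps when the constraint is satisfied and eps when it is not.

```python
import random

def soft_condition_weight(satisfied, eps=1e-6):
    # Likelihood of observing True from a Bernoulli whose success
    # probability is 1 - eps if the constraint holds and eps otherwise.
    return 1.0 - eps if satisfied else eps

def estimate_choice_posterior(prior_true, observed, num_samples=1000, seed=0):
    rng = random.Random(seed)
    weight_true = 0.0
    weight_total = 0.0
    for _ in range(num_samples):
        choice = rng.random() < prior_true        # sample from the prior
        outcome = 2 + (1 if choice else 2)        # deterministic transformation
        w = soft_condition_weight(outcome == observed)
        weight_total += w
        if choice:
            weight_true += w
    # Self-normalized estimate of P(choice = True | constraint holds).
    return weight_true / weight_total

p_true = estimate_choice_posterior(prior_true=0.5, observed=4)
print(p_true)
```

Because eps is strictly positive, the total weight can never be zero, so the normalization stays finite even when no sample satisfies the constraint. The price is that the result is a slightly relaxed version of the hard-constrained posterior.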

Can you provide a snippet of how that works? For example, say eps ~ N(0, 1) for a 1-dimensional noise, followed by x = tanh(eps * W), where W is a 1 x 2 trainable matrix. We observe one 2-dimensional data point x. I don't totally follow how to infer eps or estimate W.

Oh, got it. In this example, you mean literally doing inference over the joint p(eps, x) = N(eps | 0, 1) I[ x == tanh(eps * W)] and variational distribution q(eps | x).

"In this example, you mean literally doing inference over the joint p(eps, x) = N(eps | 0, 1) I[ x == tanh(eps * W)] and variational distribution q(eps | x)"

Yep, although a hard constraint x == y on a continuous-valued distribution is false a.s. so in this case you could use a Normal with mean y and very small variance as your observation distribution. This is a very basic version of an ABC likelihood with Gaussian kernel implemented manually within a model.
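To make the Gaussian-kernel idea concrete for the tanh example, here is a minimal library-free sketch (not Pyro code). It assumes W is known and fixed, so it only infers eps. The values of W, sigma, and the synthetic observation are assumptions chosen for illustration. Prior samples of eps are weighted by a Normal kernel centred on tanh(eps * W) and then self-normalized.

```python
import math
import random

def log_gaussian_kernel(x_obs, x_sim, sigma):
    # Log density of an isotropic Normal centred on the simulated output.
    # The normalizing constant is dropped: it is the same for every
    # sample and cancels after self-normalization.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_obs, x_sim))
    return -sq_dist / (2.0 * sigma ** 2)

def posterior_mean_eps(x_obs, W, sigma=0.05, num_samples=20000, seed=0):
    rng = random.Random(seed)
    samples = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]  # eps ~ N(0, 1)
    log_w = [
        log_gaussian_kernel(x_obs, [math.tanh(e * w) for w in W], sigma)
        for e in samples
    ]
    shift = max(log_w)  # subtract the max before exponentiating for stability
    weights = [math.exp(lw - shift) for lw in log_w]
    return sum(w * e for w, e in zip(weights, samples)) / sum(weights)

W = [1.0, -0.5]                                # hypothetical fixed weights
true_eps = 0.7                                 # value used to simulate the data
x_obs = [math.tanh(true_eps * w) for w in W]   # synthetic 2-dimensional observation
estimate = posterior_mean_eps(x_obs, W)
print(estimate)
```

Shrinking sigma moves the kernel closer to the hard constraint, but fewer prior samples receive appreciable weight, so the estimate gets noisier for a fixed sample budget.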

So I guess the takeaway re:universality is that the condition operator holds for continuous non-invertible programs, but the algorithm itself fails because the indicator would almost surely not hold. A Gaussian kernel could work for a naive algorithm in practice although it's not the same program.

"the algorithm itself fails because the indicator would almost surely not hold"

Yeah, see "Running Probabilistic Programs Backwards" for an interesting discussion of the problem of conditioning on rare or complex events in a PPL.

I tried this

# Imports for the API as used at the time of this thread
import torch
from torch.autograd import Variable
import pyro
import pyro.distributions as dist
import pyro.infer

def add_one_or_two(guess):
    init = Variable(torch.Tensor([2]))
    choice = pyro.sample("choice", dist.categorical, ps=guess, vs=[False, True])
    if choice:
        outcome = init + 1
    else:
        outcome = init + 2
    # Wrap the deterministic outcome in a Delta so it becomes a sample site
    return pyro.sample("outcome", dist.delta, outcome)

guess = Variable(torch.Tensor([0.5, 0.5]))
conditioned = pyro.condition(
    add_one_or_two, data={"outcome": Variable(torch.Tensor([4]))})
marginal = pyro.infer.Marginal(
    pyro.infer.Importance(conditioned, num_samples=100), sites=["choice"])

marginal(guess)

but this gives me

RuntimeError: invalid argument 2: invalid multinomial distribution (sum of probabilities <= 0) at /Users/soumith/code/builder/wheel/pytorch-src/torch/lib/TH/generic/THTensorRandom.c:230

Any ideas?

It seems to work to follow @eb8680 's suggestion and add a little noise to the output:

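# Imports for the API as used at the time of this thread
import torch
from torch.autograd import Variable
import pyro
import pyro.distributions as dist
import pyro.infer
from pyro.util import ng_ones

def add_one_or_two(guess):
    init = Variable(torch.Tensor([2]))
    choice = pyro.sample("choice", dist.categorical, ps=guess, vs=[False, True])
    if choice:
        outcome = init + 1
    else:
        outcome = init + 2
    # Observe the outcome through a Normal with a small standard deviation (0.1)
    return pyro.sample("outcome", dist.normal, outcome, 0.1 * ng_ones(1))

guess = Variable(torch.Tensor([0.5, 0.5]))
conditioned = pyro.condition(
    add_one_or_two, data={"outcome": Variable(torch.Tensor([4]))})
marginal = pyro.infer.Marginal(
    pyro.infer.Importance(conditioned, num_samples=100), sites=["choice"])

marginal(guess)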
{'choice': array([False], dtype=bool)}
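The sampled value above agrees with what the noisy model implies analytically. As a sanity check, this library-free sketch (plain Python, with a hand-written normal_pdf helper that is not part of Pyro) computes the exact posterior over choice when the outcome is observed through a Normal with standard deviation 0.1.

```python
import math

def normal_pdf(x, mean, sd):
    # Density of a univariate Normal distribution.
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

prior = {False: 0.5, True: 0.5}
means = {False: 4.0, True: 3.0}   # init + 2 and init + 1, with init = 2
observed, sd = 4.0, 0.1

# Bayes' rule by enumeration over the two values of `choice`.
unnormalized = {c: prior[c] * normal_pdf(observed, means[c], sd) for c in prior}
total = sum(unnormalized.values())
posterior = {c: w / total for c, w in unnormalized.items()}
print(posterior)
```

With the two branches ten standard deviations apart, choice = True keeps a tiny but nonzero posterior probability, so the weights never vanish entirely.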

btw, extending the currently implemented algorithms to include one that can deal with implicit models (i.e. observing a stochastically computed value from a function that doesn't come with a scoring function) is one of our next todos. Basically, you can use a discriminator as an estimator for the likelihood ratio needed in the ELBO. (@dustinvtran and @karalets have both done things like this in papers!) imho this is the right way to extend Pyro (as an optimization-focused PPL) to have something like the condition operator of Church et al.

The paper mentioned by @ngoodman is "Hierarchical Implicit Models and Likelihood-Free Variational Inference" (I believe), available on arXiv.

Even if we can model them with delta or normal with small variance samples, observations that are deterministic transformations of latent variables have another consequence when performing variational inference.

Assume the following model, where we are interested in learning something about mu given the observation x = x0. (Assume a is known and f is a deterministic function.)

z ~ Normal(mu, a)
x ~ Normal(f(z), sigma=epsilon, obs=x0)

In the guide, the latents can be modeled either as sampled from the parameters or as small deviations from values implied by the observation.

Either the guide looks something like

mu = param
z ~ Normal(mu, a)

or
z ~ Normal(approx_f_inverse(x0), sigma=epsilon)

In the first case, because x0 has negligible probability given guide-sampled z, the convergence is very slow.

The second case doesn't allow learning about mu at all because the sample z doesn't depend on mu.

How can we handle a situation such as the one above? Does the likelihood-free approach mentioned above handle this?

Any pointers on how inference in the above model can be accomplished with Pyro currently?
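The slow convergence in the first case can be illustrated without Pyro. In this minimal sketch, the choice f(z) = 2z, the values of x0, mu, and a, and the helper name effective_sample_size are all assumptions made for illustration. It draws z from Normal(mu, a), weights each draw by the Normal(f(z), epsilon) likelihood of x0, and reports the effective sample size of the weights.

```python
import math
import random

def effective_sample_size(epsilon, x0=1.0, mu=0.0, a=1.0,
                          num_samples=2000, seed=0):
    rng = random.Random(seed)

    def f(z):
        return 2.0 * z  # hypothetical deterministic function

    # Log-likelihood of x0 under Normal(f(z), epsilon), up to a constant
    # that is shared by all samples and cancels in the ratio below.
    log_w = [
        -(x0 - f(rng.gauss(mu, a))) ** 2 / (2.0 * epsilon ** 2)
        for _ in range(num_samples)
    ]
    shift = max(log_w)  # keeps the largest weight at exactly 1
    w = [math.exp(lw - shift) for lw in log_w]
    return sum(w) ** 2 / sum(v * v for v in w)

ess_wide = effective_sample_size(epsilon=1.0)
ess_narrow = effective_sample_size(epsilon=0.01)
print(ess_wide, ess_narrow)
```

As epsilon shrinks, only the few draws of z with f(z) very close to x0 carry any weight, which is one way to see why a guide that samples z without looking at x0 gets such a weak learning signal.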

I had a similar problem, with a variable

 x = pyro.sample("x",  dist.Normal(0., 1.))
 w = 3.
 y = pyro.sample("y", dist.Delta(x*w), obs=x*w)

Notice the use of a Delta distribution, as opposed to sampling from a Normal distribution with mean x*w and small variance. The difference between the two approaches shows up in the guide (for a variational method, which is what I am using). With the Delta distribution, there is no need for a corresponding sample statement in the guide. After all, y is observed. If the Normal distribution is used, the guide must contain the corresponding sample statement.

A related question: can a deterministic function of a sampled variable be treated as observed?

Thanks.
