Pyro: Implement a GaussianProcess

Created on 28 Sep 2017 · 15Comments · Source: pyro-ppl/pyro

branch

Adapt @ysaatchi 's pytorch implementation of GaussianProcesses, maybe as a pyro.module.

(Edited to not make GP a Distribution object)

enhancement

Source

fritzo

Most helpful comment

@fritzo @ysaatchi Here's a naive Pyro implementation of GP regression via variational inference with inducing points. This should just work with our inference algorithms, at least after using Cholesky instead of torch.inv.

Model (prior and likelihood):

def model(x, y):
    cov = k(x, x)
    f = pyro.sample("f", dist.multivariate_normal, zeros(x.size()), cov)
    return pyro.observe("y", dist.diagnormal, y, f, ones(f.size()))

Approximate posterior:

def guide(x, y):
    z = pyro.param("z", ...)  # inducing points
    m = pyro.param("m", ...)
    sigma = pyro.param("sigma", ...)
    kxz = k(x, z)
    kzx = k(z, x)
    kxx = k(x, x)
    kzz_inv = torch.inv(k(z, z))
    mean = torch.mm(torch.mm(kxz, kzz_inv), m)
    cov = kxx - torch.mm(torch.mm(kxz,
                                  kzz_inv - torch.mm(kzz_inv, torch.mm(sigma, kzz_inv))),
                         kzx)
    f = pyro.sample("f", dist.multivariate_normal, mean, cov)
    return f

Inference:

inference = KL_QP(model, guide, pyro.optim.adam)
for t in range(100):
    elbo = inference.step(data["x"], data["y"])

eb8680 on 29 Sep 2017

👍3

All 15 comments

@dustinvtran Did you ever try to implement GPs as distributions in Edward? It seems really tricky to implement a distribution that can be partially observed (e.g. observe one coord of a MVN variable). I guess one option is to deterministically project a full sample down to one coordinate, then observe that single coordinate. However this won't be possible if we fuse observe and sample as suggested by @eb8680 ...

fritzo on 29 Sep 2017

It seems like GP and Multinomial are simply indexed distributions in that some of their parameters are very late binding (position and total_count, respectively). Whereas the XRP/ERP distinction is between distributions with/without memory, the indexed/non-indexed distinction is between distributions with/without parameters that always need to be specified when calling .sample() (i.e. they are specified even for singleton instances). I think our current conventions would support GP if we simply omitted the position parameter from _sanitize_params(). So the signature for GP.sample() would be like

def __init__(self, cov=None, *args, **kwargs):
    self.cov= cov
    ...

def sample(self, x, position, cov=None, *args, **kwargs):
    cov = self._sanitize_params(cov)
    ...

fritzo on 29 Sep 2017

Model (prior and likelihood):

def model(x, y):
    cov = k(x, x)
    f = pyro.sample("f", dist.multivariate_normal, zeros(x.size()), cov)
    return pyro.observe("y", dist.diagnormal, y, f, ones(f.size()))

Approximate posterior:

def guide(x, y):
    z = pyro.param("z", ...)  # inducing points
    m = pyro.param("m", ...)
    sigma = pyro.param("sigma", ...)
    kxz = k(x, z)
    kzx = k(z, x)
    kxx = k(x, x)
    kzz_inv = torch.inv(k(z, z))
    mean = torch.mm(torch.mm(kxz, kzz_inv), m)
    cov = kxx - torch.mm(torch.mm(kxz,
                                  kzz_inv - torch.mm(kzz_inv, torch.mm(sigma, kzz_inv))),
                         kzx)
    f = pyro.sample("f", dist.multivariate_normal, mean, cov)
    return f

Inference:

inference = KL_QP(model, guide, pyro.optim.adam)
for t in range(100):
    elbo = inference.step(data["x"], data["y"])

eb8680 on 29 Sep 2017

👍3

Also, GPyTorch seems nice, what about just wrapping that with a Distribution?

eb8680 on 29 Sep 2017

@dustinvtran Did you ever try to implement GPs as distributions in Edward?

It's relatively painless when using their marginal representation; see, e.g., http://edwardlib.org/tutorials/supervised-classification which is similar to @eb8680's snippet above. IIRC all GP software like GPy, GPflow, and GPstuff handle it this way.

Is something like gpmem more what you're looking for?

dustinvtran on 29 Sep 2017

@eb8680 your snippet looks good, we can just add the potrf's where needed and it's pretty close to the pyronic way of implementing a variational GP

ysaatchi on 29 Sep 2017

Is something like gpmem more what you're looking for?

Yes gpmem is what I was thinking, but @eb8680's approach looks simpler.

fritzo on 29 Sep 2017

Fritz and I both agree now that GPs should be a model with a guide, much like Eli's snippet. The code I wrote can be useful for providing machinery to specify differnet k(x, xp)'s.

e.g.

def covARD(x, xp, hypers):

    kk = [hn for hn in hypers.keys() if "log_signal_var" in hn] # accomodate suffixes for e.g. covSum
    assert len(kk) == 1, "signal variance not specified"
    signal_var = torch.exp(hypers[kk[0]])

    kk = [hn for hn in hypers.keys() if "log_lengthscales" in hn] # accomodate suffixes for e.g. covSum
    assert len(kk) == 1, "lengthscales not specified"
    lengthscales = torch.exp(hypers[kk[0]]) * 2.

    n, d = x.size()
    m, d = xp.size()

    assert len(lengthscales) == d, "invalid lengthscale hyper, needs to be of size (%i,)" % d

    xnorm = x.div(lengthscales)
    xp_norm = xp.div(lengthscales)


    cross_ = 2 * torch.mm(xnorm, xp_norm.transpose(0,1))

    x_sq = torch.bmm(xnorm.view(n, 1, d), xnorm.view(n, d, 1))
    xp_sq = torch.bmm(xp_norm.view(m, 1, d), xp_norm.view(m, d, 1))

    x_sq = x_sq.view(n, 1).expand(n, m)
    xp_sq = xp_sq.view(1, m).expand(n, m)

    res = x_sq + xp_sq - cross_
    res = signal_var.expand_as(res) * torch.exp(-0.5*res)

    return res

and

def covSumAdditive(x, xp, hypers):
    K = None
    n, d = x.size()
    assert len(hypers) == d, "invalid hypers for covSum, need a config per dimension" 
    for dd, kernel_dict in enumerate(hypers):
        if K is None:
            K = kernel_dict["K"](x[:, [dd]], xp[:, [dd]], kernel_dict["hypers"])
        else:
            K += kernel_dict["K"](x[:, [dd]], xp[:, [dd]], kernel_dict["hypers"])
    return K

ysaatchi on 29 Sep 2017

this kind of barefoot approach is great for a first pass but it's a bit unsatisfying. arguably, it's not really an abstraction of a gaussian process; rather it's probably fairer to call it a set of helper functions to construct covariances and mean functions. the user then reifies the model on some set of inputs and you get some explicitly typed gaussian model you can do variational inference in.

it's also unsatisfying that any conjugacy that might be there is left on the table.

also, the barefoot approach to including inducing points doesn't quite work. this is because the inducing points are only on the guide side and not the model side. so it can be thought of a variational distribution with auxiliary variables. constructing an appropriate elbo requires introducing an auxiliary distribution, which is a bit awkward in this context. in any case our current kl_qp doesn't support auxiliary distributions.

martinjankowiak on 29 Sep 2017

Martin, I think you are slightly wrong here. We can do this barefoot if we
consider the inducing points random variables samples in the model and have
a guide for them.
No need for auxiliary distributions to do a variational GP, I think you
confused yourself a bit.

On Fri, Sep 29, 2017 at 11:59 AM, martinjankowiak notifications@github.com
wrote:

this kind of barefoot approach is great for a first pass but it's a bit
unsatisfying. arguably, it's not really an abstraction of a gaussian
process; rather it's probably fairer to call it a set of helper functions
to construct covariances and mean functions. the user then reifies the
model on some set of inputs and you get some explicitly typed gaussian
model you can do variational inference in.

it's also unsatisfying that any conjugacy that might be there is left on
the table.

also, the barefoot approach to including inducing points doesn't quite
work. this is because the inducing points are only on the guide side and
not the model side. so it can be thought of a variational distribution with
auxiliary variables. constructing an appropriate elbo requires introducing
an auxiliary distribution, which is a bit awkward in this context. in any
case our current kl_qp doesn't support auxiliary distributions.

—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
https://github.com/uber/pyro/issues/166#issuecomment-333210748, or mute
the thread
https://github.com/notifications/unsubscribe-auth/ABVhLwLFkPYLnEBa5riFd3AGjxP2do0fks5snT4tgaJpZM4Pnfd-
.

karalets on 29 Sep 2017

oh my apologies i didn't realize that eli's code snippet doesn't explicitly have latent u's. yeah you can do that. although for smaller datasets it's probably easier to just introduce an explicit variational distribution for the f's (like in the edward example).

martinjankowiak on 29 Sep 2017

OK, @eb8680's snippet suggests that GaussianProcess should not be implemented as a Distribution object (like gpmem), and instead should be implemented as something else (what do we call a model,guide pair?). I've updated the issue description to not mention pyro.distributions.

fritzo on 2 Oct 2017

what do we call a model,guide pair?

"guided model"? ;)

we discussed at some point how a trained model+guide is an intermediate step toward a marginal distribution, and we should possibly have a method for packaging them up together. on the other hand, i'm not sure how much this buys us.

ngoodman on 2 Oct 2017

It could be used as a component in other models, but my feeling is that we
should have pretty standard wrappers for these things.

On Oct 2, 2017 8:53 AM, "ngoodman" notifications@github.com wrote:

what do we call a model,guide pair?

"guided model"? ;)

we discussed at some point how a trained model+guide is an intermediate
step toward a marginal distribution, and we should possibly have a method
for packaging them up together. on the other hand, i'm not sure how much
this buys us.

—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
https://github.com/uber/pyro/issues/166#issuecomment-333577509, or mute
the thread
https://github.com/notifications/unsubscribe-auth/ABVhL5Y_WJSvnY0QW3l5VUy8KZ45gX8Bks5soQbbgaJpZM4Pnfd-
.

karalets on 2 Oct 2017

Blocked by missing multivariate normal distribution. We can revisit this issue when that's ready.

eb8680 on 18 Dec 2017

Was this page helpful?

0 / 5 - 0 ratings

Related issues

Implement higher dimensional newton solvers

jpchen · 5Comments

Updating intro tutorials

eb8680 · 4Comments

Bug (?) in the shape of samples from auto-guided LKJCorr

null-a · 4Comments

pytorch broadcasting

martinjankowiak · 3Comments

Pyro minor release 0.3.1

neerajprad · 4Comments