Pymc3: DOCS: example of Gaussian process regression

Created on 10 Sep 2016  ·  23 Comments  ·  Source: pymc-devs/pymc3

There is a not-yet-fulfilled promise in the docs on Gaussian process smoothing:
https://pymc-devs.github.io/pymc3/notebooks/GP-smoothing.html

It is important to note that we are not dealing with the problem of interpolating the function y=f(x) at the unknown values of x. Such problem would be called “regression” not “smoothing”, and will be considered in other examples.

;)

Please, provide a simple example.
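
For readers who want a picture of what "regression" means in the quoted passage (predicting f at x values that were not observed), here is a minimal NumPy sketch. It is not PyMC3 code and not taken from any notebook linked in this thread. The names `rbf_kernel` and `gp_predict` are made up for illustration, and the lengthscale, variance, and noise level are fixed by assumption rather than inferred, which a fully Bayesian treatment would not do.

```python
import numpy as np

def rbf_kernel(xa, xb, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between two 1-D input arrays."""
    sq_dist = (xa[:, None] - xb[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale ** 2)

def gp_predict(x_train, y_train, x_new, noise_sd=0.1, **kernel_args):
    """Posterior mean and variance of f at x_new under a zero-mean GP prior."""
    K = rbf_kernel(x_train, x_train, **kernel_args)
    K_s = rbf_kernel(x_train, x_new, **kernel_args)
    K_ss = rbf_kernel(x_new, x_new, **kernel_args)
    # Factor the noisy training covariance once and reuse the factor.
    L = np.linalg.cholesky(K + noise_sd ** 2 * np.eye(len(x_train)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    v = np.linalg.solve(L, K_s)
    mean = K_s.T @ alpha
    var = np.diag(K_ss) - np.sum(v ** 2, axis=0)
    return mean, var

# Synthetic data (an assumption for the demo): noisy samples of sin(x).
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 5.0, 20)
y_train = np.sin(x_train) + 0.1 * rng.standard_normal(x_train.size)

# Points that were NOT observed: this is the "regression" part.
x_new = np.array([1.25, 2.6, 4.1])
mean, var = gp_predict(x_train, y_train, x_new)
```

In a PyMC3 model the kernel hyperparameters and noise level would presumably get priors and be sampled, with the prediction step applied per posterior draw.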

All 23 comments

This would be a good contribution. I've never built one, though. Does anyone have one to hand, perhaps @fonnesbeck? Alternatively, @PtrPiotr, if you get one working, submit a PR :)

Here is a minimal GP regression example (with no commentary; sorry):

https://nbviewer.jupyter.org/gist/AustinRochford/96d7eb6256692b34661b0000e86ee4e0

I don't really have time to turn this into a presentable doc for a few weeks, so feel free to run with it if you are so inclined.

I have a goal of reproducing the birthdate analysis from BDA3 in PyMC3 but have not yet done so.

@AustinRochford This is awesome, thanks for sharing.

Thanks @twiecki. It would be nice to try to do posterior predictive sampling here, but I kept running into errors with the shape of y_obs changing.

Can you post the version that's breaking?

We really need to create a GP submodule to handle all of that overhead, much like the one Anand built for PyMC2. It's been on my list for a long time.

@twiecki Here's a notebook with the PPC sampling problem I've run into https://gist.github.com/AustinRochford/134f71b320d46411354a7398208fd278

@fonnesbeck I'd be happy to collaborate on a GP submodule in a few weeks once my schedule clears up.

@AustinRochford and @fonnesbeck I believe this https://github.com/pymc-devs/pymc/blob/master/pymc/gp/gp_submodel.py is the submodule in PyMC2

idk there are a handful of GP frameworks in python. a bunch of them can plug into solvers. however, they are limited (at least computationally) in what you can do with the kernels. this would be a big advantage for pymc3.

Yet another example of a GP using PyMC3

Thanks for that link @twiecki, I knew I had seen it somewhere, but couldn't dig it up (didn't think to look in other branches).

I just put it there recently, also had to dig it up from a tweet way back :).

Hi guys, I'm also very interested in a GP submodule for pymc3. I made a fork to experiment with how a submodule might shake out.

Other GP packages (I'm thinking specifically of GPy and GPFlow) are a bit more machine-learning oriented, emphasizing fast approximations over MCMC routines, and may be hard to interface with other code when a GP is part of a larger statistical model. I think fully Bayesian GPs are a niche pymc3 could fill nicely.

Also, the fact that pymc3 is backed by theano is a big advantage in my mind. Many different kernels can be experimented with without the implementation complexity induced by coding the gradients.

What I see as components (feel free to add or subtract):

  • a large library of composable kernels
  • elliptical slice sampling
  • easily use glm for mean function
  • more?
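
To make the "composable kernels" item concrete, here is a rough sketch in plain NumPy of how kernels could be combined with `+` and `*`. The class names (`Kernel`, `Combined`, `ExpQuad`, `Periodic`) are hypothetical and chosen for illustration; this is not the API of the fork or of any existing library. A theano-backed version would presumably build symbolic expressions instead of NumPy arrays.

```python
import numpy as np

class Kernel:
    """Base class; subclasses implement __call__(xa, xb) -> covariance matrix."""

    def __add__(self, other):
        return Combined(self, other, np.add)

    def __mul__(self, other):
        return Combined(self, other, np.multiply)

class Combined(Kernel):
    """Elementwise sum or product of two kernels, itself a valid kernel."""

    def __init__(self, left, right, op):
        self.left, self.right, self.op = left, right, op

    def __call__(self, xa, xb):
        return self.op(self.left(xa, xb), self.right(xa, xb))

class ExpQuad(Kernel):
    """Squared-exponential kernel on 1-D inputs."""

    def __init__(self, lengthscale):
        self.lengthscale = lengthscale

    def __call__(self, xa, xb):
        d = xa[:, None] - xb[None, :]
        return np.exp(-0.5 * d ** 2 / self.lengthscale ** 2)

class Periodic(Kernel):
    """Standard periodic kernel on 1-D inputs."""

    def __init__(self, period, lengthscale):
        self.period, self.lengthscale = period, lengthscale

    def __call__(self, xa, xb):
        d = xa[:, None] - xb[None, :]
        return np.exp(-2.0 * np.sin(np.pi * d / self.period) ** 2
                      / self.lengthscale ** 2)

x = np.linspace(0.0, 4.0, 9)
# A slowly decaying periodic pattern plus a short-range component.
locally_periodic = ExpQuad(2.0) * Periodic(1.0, 0.5)
K = (locally_periodic + ExpQuad(0.3))(x, x)
```

Because sums and elementwise products of valid covariance functions are again valid covariance functions, the combined object can be used anywhere a single kernel can.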

Sounds great, the new UserModel should come in handy. I wonder if that'd be better as a separate package, certainly should start that way.

Why elliptical slice sampling?

I didn't know about UserModel; I'll check into that. Elliptical slice sampling is for directly sampling the latent function, which has a multivariate normal GP prior. In Gaussian models you can usually integrate it out first, but you may not want to, or can't for a non-Gaussian likelihood.
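
As a rough illustration of the idea, here is a NumPy sketch of a single elliptical slice sampling update for a zero-mean Gaussian prior, following the algorithm of Murray, Adams, and MacKay (2010). It is not a PyMC3 step method. The function name `elliptical_slice`, the toy binary labels, and the logistic likelihood are assumptions made for the demo.

```python
import numpy as np

def elliptical_slice(f, prior_chol, log_lik, rng):
    """One update of f, whose prior is N(0, prior_chol @ prior_chol.T)."""
    nu = prior_chol @ rng.standard_normal(f.size)   # auxiliary prior draw
    log_y = log_lik(f) + np.log(rng.uniform())      # slice height
    theta = rng.uniform(0.0, 2.0 * np.pi)
    theta_min, theta_max = theta - 2.0 * np.pi, theta
    while True:
        # Candidate on the ellipse through the current state and nu.
        f_prop = f * np.cos(theta) + nu * np.sin(theta)
        if log_lik(f_prop) > log_y:
            return f_prop
        # Shrink the bracket towards theta = 0 (the current state).
        if theta < 0.0:
            theta_min = theta
        else:
            theta_max = theta
        theta = rng.uniform(theta_min, theta_max)

# Toy non-Gaussian problem: binary labels with a logistic likelihood.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 5.0, 15)
labels = np.where(x < 2.5, 1.0, -1.0)
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2) + 1e-6 * np.eye(x.size)
chol = np.linalg.cholesky(K)

def log_lik(f):
    # log sigmoid(labels * f), summed over the data points
    return -np.logaddexp(0.0, -labels * f).sum()

f = np.zeros(x.size)
draws = []
for _ in range(500):
    f = elliptical_slice(f, chol, log_lik, rng)
    draws.append(f)
draws = np.array(draws)
```

The update has no step size to tune and needs only likelihood evaluations, not gradients, which is part of its appeal for latent GP models with non-Gaussian likelihoods.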

Here's what I have so far: https://github.com/bwengals/pymc3/tree/gaussian-processes
Example script: examples/basic_gp.py

However, I am using Theano 0.9 so that I can use cholesky/solve_triangular.

That's a great start! We definitely need the cholesky option to our normal.
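
For context on why a Cholesky option for the multivariate normal is attractive, here is a small NumPy sketch of the log-density computed from a lower-triangular factor, which avoids forming an explicit inverse or determinant. The function name `mvn_logp_chol` is hypothetical, and `np.linalg.solve` stands in for a dedicated triangular solve, which would be cheaper; this is not how PyMC3 or Theano implement it.

```python
import numpy as np

def mvn_logp_chol(y, mean, chol):
    """log N(y | mean, chol @ chol.T) for a lower-triangular factor chol."""
    resid = y - mean
    # Whitened residual; a triangular solver would exploit the structure.
    z = np.linalg.solve(chol, resid)
    # log|Sigma| = 2 * sum(log(diag(chol)))
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    n = y.size
    return -0.5 * (z @ z + log_det + n * np.log(2.0 * np.pi))

# Random positive-definite covariance, purely for the demo.
rng = np.random.default_rng(3)
A = rng.standard_normal((5, 5))
cov = A @ A.T + 5.0 * np.eye(5)
y = rng.standard_normal(5)

logp = mvn_logp_chol(y, np.zeros(5), np.linalg.cholesky(cov))
```

Parameterizing the distribution by the factor also means the factorization can be done once and reused for both the density and for drawing samples.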

NUTS or our other existing inference algorithms should do just fine, no?

Yes, the kernel library of GPFlow uses tensorflow, so is extremely easy to adapt to theano. Also it has basically the same API as GPy, which from what I can tell is the most popular GP library.

I haven't experimented in depth, but when I explicitly sample the latent function, NUTS hangs on me; I'm not sure why. I have a notebook with this model that isn't ready to commit. With a tuned HMC it works alright, though. Is there a reason adding elliptical slice sampling as a step method is a bad idea?

No reason not to add it, would be a great addition.

I had good results with SVGD on the branch over here: https://github.com/pymc-devs/pymc3/pull/1549. Does NUTS also stall if you run with ADVI initialization? I.e. just calling pm.sample(n_iter).

@bwengals disagree that GPy is ML-oriented. The fact that it emphasizes kernel engineering and plotting makes it not ML-oriented. These features help modeling specifically.

ok i think it's time to separate out the GP module discussion.

Sorry, didn't mean to commandeer the thread! Where would be a good place for a discussion?

@twiecki Haven't tried that! Will look into it.

@majidaldo that's a good point, definitely wasn't trying to criticize GPy, just motivate what I think a GP lib in pymc3 could possibly offer that isn't already well accomplished or within the scope of GPy

We have merged GP support now.
