Pyro: Make GP module more versatile

Created on 1 Mar 2018 · 14 comments · Source: pyro-ppl/pyro

Continuing from #702, the following steps are needed to make the GP module more accessible.

[x] Support multidimensional output (currently, the output y is assumed to be 1D). This should not be complicated, I guess.
[x] Add more likelihoods: Bernoulli, multiclass classification, ...
[x] Support batching (need a reference for this; is an option to change X, y enough?)
[x] Use the new style of transforms with pyro.param
[ ] Split/revise the current tutorial into GPR and SGPR
[x] Go over all the docs of the GP module and polish them!
[x] Add tutorial on Bayesian Optimization

After the 0.2.0 release:

[x] Support mean_function
[x] ~~Add a tutorial on creating kernels and warping with neural networks~~ Docs for this have already been added, and there is an example that uses it on MNIST.
[x] ~~Add a tutorial on classification (SVGP) (banana data, replicate original paper).~~ Instead, we have an example on MNIST.
[x] Implement the whitening transform, which the GPflow authors say aids optimization. Yes, it helps optimization!
[x] ~~Implement Gauss-Hermite quadrature to compute the expectation of the likelihood.~~ This might not be necessary because MC is a better fit for the Pyro framework.
[x] Benchmark against other frameworks: performance is the same; for speed, GPflow CPU < PyroGP CPU ~ GPflow GPU < PyroGP GPU (x < y means x is faster than y).
[x] Add a regression test.
[x] Implement MC version of GP models.
[x] Implement GPLVM for dimensionality reduction.

Future:

[x] Explore deep GP.
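The whitening item can be illustrated with a minimal PyTorch sketch, assuming a hand-written RBF kernel rather than Pyro's own kernel classes (the helper names `rbf`, `cholesky_factor`, `whiten` and `unwhiten` are hypothetical). Instead of working with function values f ~ N(0, K) directly, one works with whitened values v ~ N(0, I) and recovers f = L v, where L is the Cholesky factor of K; the whitened variables are decorrelated, which is the usual explanation for why this aids optimization.

```python
import torch

def rbf(X, Z, variance=1.0, lengthscale=1.0):
    # Squared-exponential kernel on the rows of X (n x d) and Z (m x d)
    sq_dist = (X.unsqueeze(1) - Z.unsqueeze(0)).pow(2).sum(-1)
    return variance * torch.exp(-0.5 * sq_dist / lengthscale ** 2)

def cholesky_factor(X, jitter=1e-6):
    # Lower-triangular L with L @ L.T = K + jitter * I (jitter keeps K numerically positive definite)
    K = rbf(X, X) + jitter * torch.eye(X.shape[0], dtype=X.dtype)
    return torch.linalg.cholesky(K)

def unwhiten(v, L):
    # Whitened values v ~ N(0, I) -> function values f = L v ~ N(0, K)
    return L @ v

def whiten(f, L):
    # Inverse map: solve the triangular system L v = f for v
    return torch.linalg.solve_triangular(L, f.unsqueeze(-1), upper=False).squeeze(-1)
```

In a variational GP, the variational distribution would then be placed on v rather than on f, and f = L v is recomputed whenever the kernel hyperparameters (and hence L) change.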

Labels: discussion, documentation, enhancement


fehiepsi

Most helpful comment

@fehiepsi the GP library is looking great! As we move closer to the Pyro 0.2 release, I would strongly encourage you to focus on polish, performance, and documentation in addition to the features identified above - I'd hate to see your work go underappreciated. Here are some more detailed recommendations you might consider:

1. Adding Bayesian hyperparameter selection with HMC/NUTS: in addition to being necessary for good GP performance in practice, this will help you stress-test the design of the GP API and make sure it interfaces seamlessly with the other parts of Pyro. I'm sure you're already working on or at least thinking about this, but I thought I'd mention it for completeness. I don't think there are many other algorithmic features missing beyond this and the others mentioned in this issue that are critical for the release.
2. Adding more self-contained internal documentation: the GP documentation is still rather thin. For example, the docstring of SparseGPRegression says that it implements three approximation methods, but does not give any further mathematical or algorithmic information or describe their strengths and weaknesses or their relationships to one another. You don't need to be too exhaustive or pedagogical, but a useful heuristic might be to imagine you're writing for someone who learned much of what they know about probabilistic machine learning from Pyro's documentation and tutorials :)
3. Splitting up your GP tutorial into smaller units: the Edward GP tutorial is a great example because of how little background knowledge it assumes, how simple the problem being solved is, and how high the ratio of explanation to code is. You did a great job with your GP tutorial, but it's still a bit terse beyond the introduction. I'd recommend splitting it into separate self-contained tutorials for each type of model (GPR, SGPR, SVGP, VGP) that each assume no background knowledge about GPs and briefly explain and motivate the mathematical details of the algorithm - you can use the same introductory section in each one if you want.
4. Adding a few more simple, self-contained examples and tutorials that use the other models, kernels and likelihoods: ideally you'd have paired examples and tutorial notebooks for a few different problem types (time series, spatial data, classification, larger datasets, basic Bayesian optimization, etc.) that use simple real-world datasets (e.g. from observations), are largely self-contained in terms of explanation, and can double as integration tests. You won't need to do all these yourself from scratch - it looks from #883 like @tobyclh has the beginnings of a good BO tutorial, and @fritzo is experimenting with GP time series models that we could convert into a tutorial at some point. This can overlap with the previous point.
5. Profiling and benchmarking against GPflow and maybe GPy: it's very likely you'll end up finding performance and numerical issues or missing features upstream in Pyro and PyTorch (e.g. in the PyTorch LBFGS optimizer), and the sooner you do, the sooner we can help fix them, ideally before the PyTorch 0.4 release :) You might also consider adding a couple of performance regression tests now that #858 has been merged.

Writing all that extra documentation is going to be a bit of a pain, but you can reuse a lot of content across points 2, 3, and 4, and it's by far the most valuable way to spend your time in terms of attracting users and contributors and comparing favorably to other libraries. If you want, I and others (@ysaatchi and @fritzo) can help with selecting examples and making a more concrete release plan.

eb8680 on 13 Mar 2018


All 14 comments

Another extension that ought to be straightforward with Pyro is transforming the inputs using neural networks before passing them to the GP. This allows for much more flexible kernel learning (cf. the "Deep Kernel Learning" papers).
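A minimal sketch of this input-warping idea, assuming plain PyTorch rather than Pyro's kernel classes: a small neural network (inside the hypothetical `WarpedRBF` module) maps inputs to features, and an RBF kernel is evaluated on those features. Because the network weights and the kernel hyperparameters are ordinary `nn.Parameter`s, gradients of any GP objective flow into all of them.

```python
import torch
from torch import nn

class WarpedRBF(nn.Module):
    """RBF kernel evaluated on neural-network features: k(x, z) = rbf(g(x), g(z))."""

    def __init__(self, input_dim, feature_dim=2, hidden=16):
        super().__init__()
        # Trainable warping g(.)
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden), nn.Tanh(), nn.Linear(hidden, feature_dim)
        )
        # Log-parameterized so that variance and lengthscale stay positive
        self.log_variance = nn.Parameter(torch.tensor(0.0))
        self.log_lengthscale = nn.Parameter(torch.tensor(0.0))

    def forward(self, X, Z=None):
        Z = X if Z is None else Z
        gx, gz = self.net(X), self.net(Z)
        sq_dist = (gx.unsqueeze(1) - gz.unsqueeze(0)).pow(2).sum(-1)
        lengthscale = self.log_lengthscale.exp()
        return self.log_variance.exp() * torch.exp(-0.5 * sq_dist / lengthscale ** 2)
```

Calling `kernel(X)` returns an n x n Gram matrix; backpropagating any scalar built from it (for example a GP marginal likelihood) populates `.grad` for the network weights as well as for the kernel hyperparameters.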

ysaatchi on 3 Mar 2018

@ysaatchi Can this be achieved with the Warping kernel in #831?

fehiepsi on 3 Mar 2018

Yes, but the f(.) should have trainable parameters. What does q(.) do?

ysaatchi on 3 Mar 2018

@ysaatchi With PyTorch, we can create any nn.Module with parameters (nn.Parameter or pyro.param(...)) that is still callable ;). q is a polynomial (with nonnegative coefficients) acting on the output of the kernel. It appears in Section 6.2 of Bishop's book together with Exponent; they play similar roles.
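To make the roles of q(.) and Exponent concrete: Bishop's Section 6.2 lists q(k(x, x')), with q a polynomial with nonnegative coefficients, and exp(k(x, x')) among the operations that turn a valid kernel into another valid kernel. A minimal sketch, assuming both are applied elementwise to a Gram matrix (the helper names are hypothetical and are not the API of the Warping kernel in #831), with a numerical check that the results stay positive semidefinite:

```python
import torch

def rbf_gram(X, lengthscale=1.0):
    # RBF Gram matrix on the rows of X
    sq_dist = (X.unsqueeze(1) - X.unsqueeze(0)).pow(2).sum(-1)
    return torch.exp(-0.5 * sq_dist / lengthscale ** 2)

def polynomial_warp(K, coeffs):
    # q(K) = sum_i coeffs[i] * K**i, applied elementwise; K**0 is the all-ones (constant) kernel
    if any(c < 0 for c in coeffs):
        raise ValueError("coefficients must be nonnegative to keep q(K) a valid kernel")
    return sum(c * K ** i for i, c in enumerate(coeffs))

def exponent_warp(K):
    # exp(K), applied elementwise
    return torch.exp(K)

def min_eigenvalue(K):
    # Smallest eigenvalue of a symmetric matrix; >= 0 (up to round-off) for a valid Gram matrix
    return torch.linalg.eigvalsh(K).min().item()
```

Both operations preserve positive semidefiniteness because elementwise products of PSD matrices are PSD (the Schur product theorem), and exp is a limit of polynomials with nonnegative coefficients.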

fehiepsi on 3 Mar 2018

Awesome, @fehiepsi, so it's pretty much already done. Great work! :)

ysaatchi on 3 Mar 2018


Whoa!!! @eb8680 Thank you so much for your appreciation and very detailed recommendations!

You are right that I intend to apply HMC to GPs >"<. That was why I asked about the current status of HMC (after seeing how simple the HMC implementation is, I really liked it and went on to read about and implement NUTS). However, applying HMC to GPs is not my priority right now. I will address it after the 0.2 release (it should not be too complicated).

What's in my mind was that I will spend a week before releasing Pyro 0.2.0 to add more documentations, modify tutorials,... The Edward tutorial is so nice. Since you mentioned that Pyro 0.2.0 will be released soon, I will start splitting the tutorial and adding a few tutorials on kernels and (maybe) classification this weekend. It would be very helpful for me if you could provide me some information about the release plan of Pyro. :)

Adding more likelihoods should be my priority now. However, I was a bit too lazy to code last weekend, so I just read papers. Seeing that doing BO is quite natural (aside from some issues such as API design and discrete parameters), I thought it would be helpful for other contributors if I made a tutorial for it. I also want to check whether the current GP implementation works well. The tutorial is quite straightforward and will be finished soon. Then I will work on adding more likelihoods.
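A minimal sketch of why BO falls out naturally once GP regression is available, assuming exact GP regression with fixed hyperparameters (no hyperparameter fitting), a 1D search over a fixed grid of candidates, and a lower-confidence-bound acquisition. The function names are hypothetical, and this is not the code of the BO tutorial discussed here.

```python
import torch

def rbf(A, B, lengthscale=0.2):
    # Unit-variance RBF kernel between 1D tensors A (n,) and B (m,)
    return torch.exp(-0.5 * (A.unsqueeze(1) - B.unsqueeze(0)) ** 2 / lengthscale ** 2)

def gp_posterior(X, y, Xnew, noise=1e-4):
    # Exact GP posterior mean and variance of the latent function at Xnew (zero prior mean)
    K = rbf(X, X) + noise * torch.eye(len(X), dtype=X.dtype)
    L = torch.linalg.cholesky(K)
    Ks = rbf(X, Xnew)                                          # shape (n, m)
    alpha = torch.cholesky_solve(y.unsqueeze(-1), L).squeeze(-1)
    mean = Ks.T @ alpha
    V = torch.linalg.solve_triangular(L, Ks, upper=False)
    var = (1.0 - (V ** 2).sum(0)).clamp_min(1e-12)
    return mean, var

def bayes_opt(f, candidates, X_init, n_steps=10, kappa=2.0):
    # Minimize f over `candidates` by repeatedly evaluating the point with the lowest
    # lower confidence bound, mean - kappa * std
    X = X_init.clone()
    y = f(X)
    for _ in range(n_steps):
        mean, var = gp_posterior(X, y, candidates)
        lcb = mean - kappa * var.sqrt()
        x_next = candidates[torch.argmin(lcb)].reshape(1)
        X = torch.cat([X, x_next])
        y = torch.cat([y, f(x_next)])
    best = torch.argmin(y)
    return X[best], y[best]
```

For example, `bayes_opt(lambda x: (x - 0.3) ** 2, torch.linspace(0, 1, 201, dtype=torch.float64), torch.tensor([0.0, 1.0], dtype=torch.float64))` first spreads evaluations over the interval, where the posterior variance is large, and then concentrates them near the minimum. A real BO module would also refit the kernel hyperparameters after each evaluation.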

For profiling and benchmarking, I follow almost every related pull request from @neerajprad and am still learning from him. After adding more likelihoods and making some tutorials, this will be my priority.

fehiepsi on 13 Mar 2018

> What's in my mind was that I will spend a week before releasing Pyro 0.2.0 to add more documentations, modify tutorials,...

Based on my experience with the Pyro launch, I think you might be significantly underestimating the time it will take to do this, which is why I'm encouraging you to start sooner. It's unlikely that you'll be able to work on a BO module at the same time (although I agree that a simple standalone BO tutorial is a good idea) and have both ready and polished in time for the 0.2 release.

> It would be very helpful for me if you could provide me some information about the release plan of Pyro. :)

Our schedule is dependent on the PyTorch 0.4 release, but we may pin a release version before that. Stay tuned!

eb8680 on 13 Mar 2018

> It's unlikely that you'll be able to work on a BO module at the same time

Yup, I didn't intend to work on BO around this time, nor in the near future (mainly because designing its API would be painful).

I agree that it is better to start working on docs sooner. ;)

fehiepsi on 14 Mar 2018

@fehiepsi It would also be nice to reorganize modules to make importing a bit easier:

move Parameterized into a new module, say pyro/contrib/gp/util.py
import all submodules into pyro/contrib/gp/__init__.py

This way users can easily access the whole module, e.g.

```python
import pyro.contrib.gp as gp

# Every submodule reachable from the single `gp` import
kernel = gp.kernels.RBF(...)
model = gp.models.GPRegression(...)
```

(I'm starting to use your GP code for time series prediction, and wished I could do this :smile:)

fritzo on 14 Mar 2018


Just want to leave a message here for interested persons:

After implementing various likelihoods, I think I have a better understanding of GPs. As a consequence, I can see that implementing an MCMC version of the GP models and GPLVM, and supporting deep GPs, are quite straightforward with some minor changes to the GP models' API. So I have decided to refactor the GP module a bit to support these features before working on any tutorial.

fehiepsi on 23 Mar 2018


@fehiepsi do you mean you're implementing those features now or just refactoring the GP API to support implementing them in the future?

eb8680 on 23 Mar 2018

@eb8680 At least for MCMC and deep GPs, I think we don't need dedicated implementations. It is enough to support them and give examples in tutorials:

MCMC: HMC will run on, e.g., gpr.model(), and after collecting samples from HMC, we use gpr.kernel.fix_param(...) to set the parameters before making predictions with gpr.forward(Xnew). We don't need to make separate models like GPMC or SGPMC as in GPflow. (Previously, what made MCMC on GP models complicated was the need for a differentiable Cholesky decomposition; TensorFlow and PyTorch both support it, so I guess there will be no problem using HMC on GPs in Pyro.)
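The point about differentiable Cholesky decompositions can be checked directly in PyTorch. A minimal sketch, assuming a hand-written RBF kernel with log-parameterized hyperparameters (not Pyro's `GPRegression`): the GP log marginal likelihood is computed through `torch.linalg.cholesky`, and autograd returns its gradient with respect to the hyperparameters, which is the kind of gradient HMC/NUTS need when sampling them.

```python
import math
import torch

def log_marginal_likelihood(log_lengthscale, log_variance, X, y, noise=0.1):
    # log N(y | 0, K + noise * I), computed via a Cholesky factorization
    n = X.shape[0]
    sq_dist = (X.unsqueeze(1) - X.unsqueeze(0)).pow(2).sum(-1)
    K = log_variance.exp() * torch.exp(-0.5 * sq_dist / log_lengthscale.exp() ** 2)
    L = torch.linalg.cholesky(K + noise * torch.eye(n, dtype=X.dtype))
    alpha = torch.cholesky_solve(y.unsqueeze(-1), L).squeeze(-1)   # (K + noise I)^{-1} y
    # -0.5 * y^T alpha - 0.5 * log det(K + noise I) - 0.5 * n * log(2 pi),
    # using log det = 2 * sum(log(diag(L)))
    return (-0.5 * y @ alpha
            - torch.log(torch.diagonal(L)).sum()
            - 0.5 * n * math.log(2 * math.pi))
```

Calling `.backward()` on the result populates `.grad` on both log-hyperparameters, and `torch.autograd.gradcheck` confirms these gradients against finite differences.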

For deep GPs, a hidden GP layer is a GP model whose output data is y=None, so we just need to handle that case (no obs=y in self.model(...)). Users can construct any network structure they want in the training step. For prediction (self.forward()), I guess they will have to handle things themselves if they use variational models (this may also be straightforward; I am not sure yet).
For GPLVM, a bit of implementation is needed: make X a parameter (to be learned, with priors set on it). We can fully support it after 0.2 if it turns out to be complicated.
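A minimal sketch of the GPLVM idea, assuming plain PyTorch and MAP estimation rather than Pyro's own implementation: the latent inputs X become a trainable parameter with a standard normal prior and are optimized jointly with the kernel hyperparameters by maximizing the GP marginal likelihood of the observed data Y, treating each output dimension as an independent GP that shares one kernel. The names `TinyGPLVM` and `fit` are hypothetical.

```python
import math
import torch
from torch import nn

class TinyGPLVM(nn.Module):
    """MAP GPLVM: learn latent X (n x latent_dim) so that Y (n x d) is well explained by a GP."""

    def __init__(self, Y, latent_dim=2):
        super().__init__()
        self.Y = Y
        # Latent inputs are now parameters to be learned
        self.X = nn.Parameter(0.1 * torch.randn(Y.shape[0], latent_dim, dtype=Y.dtype))
        self.log_lengthscale = nn.Parameter(torch.zeros((), dtype=Y.dtype))
        self.log_noise = nn.Parameter(torch.tensor(-2.0, dtype=Y.dtype))

    def loss(self):
        n, d = self.Y.shape
        sq_dist = (self.X.unsqueeze(1) - self.X.unsqueeze(0)).pow(2).sum(-1)
        K = torch.exp(-0.5 * sq_dist / self.log_lengthscale.exp() ** 2)
        K = K + (self.log_noise.exp() + 1e-6) * torch.eye(n, dtype=self.Y.dtype)
        L = torch.linalg.cholesky(K)
        alpha = torch.cholesky_solve(self.Y, L)                  # K^{-1} Y, shape (n, d)
        # Sum of d independent GP log marginal likelihoods sharing the same K
        log_lik = (-0.5 * (self.Y * alpha).sum()
                   - d * torch.log(torch.diagonal(L)).sum()
                   - 0.5 * n * d * math.log(2 * math.pi))
        log_prior = -0.5 * (self.X ** 2).sum()                   # N(0, I) prior on X, up to a constant
        return -(log_lik + log_prior)

def fit(model, steps=200, lr=0.05):
    # Plain Adam on the negative log joint; returns the loss trajectory
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    losses = []
    for _ in range(steps):
        opt.zero_grad()
        loss = model.loss()
        loss.backward()
        opt.step()
        losses.append(loss.item())
    return losses
```

A fully Bayesian variant would place a variational distribution over X instead of a point estimate; the sketch only shows why "make X a parameter and set a prior" is the core change.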

fehiepsi on 23 Mar 2018

Closing because most targets are more or less complete and @martinjankowiak is revising the GP tutorial in #1039. Later changes will be based on user requests.

fehiepsi on 18 Apr 2018
