Turing.jl: Add L-BFGS to Turing

Created on 1 Feb 2018 · 10 Comments · Source: TuringLang/Turing.jl

The Optim.jl package provides implementations of BFGS and L-BFGS; perhaps we can integrate them with Turing?

inference-method

yebai

Most helpful comment

> We don't currently have any gradient definitions for Distributions, but they are easy to add and we'd be very happy to have them in Flux. Let me know if you have any trouble getting that set up; I'll also be around at JuliaCon and would be happy to meet up over this stuff.

I agree with @willtebbutt and @MikeInnes that we should talk at JuliaCon. Together, Turing and Flux can offer much stronger support for ML: while Flux focuses more on neural network (NN) models and optimisation-based learning, Turing focuses more on probabilistic machine learning (PML) and Bayesian inference.

In particular, I think it would be nice if Turing and Flux could:

- share AD and optimisation procedures
- share GPU and parallel computing libraries
- have compatible syntax for model and inference composition
- have example models that combine NN/DL and Bayesian inference

yebai on 30 Jul 2018

👍3

All 10 comments

Note: link for LBFGS in Optim.jl - http://julianlsolvers.github.io/Optim.jl/stable/algo/lbfgs/
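
For reference, here is a minimal sketch of what calling that implementation looks like, assuming a recent Optim.jl release. The toy model, the data, and the name `neg_log_joint` are hypothetical and only for illustration; nothing here uses Turing's internals or reflects how an eventual integration would work. It finds a MAP estimate by minimising a negative log joint density with L-BFGS, with gradients obtained through Optim's forward-mode autodiff option.

```julia
using Optim

# Hypothetical toy model: y_i ~ Normal(μ, 1) with prior μ ~ Normal(0, 10).
y = [1.2, 0.8, 1.0, 1.4, 0.6]

# Negative log joint density, up to an additive constant; θ = [μ].
function neg_log_joint(θ)
    μ = θ[1]
    return 0.5 * sum(abs2, y .- μ) + 0.5 * μ^2 / 10^2
end

# Minimise with L-BFGS, starting from μ = 0; gradients come from forward-mode AD.
result = optimize(neg_log_joint, [0.0], LBFGS(); autodiff = :forward)
μ_map = Optim.minimizer(result)[1]

# Closed-form MAP for this conjugate model, as a sanity check.
μ_exact = sum(y) / (length(y) + 1 / 10^2)
println("L-BFGS: ", μ_map, "  closed form: ", μ_exact)
```

In a Turing setting the objective would presumably be the model's negative log joint evaluated on unconstrained parameters, but how to obtain that from a model is exactly what this issue leaves open.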

xukai92 on 19 Feb 2018

In what context should L-BFGS (a memory-efficient quasi-Newton method) be added to Turing? Is Turing now supporting some sort of optimisation paradigm?

emilemathieu on 12 Mar 2018

The Flux.jl package has implementations of several popular optimisation algorithms in ML. Perhaps we could consider reusing them in Turing:

http://fluxml.ai/Flux.jl/stable/training/optimisers.html

@xukai92 @ChrisRackauckas

yebai on 24 Jul 2018

👍2

I believe it just added some higher-order methods too, @mikeinnes.

ChrisRackauckas on 24 Jul 2018

I completely agree with @yebai that we should try to make use of Flux's optimisers. This is perhaps a useful conversation to have at JuliaCon. It's not clear to me whether making this work would simply be a case of using a Turing.jl VarInfo to construct a Flux.jl param, and then adding a couple of methods to make stuff work with the new optimiser interface, or whether it will be a bit trickier for some reason.

willtebbutt on 26 Jul 2018

The new interface seems to be compatible with what Turing.jl's internals can provide. Though until they adopt a generic AD, I would assume their grad may have problems with Distributions?

xukai92 on 27 Jul 2018

We don't currently have any gradient definitions for Distributions, but they are easy to add and we'd be very happy to have them in Flux. Let me know if you have any trouble getting that set up; I'll also be around at JuliaCon and would be happy to meet up over this stuff.

MikeInnes on 27 Jul 2018

👍1

> I'll also be around at JuliaCon and would be happy to meet up over this stuff.

This would be very helpful.

willtebbutt on 27 Jul 2018

> We don't currently have any gradient definitions for Distributions, but they are easy to add and we'd be very happy to have them in Flux. Let me know if you have any trouble getting that set up; I'll also be around at JuliaCon and would be happy to meet up over this stuff.

In particular, I think it would be nice if Turing and Flux can