If we added a NUTS Op to Theano with implementations in JAX (e.g. from numpyro) as well as C (here's an implementation https://github.com/alumbreras/NUTS-Cpp, or we can use the STAN one directly) we could get incredible speeds across different execution backends.
I'm willing to take this one on if you can provide some guidance. Could be a good Christmas project.
Great! @brandonwillard is the person with the most mature vision on this.
This is a really important implementation, so, yeah, if you need any input from me, ask away.
Otherwise, here's a template for one approach to converting our samplers to Theano Ops. It's basically just a quick outline of the Op creation process.
The questions that arise within this task are mostly about where/when to "Theano-ize" the NUTS sampler.
For instance, we could make one big NUTS Op that uses all the existing Python sampler code in PyMC3 (e.g. like the template in the link above). That work would mostly be about determining sufficient Theano inputs for such an Op (e.g. the log-likelihood graphs, the sampler parameters, etc., would all need to be Theano objects).
The other end of the spectrum would have us rewriting the core functionality in pymc3.step_methods.hmc using as many pre-existing Theano Ops as possible—and perhaps a few new ones when necessary. This approach is the ideal, since it allows us to apply all the existing Theano optimizations and C transpilation. Plus, this is how we could more seamlessly convert our samplers to JAX.
One of the difficulties with this approach is that our samplers uses some custom "helper" classes (e.g. pymc3.step_methods.hmc.quadpotential.QuadPotential) and those would need to be refactored into a form that's more convenient for Theano (e.g. either as Ops and/or a series of helper functions).
The reason for this is that one cannot easily put something like a QuadPotential instance into a Theano graph; not without creating custom theano.gof.type.Types, at least.
Basically, Theano graphs model lambda expressions only, and Theano doesn't offer the OO abstraction on top of that, so one simply has to reformulate the OO-centric parts of the sampler implementations.
Ok so if we do the core rewrite approach - step one would be to go through pymc3.step_methods.hmc and identify everything that needs to be re-written using a Theano Op, is that right? If I have a first go at this and put that list in here, probably we can look for any holes in it and then start knocking things down.
Ok so if we do the core rewrite approach - step one would be to go through
pymc3.step_methods.hmcand identify everything that needs to be re-written using a TheanoOp
Yes, but, before that even, we might want to clarify exactly how this whole thing should work and what the inputs and outputs should be.
For example, we could write a sample function that takes a map of TensorVariables—corresponding to a model's random variables—and their log-likelihood graphs (e.g. as given by a PyMC3 Model object), integer-typed Scalars for the number of samples (e.g. warm-up, burn-in), etc., and returns a list of TensorVariables with dimensions matching each random variable input by the number of requested samples.
Once those inputs and their Theano types are clear, we can start to walk through the PyMC3 code and get an understanding of exactly what needs to be done.
We can always take a cue from pymc3.sampling, but just remember that it's doing a lot more than we want/need to right now. The same goes for any of the samplers we reimplement in Theano: we only want to shoot for the most basic and important functionality right now. Actually, @eigenfoo's littlemcmc might be a better reference point, since it's a bit more streamlined than PyMC3's.
For that matter, we might be better off implementing a few simple Metropolis samplers first to see where the difficulties are (if any). Here's an old project that started doing just that. There's not much we can borrow from it except perhaps some examples of simple sampler loop logic (e.g. a Metropolis-Hastings sampler), but it's worth a look.
Also, since most people will be primarily interested in the performance of a Theano-based sampler, these simpler samplers should provide a clearer view of any bottlenecks that need to be addressed before moving on to HMC and/or NUTS.
Makes sense - in the absence of the ability to think critically (at this stage), about this, let me just get the requirements for the first step clear and I'll start working on it. What you're after here is
littlemcmc or elsewhere - no bells and whistles for nowSo just to be clear, the map that the sample function should take comprises TensorVariables as keys - what datatype should be used for the log-likelihood graph (the map values)?
Having looked through littlemcmc a bit, I think either I need closer help working on this or should be helping someone else.