The paper "Sticking the Landing: A Simple Reduced-Variance Gradient for ADVI" was presented at NIPS 2016. The authors propose an ELBO gradient estimator that has lower variance in some cases, which they call the "path derivative gradient estimator". This may be a good fit for PyMC3.
This is the abstract of the article:
Compared to the REINFORCE gradient estimator, the reparameterization trick
usually gives lower-variance estimators. We propose a simple variant of the
standard reparameterized gradient estimator for the evidence lower bound that
has even lower variance under certain circumstances. Specifically, we decompose
the derivative with respect to the variational parameters into two parts: a path
derivative and the score function. Removing the second term produces an unbiased
gradient estimator whose variance approaches zero as the approximate posterior
approaches the exact posterior. We propose that the removed term has arbitrarily
high variance when the variational posterior has a complex form, as when using
adaptive posteriors such as given by normalizing flows or stochastic Hamiltonian
inference.
@noe Thanks for sharing!
CC @taku-y @ferrine
Sounds really interesting, thank you!
Now it's time to figure out how to implement it in Theano out of the box. I have no ideas yet, let's brainstorm :)
Useful link: https://gist.github.com/benanne/9212037
that's done
Wow @ferrine just wow :)
That's the whole implementation?
Does anyone know how to derive the last term of eq. (11) in the paper? Without knowing that, I'm not sure the implementation is correct.
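@taku-y When we take the gradient of log_q_W(posterior(\phi, \epsilon)) with respect to \phi, we apply the chain rule, because log_q_W depends on \phi in two ways: through the sample posterior(\phi, \epsilon) and directly through its own parameters. The first term is backpropagated through posterior; that is the path part. The second term comes directly from the parameters of the log_q_W function; that is the score function part, which has zero expectation. What I do is simply set the second part to zero with theano.gradient.zero_grad.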
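To make the two terms concrete, here is a minimal sketch in plain Python rather than Theano, with the gradients derived by hand instead of by autodiff. It is not the PyMC3 implementation. Everything in it is an assumption chosen for illustration: a one-dimensional Gaussian q = N(mu, sigma^2) standing in for log_q_W, a standard normal target log p(z) = -z^2 / 2, and the hypothetical helper names `gradient_samples` and `summarize`. The "total" estimator keeps the score term; the "path" estimator drops it, which is the effect that zeroing the direct parameter gradient is meant to have.

```python
import random
import statistics


def gradient_samples(mu, sigma, n_samples=20000, seed=0):
    """Per-sample ELBO gradients w.r.t. (mu, sigma) for q = N(mu, sigma^2).

    Illustrative target (an assumption): log p(z) = -z^2 / 2.
    Returns two lists of (d/dmu, d/dsigma) pairs: total and path-only.
    """
    rng = random.Random(seed)
    total, path = [], []
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)
        z = mu + sigma * eps               # reparameterized sample, dz/dmu = 1, dz/dsigma = eps
        dlogp_dz = -z                      # gradient of the target through z
        dlogq_dz = -(z - mu) / sigma ** 2  # gradient of log q through z (path part)
        # Direct dependence of log q on its own parameters with z held fixed
        # (score function part; this is the term the path estimator removes).
        score_mu = (z - mu) / sigma ** 2
        score_sigma = ((z - mu) ** 2 / sigma ** 2 - 1.0) / sigma
        common = dlogp_dz - dlogq_dz
        path_mu, path_sigma = common, common * eps
        path.append((path_mu, path_sigma))
        total.append((path_mu - score_mu, path_sigma - score_sigma))
    return total, path


def summarize(samples):
    """Return ((mean_mu, var_mu), (mean_sigma, var_sigma)) for a list of gradient pairs."""
    g_mu = [g[0] for g in samples]
    g_sigma = [g[1] for g in samples]
    return (
        (statistics.fmean(g_mu), statistics.pvariance(g_mu)),
        (statistics.fmean(g_sigma), statistics.pvariance(g_sigma)),
    )


if __name__ == "__main__":
    # q equal to the target, then q away from the target.
    for mu, sigma in [(0.0, 1.0), (0.5, 2.0)]:
        total, path = gradient_samples(mu, sigma)
        print("mu=%.1f sigma=%.1f" % (mu, sigma))
        print("  total:", summarize(total))
        print("  path: ", summarize(path))
```

In this toy setting, when q matches the target exactly (mu = 0, sigma = 1) every path sample is zero while the total estimator still fluctuates, which is the behaviour the abstract describes. Away from the optimum the two estimators should agree in expectation, since the dropped score term averages to zero.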
@ferrine Thanks for your quick response. I will check it.