Pyro: Future areas of work / improvement for HMC and NUTS

Created on 25 Apr 2018 · 14 comments · Source: pyro-ppl/pyro

Please create a separate issue if you are working on a major task (e.g. mass matrix adaptation or parallel chains), so that all task-specific discussion is contained within that issue.

Enhancements

Minor:

[x] Deciding on a suitable number of warmup iterations. This is currently an input argument, but with adapt_step_size=True there should be a reasonable default when the user does not specify one. For example, we could default to 50% of the total iterations (as Stan does), so that num_samples=100 would automatically run 100 warmup iterations.
[x] Ability to set the target acceptance probability during step size adaptation, which is currently fixed at 0.8. This will be especially useful for biasing the adaptation towards smaller step sizes when exploring problematic posteriors with regions of high curvature.
[x] Ability to set max_tree_depth to trade off accuracy for speed.
[ ] Detect divergent transitions and log them for analysis. Throwing a NaN during sampling, as we might do now with validation checks enabled, is not very useful to the end user.
[x] Implement a progress bar so that the logging info does not clutter the screen (especially in notebooks).
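
For the divergent-transition item above, one common approach is to flag a trajectory whose energy error exceeds a large threshold, instead of letting a NaN propagate. The sketch below is a plain-Python illustration on a one-dimensional standard normal target and is not Pyro's implementation; the function names and the threshold of 1000 are assumptions chosen for illustration.

```python
import math


def potential(q):
    # Negative log density of a standard normal, up to an additive constant.
    return 0.5 * q * q


def grad_potential(q):
    return q


def energy(q, p):
    # Hamiltonian with a unit mass matrix: potential plus kinetic energy.
    return potential(q) + 0.5 * p * p


def leapfrog(q, p, step_size, num_steps):
    # Half step for momentum, alternating full steps, final half step.
    p = p - 0.5 * step_size * grad_potential(q)
    for i in range(num_steps):
        q = q + step_size * p
        if i < num_steps - 1:
            p = p - step_size * grad_potential(q)
    p = p - 0.5 * step_size * grad_potential(q)
    return q, p


def run_trajectory(q, p, step_size, num_steps, max_energy_error=1000.0):
    """Integrate one trajectory and report whether it diverged.

    Returns (q_new, p_new, energy_error, diverged). The threshold
    max_energy_error is a hypothetical default for this sketch.
    """
    h0 = energy(q, p)
    q_new, p_new = leapfrog(q, p, step_size, num_steps)
    delta = energy(q_new, p_new) - h0
    diverged = math.isnan(delta) or delta > max_energy_error
    return q_new, p_new, delta, diverged
```

A sampler could count the trajectories for which `diverged` is true and report that count alongside the other logging, so the user sees how often the integrator failed and at which step sizes.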

Major:

- [x] Multiple parallel chains for HMC (and/or NUTS). For HMC, this could be as simple as using poutine.broadcast to run parallel chains, similar in spirit to parallelizing ELBO computation over num_particles in #1176, or (preferably) use torch.distributed to implement a more general (applicable to NUTS) and scalable solution.
- [x] Adapt the mass matrix during the warmup phase, alongside the step size parameter. Currently it is assumed to be diagonal.
- [x] Better initialization strategies, ~e.g. generating the initial trace after running ADVI, MAP~ (EDIT: this can be done by the user independently and the trace so generated can be specified via initial_trace). In addition, providing the option to the user to specify an initial trace to the NUTS/HMC kernel.
- [x] Use multinomial instead of slice sampling in NUTS.
- [x] Enumerate over discrete latents. #1128
- [ ] (Low priority) Add support for other MCMC algorithms like Gibbs sampling, and use these in conjunction with the HMC/NUTS kernel to allow sampling from models with discrete latent variables.
- [x] Parallel chains on CUDA. NOTE: This feature is only partially supported. Each worker has to hold its traces until it terminates or the main process no longer needs them, which undermines the main reason for using a generator: limiting memory use for large models. A better mechanism for deciding when workers store traces and when they clear them should be implemented.
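
For the mass matrix item above, a common strategy is to estimate the per-dimension posterior variances from warmup samples and use them as a diagonal inverse mass matrix. The sketch below shows only the streaming variance estimate (Welford's algorithm) in plain Python; the class name `WelfordVariance` is hypothetical, and this is an illustration under those assumptions, not the adaptation scheme Pyro uses.

```python
class WelfordVariance:
    """Streaming per-dimension mean and variance (Welford's algorithm)."""

    def __init__(self, dim):
        self.n = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # running sum of squared deviations

    def update(self, sample):
        # Fold one warmup sample (a sequence of floats) into the estimate.
        self.n += 1
        for i, x in enumerate(sample):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (x - self.mean[i])

    def variance(self):
        # Unbiased sample variance for each dimension.
        if self.n < 2:
            raise ValueError("need at least two samples")
        return [m2 / (self.n - 1) for m2 in self.m2]
```

Because the estimate is updated one sample at a time, no warmup samples need to be stored, which matters for large models. The resulting variances would serve as the diagonal of the inverse mass matrix in later warmup windows.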

Diagnostics / Results

[x] Other logging: #1175, #1196
[x] Utilities to provide an inference summary, or summarized stats for the posterior over latent sites. For instance, examples/baseball.py implements some summary utilities, but it would be great to have a consistent interface for different inference algorithms, and not have to rely on pandas. PyMC is considering using xarray as a universal format for inference results, including PyStan results.
[x] ~Plotting posterior over latents, like pymc3.plots.traceplot.~ Plotting posterior is straightforward now with the marginal() method.
[x] Add support for convergence diagnostics like Effective Sample Size for each latent site, Gelman-Rubin convergence diagnostic (once parallel chains are implemented).
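
As a rough illustration of the Gelman-Rubin diagnostic mentioned above, the sketch below computes the basic (non-split) R-hat for a single scalar site from several equal-length chains. It uses only the standard library, the function name `gelman_rubin` is hypothetical, and it is not the implementation used in Pyro.

```python
import math
from statistics import mean, variance


def gelman_rubin(chains):
    """Basic (non-split) R-hat for one scalar site.

    `chains` is a list of equal-length lists of floats, one per chain.
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2:
        raise ValueError("need at least two chains with two samples each")
    if any(len(c) != n for c in chains):
        raise ValueError("all chains must have the same length")

    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])   # W: mean within-chain variance
    between = n * variance(chain_means)            # B: between-chain variance
    var_hat = (n - 1) / n * within + between / n   # pooled variance estimate
    return math.sqrt(var_hat / within)
```

Values close to 1 suggest the chains agree with each other; values well above 1 indicate that the chains have not mixed, which is why this diagnostic depends on running multiple chains.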

JIT

[x] Explore using the PyTorch JIT to make HMC models faster, especially smaller models where the Python runtime overhead dominates the torch tensor computations. See #1063. Work started in #1299.

enhancement help wanted

neerajprad

👍4

All 14 comments

@fehiepsi - Please feel free to edit / add to these.

neerajprad on 25 Apr 2018

@neerajprad How about the ability to assign different MCMC algorithms to different variables? This will be helpful when we have Metropolis to deal with discrete variables. I haven't figured out how to achieve it yet, so I can't estimate how much time it would take.

fehiepsi on 25 Apr 2018

@neerajprad How about the ability to assign different MCMC algorithms to different variables? This will be helpful when we have Metropolis to deal with discrete variables. I haven't figured out how to achieve it yet, so I can't estimate how much time it would take.

Added that. I think that will be quite useful, but we will need to implement other MCMC kernels first.

neerajprad on 25 Apr 2018

I'd love to contribute re: mass-matrix adaptation and I actually have some preliminary work on it already. But I've run into an issue and I'm not sure how you'd like to proceed in terms of the design. Should I discuss here, or open an issue specifically for the mass-matrix adaptation?

LoganWalls on 11 May 2018

👍1

I'd love to contribute re: mass-matrix adaptation and I actually have some preliminary work on it already.

That's great to hear. I would suggest opening a separate issue which we can link to from here, so that it doesn't have to deal with all the noise from this master task.

neerajprad on 11 May 2018

👍1

I'd like to suggest adding one or more stochastic gradient approaches (e.g. Stochastic Mini-Batch HMC, Stochastic Gradient Langevin Dynamics) to this list. There does seem to be some concern about the theoretical properties of these algorithms (as seen in this PyMC3 discussion), but I think their potential in applied settings at least merits consideration.

LoganWalls on 14 May 2018

@neerajprad @jpchen @rohitsingh0812 FYI PyMC devs are considering using xarray as a format for inference results of PyMC and PyStan. This seems like a good decision to me, and it would be nice if we could aim for an interchangeable format.

fritzo on 23 May 2018

👍1

it would be nice if we could aim for an interchangeable format

What does this buy us? Is this meant to allow us to use arviz for visualization? I think it would be great if we wrote something to convert TracePosteriors into whatever summary format they have in mind, but I don't see a good reason to commit to that as the sole representation of inference output.

eb8680 on 23 May 2018

I think since xarray supports numpy, it should be relatively straightforward for us to convert the results of TracePosterior to that format and get any summary/plotting utilities. I like PyMC's traceplot, and it would be great to have access to that without having to develop all of that within Pyro.

neerajprad on 23 May 2018

I think it would be great if we wrote something to convert TracePosteriors into [xarray]

Yeah, the idea is to leverage the work of other teams who are converting PyStan and PyMC output into a standard format built on xarray. This will enable comparison across PPL systems and algorithms.

fritzo on 23 May 2018

@cfperez - I have updated the task list here, specific to HMC/NUTS. We don't have a separate issue for visualization, and something like traceplot will be useful for all inference algorithms, not just HMC. Some relevant discussion here. I think using arviz would require us to convert torch.Tensor to a pandas dataframe, or xarray. That needs a more involved discussion (on rolling out our own solution, vs. tying ourselves to an external data format / dependency), so please feel free to create a new issue if you would like to work on this!

neerajprad on 10 Aug 2018

We already have a separate Gibbs sampling issue. Multi-chain sampling on CUDA is possible now with PyTorch 1.1.0. Divergence info (the NUTS tree's diverging flag) can be added easily, but it does not seem important. Feel free to open a separate FR if it is necessary.

fehiepsi on 7 May 2019

Hi @fehiepsi, can you explain a little bit why you think the divergence diagnostics are not important? They seem to be critical for telling whether NUTS is working properly. Do we have any other means to check convergence in Pyro? Right now the only things I can find are the effective number of samples and R hat, but neither of them is HMC specific.

riversdark on 29 Jun 2019

@riversdark That's just my feeling. I haven't read much literature on divergence diagnostics. We can easily add it (we just need to decide where it should go: progress bar or store it). I'll open a FR for it.

fehiepsi on 30 Jun 2019

👍1
