Please create a separate issue if you are working on a major task (e.g. mass matrix adaptation or parallel chaining), so that all task-specific discussion is contained within that issue.
Minor:
- adapt_step_size=True should have a reasonable default number of warmup iterations if none is specified by the user. For example, we could default to 50% of the total iterations (as Stan does), in which case num_samples=100 would automatically run 100 warmup iterations.
- 0.8. This will be especially useful to bias the adaptation towards smaller step sizes to explore problematic posteriors with regions of high curvature.
- NaN here during sampling, as we might do now with the validation check enabled, is not very useful to the end user.

Major:
- poutine.broadcast to run parallel chains, similar in spirit to parallelizing ELBO computation over num_particles in #1176, or (preferably) use torch.distributed to implement a more general (applicable to NUTS) and scalable solution.
- initial_trace). In addition, providing the option to the user to specify an initial trace to the NUTS/HMC kernel.
- examples/baseball.py implements some summary utilities, but it will be great to have a consistent interface for different inference algorithms, and not have to rely on pandas. PyMC is considering using xarray as a universal format for inference results, including PyStan results.
- marginal() method.

@fehiepsi - Please feel free to edit / add to these.
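The warmup default proposed in the minor tasks can be sketched as follows. This is only an illustration of the rule described above (warmup equal to num_samples when the user gives nothing); `resolve_warmup_steps` is a hypothetical helper name, not an existing Pyro function.

```python
from typing import Optional


def resolve_warmup_steps(num_samples: int, warmup_steps: Optional[int] = None) -> int:
    """Hypothetical helper: choose the number of warmup iterations.

    If the user passes warmup_steps explicitly (including 0), respect it.
    Otherwise default to num_samples, so that warmup makes up 50% of the
    total iterations, as in the Stan-style default discussed above.
    """
    if num_samples < 0:
        raise ValueError("num_samples must be non-negative")
    if warmup_steps is not None:
        if warmup_steps < 0:
            raise ValueError("warmup_steps must be non-negative")
        return warmup_steps
    return num_samples


# No warmup given: 100 samples implies 100 warmup iterations.
default_warmup = resolve_warmup_steps(100)
# An explicit value, even 0, is never overridden.
explicit_warmup = resolve_warmup_steps(100, warmup_steps=20)
```

Checking `warmup_steps is not None` rather than truthiness keeps an explicit `warmup_steps=0` distinct from "not specified".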
@neerajprad How about the ability to assign different MCMC algorithms to different variables? This will be helpful once we have Metropolis to deal with discrete variables. I haven't figured out how to achieve it yet, so I can't estimate how much time it would take.
Added that. I think that will be quite useful, but we will need to implement other MCMC kernels first.
I'd love to contribute re: mass-matrix adaptation and I actually have some preliminary work on it already. But I've run into an issue and I'm not sure how you'd like to proceed in terms of the design. Should I discuss here, or open an issue specifically for the mass-matrix adaptation?
> I'd love to contribute re: mass-matrix adaptation and I actually have some preliminary work on it already.
That's great to hear. I would suggest opening a separate issue which we can link to from here, so that it doesn't have to deal with all the noise from this master task.
I'd like to suggest adding one or more stochastic gradient approaches (e.g. Stochastic Mini-Batch HMC, Stochastic Gradient Langevin Dynamics) to this list. There does seem to be some concern about the theoretical properties of these algorithms (as seen in this PyMC3 discussion), but I think their potential in applied settings at least merits consideration.
@neerajprad @jpchen @rohitsingh0812 FYI PyMC devs are considering using xarray as a format for inference results of PyMC and PyStan. This seems like a good decision to me, and it would be nice if we could aim for an interchangeable format.
> it would be nice if we could aim for an interchangeable format
What does this buy us? Is this meant to allow us to use arviz for visualization? I think it would be great if we wrote something to convert TracePosteriors into whatever summary format they have in mind, but I don't see a good reason to commit to that as the sole representation of inference output.
I think since xarray supports numpy, it should be relatively straightforward for us to convert the results of TracePosterior to that format and get any summary/plotting utilities. I like PyMC's traceplot, and it would be great to have access to that without having to develop all of that within Pyro.
> I think it would be great if we wrote something to convert TracePosteriors into [xarray]
Yeah, the idea is to leverage the work of other teams who are converting PyStan and PyMC output into a standard format built on xarray. This will enable comparison across PPL systems and algorithms.
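To make the "convert, then summarize" idea concrete, here is a dependency-free sketch. It assumes posterior draws have already been pulled out of the inference result into plain Python floats, indexed as `samples[site][chain][draw]`. That layout and the `summarize` helper are illustrative assumptions, not Pyro's or xarray's actual API.

```python
import statistics
from typing import Dict, List


def summarize(samples: Dict[str, List[List[float]]]) -> Dict[str, Dict[str, float]]:
    """Hypothetical helper: per-site summary over all chains and draws.

    `samples[site][chain][draw]` is an assumed layout, chosen only so the
    example needs neither pandas nor xarray.
    """
    summary = {}
    for site, chains in samples.items():
        # Pool the draws from every chain for this site.
        draws = [value for chain in chains for value in chain]
        summary[site] = {
            "mean": statistics.fmean(draws),
            "std": statistics.stdev(draws),  # sample standard deviation
            "n_chains": len(chains),
            "n_draws": len(draws),
        }
    return summary


# Two chains with two draws each for a single site "mu".
result = summarize({"mu": [[1.0, 2.0], [3.0, 4.0]]})
```

A converter to an external format could consume the same nested structure, which is one argument for keeping the internal representation separate from any particular output format.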
@cfperez - I have updated the task list here, specific to HMC/NUTS. We don't have a separate issue for visualization, and something like traceplot will be useful for all inference algorithms, not just HMC. Some relevant discussion here. I think using arviz would require us to convert torch.Tensor to a pandas dataframe or xarray. That needs a more involved discussion (on rolling our own solution vs. tying ourselves to an external data format / dependency), so please feel free to create a new issue if you would like to work on this!
We already have a separate Gibbs sampling issue. Running multiple chains on CUDA is possible now with PyTorch 1.1.0. Divergence info (the flag indicating that the NUTS tree diverged) can be added easily, but it does not seem important. Feel free to make a separate FR if it is necessary.
Hi @fehiepsi, can you explain a little bit why you think the divergence diagnostics are not important? They seem to be critical for telling whether NUTS is working properly. Do we have any other means to check convergence in Pyro? Right now the only things I can find are the effective number of samples and R hat, but neither of them is HMC-specific.
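For reference, the R hat statistic mentioned here can be sketched in a few lines. This is the classic (non-split) Gelman-Rubin formula written with the standard library only; `gelman_rubin` is a hypothetical helper, not Pyro's implementation, and it assumes equal-length chains for a single scalar site.

```python
import math
import statistics
from typing import Sequence


def gelman_rubin(chains: Sequence[Sequence[float]]) -> float:
    """Classic (non-split) potential scale reduction factor for one scalar site."""
    m = len(chains)
    if m < 2:
        raise ValueError("need at least two chains")
    n = len(chains[0])
    if n < 2 or any(len(chain) != n for chain in chains):
        raise ValueError("chains must have equal length of at least two draws")

    chain_means = [statistics.fmean(chain) for chain in chains]
    # W: average of the within-chain sample variances.
    within = statistics.fmean([statistics.variance(chain) for chain in chains])
    # B: n times the sample variance of the chain means.
    between = n * statistics.variance(chain_means)
    # Pooled estimate of the posterior variance.
    var_hat = (n - 1) / n * within + between / n
    return math.sqrt(var_hat / within)


# Two chains stuck in different regions give an R hat far above 1.
r_hat = gelman_rubin([[1.0, 2.0, 3.0, 4.0], [11.0, 12.0, 13.0, 14.0]])
```

Nothing in this computation uses the Hamiltonian dynamics, which is the point made above: it compares chains against each other and says nothing about whether individual trajectories were integrated accurately.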
@riversdark That's just my feeling. I haven't read much literature on divergence diagnostics. We can easily add it (we just need to decide where it should go: show it in the progress bar or store it). I'll open an FR for it.
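A rough sketch of what such a divergence flag could look like, assuming the usual convention that a transition is divergent when the energy error along the trajectory exceeds a large threshold. The threshold of 1000 and the helper names below are assumptions for illustration, not Pyro's actual internals.

```python
import math
from typing import Iterable, Tuple

# Assumed threshold on the energy error; the value 1000 is illustrative.
MAX_DELTA_ENERGY = 1000.0


def is_divergent(initial_energy: float, proposed_energy: float,
                 max_delta_energy: float = MAX_DELTA_ENERGY) -> bool:
    """Flag a transition whose energy grew by more than the threshold.

    A NaN energy is also treated as divergent, since the integrator has
    clearly failed in that case.
    """
    delta = proposed_energy - initial_energy
    return math.isnan(delta) or delta > max_delta_energy


def count_divergences(energy_pairs: Iterable[Tuple[float, float]]) -> int:
    """Count divergent transitions given (initial, proposed) energy pairs."""
    return sum(is_divergent(start, end) for start, end in energy_pairs)


# One well-behaved transition, one with exploding energy, one with NaN.
num_divergent = count_divergences([(10.0, 12.0), (10.0, 2000.0), (10.0, float("nan"))])
```

Storing the per-sample flag, rather than only showing a running count in the progress bar, would let users see where in parameter space the divergences occur.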