Pyro: [feature request] manual mini-batching and batch dimension scaling

Created on 8 Oct 2018 · 6 Comments · Source: pyro-ppl/pyro

In models with mixed levels of nesting (e.g. global_plate > local_plate_1 > local_plate_2 > ...), mini-batching across different batch dimensions requires introducing the proper scale factor for each batch dimension. Pyro handles these scale factors automatically if mini-batching is done via pyro.iarange(..., size=..., subsample_size=...) or pyro.iarange(..., size=..., subsample=...). The latter construct is flexible and allows arbitrary mini-batching schemes, including big-data situations where the full data tensor cannot be loaded all at once.
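A minimal worked form of the scale factors described above, for a hypothetical two-level model: a global latent \theta, an outer plate of size N_1 subsampled to an index set \mathcal{B}_1 of size B_1, and an inner plate of size N_2 subsampled to \mathcal{B}_2 of size B_2. The symbols and the model structure are illustrative assumptions, not taken from the Pyro source. The point is only that each site's log-probability is multiplied by the product of size / subsample_size over the plates that enclose it.

```latex
\log p(\theta, z, x)
\;\approx\;
\log p(\theta)
+ \frac{N_1}{B_1} \sum_{i \in \mathcal{B}_1}
  \Bigl[ \log p(z_i \mid \theta)
  + \frac{N_2}{B_2} \sum_{j \in \mathcal{B}_2} \log p(x_{ij} \mid z_i) \Bigr]
```

Under these assumptions, sites at the global level keep scale 1, sites in the outer plate are scaled by N_1/B_1, and sites in the inner plate by (N_1/B_1)(N_2/B_2).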

Mini-batching, however, is often done _manually_ and _externally_ rather than via pyro.iarange. In such cases, the appropriate scale factors must also be applied manually via poutine.scale. We are being consistent here: manual mini-batching means manual scaling. However, most of the examples (DMM, VAE, ...) put little to no emphasis on this issue and neglect scaling altogether. While the missing scale factors are unlikely to hurt convergence much when working with adaptive optimizers, neglecting them leads to wrong ELBO estimates.
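The effect of a missing scale factor can be sketched without Pyro at all. The example below is only an illustration under stated assumptions: a standard-normal log-density stands in for one observation site, and the names `log_normal`, `N`, `B` and the synthetic data are hypothetical. It compares the full-data log-likelihood sum with mini-batch estimates that are and are not multiplied by N / B, which is the factor one would presumably pass to poutine.scale when batching manually.

```python
import math
import random


def log_normal(x, mu=0.0, sigma=1.0):
    """Log-density of a univariate normal; stands in for one site's log-prob."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


random.seed(0)
N = 1000  # full dataset size (hypothetical)
B = 100   # mini-batch size (hypothetical)
data = [random.gauss(0.0, 1.0) for _ in range(N)]

# Log-likelihood term of the ELBO computed on the full dataset.
full = sum(log_normal(x) for x in data)

# Manual mini-batching: disjoint batches covering the data once.
batches = [data[i:i + B] for i in range(0, N, B)]
unscaled = [sum(log_normal(x) for x in batch) for batch in batches]

# Manual scaling: multiply each mini-batch term by N / B.
scale = N / B
scaled = [scale * term for term in unscaled]

mean_unscaled = sum(unscaled) / len(unscaled)
mean_scaled = sum(scaled) / len(scaled)

print("full-data term:      ", full)
print("mean unscaled batch: ", mean_unscaled)
print("mean scaled batch:   ", mean_scaled)
```

Averaged over the batches, the scaled terms recover the full-data value, while the unscaled terms are too small by a factor of B / N. Because terms outside the batched dimension (such as the log-prior of a global latent) are not shrunk in the same way, the unscaled objective weights the likelihood incorrectly relative to the prior, which is one way to read the "wrong ELBO estimates" above.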

[ ] Add a word of caution about scale factors to the examples, and/or use poutine.scale wherever they mini-batch manually, to set a good precedent for new users?

documentation help wanted

mbabadi

Most helpful comment

@neerajprad The use case you suggest cannot be accomplished via iarange(..., subsample_size=...), but it can be accomplished via iarange(..., subsample=...) (that is the motivating use case behind the subsample kwarg).

fritzo on 16 Oct 2018

👍2

All 6 comments

Good points. Most of our examples don't do nested subsampling, but maybe we could add a manually-batched version of our LDA example or something similar? If you have another example where this is relevant, we'd definitely welcome a PR.

eb8680 on 12 Oct 2018

@mbabadi agreed we could improve docs about subsampling. I'm inclined to recommend users use pyro.iarange(..., subsample=...) when any minibatching is done, as that clarifies the intention of the code. Do you know of any cases where minibatching cannot be done through pyro.iarange(..., subsample=...)?

fritzo on 16 Oct 2018

> Do you know of any cases where minibatching cannot be done through pyro.iarange(..., subsample=...)?

I think one use case is when we are running inference on the GPU with a large dataset (i.e. calling data.cuda() on everything at once would take a lot of GPU memory). The torch data loaders work great for this, since they spin up workers that keep pulling off batches of data, which are then transferred to the GPU incrementally. We are using data loaders in our examples, but many of our datasets are probably small enough that they can be transferred directly in one shot.

neerajprad on 16 Oct 2018

😕1

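@neerajprad The use case you suggest cannot be accomplished via iarange(..., subsample_size=...), but it can be accomplished via iarange(..., subsample=...) (that is the motivating use case behind the subsample kwarg).

fritzo on 16 Oct 2018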

👍2

> The use case you suggest cannot be accomplished via iarange(..., subsample_size=...), but it can be accomplished via iarange(..., subsample=...) (that is the motivating use case behind the subsample kwarg).

Ahh, my bad. In that case, we should probably just change our examples to use subsample=, which will do the correct scaling.

neerajprad on 16 Oct 2018

@fritzo @neerajprad I also cannot imagine what can _not_ be accomplished by iarange(..., subsample=...)! A callable subsampler can take care of both incremental data loading and, optionally, sending the minibatch to CUDA. It would be great if you could simply encourage the use of this motif in the examples.

mbabadi on 17 Oct 2018
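The subsampler motif discussed in this thread can be sketched in plain Python. Everything named here (`LazyStore`, `index_batches`, the toy row values) is hypothetical and stands in for a dataset that cannot be loaded at once. The sketch only shows index batches being drawn first and rows being loaded on demand; how the indices are then wired into iarange(..., subsample=...) is an assumption left as a comment, not verified Pyro usage.

```python
import random


class LazyStore:
    """Hypothetical stand-in for a dataset too large to hold in memory at once."""

    def __init__(self, size):
        self.size = size
        self.rows_loaded = 0  # counts rows actually materialized

    def __len__(self):
        return self.size

    def load(self, indices):
        """Materialize only the requested rows (toy values derived from the index)."""
        self.rows_loaded += len(indices)
        return [0.5 * i for i in indices]


def index_batches(size, batch_size, rng):
    """Yield disjoint index batches covering one shuffled pass over range(size)."""
    order = list(range(size))
    rng.shuffle(order)
    for start in range(0, size, batch_size):
        yield order[start:start + batch_size]


store = LazyStore(10)
rng = random.Random(0)
seen = []
scales = []

for idx in index_batches(len(store), 4, rng):
    batch = store.load(idx)  # only these rows are loaded (and could be moved to the GPU)
    # Assumption: idx would be handed to iarange(..., size=len(store), subsample=idx),
    # so that the scale factor len(store) / len(idx) follows from the indices.
    scales.append(len(store) / len(idx))
    seen.extend(idx)

print("batches drawn:", len(scales))
print("rows loaded:  ", store.rows_loaded)
```

Note that the final batch can be shorter than the others, so the scale factor differs per batch; deriving it from the index set rather than hard-coding it avoids that pitfall.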
