Pyro: [feature request] manual mini-batching and batch dimension scaling

Created on 8 Oct 2018 · 6 Comments · Source: pyro-ppl/pyro

In models with mixed levels of nesting (e.g. global_plate > local_plate_1 > local_plate_2 > ...), mini-batching across different batch dimensions requires introducing the proper scale factor for each batch dimension. Pyro handles these scale factors automatically if mini-batching is done via pyro.iarange(..., size=..., subsample_size=...) or pyro.iarange(..., size=..., subsample=...). The latter construct is flexible and allows arbitrary mini-batching schemes, including big-data situations where the full data tensor cannot be loaded all at once.
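To make the per-dimension scale factors concrete, here is a minimal sketch in plain Python (no Pyro required). It assumes each plate contributes a factor of size / subsample_size and that the factors of nested plates multiply. The helper name `site_scale` and all sizes are hypothetical, chosen only for illustration.

```python
from math import prod


def site_scale(plates):
    """Return the log-probability scale factor for one sample site.

    `plates` lists (full_size, subsample_size) pairs, one per plate
    that encloses the site, outermost first. An empty list means the
    site is global and is not rescaled.
    """
    return prod(size / subsample_size for size, subsample_size in plates)


# Hypothetical sizes, for illustration only.
global_plate = (1000, 100)  # 1000 items, 100 per mini-batch
local_plate = (50, 10)      # 50 items per global item, 10 per mini-batch

global_scale = site_scale([])                           # site outside all plates
outer_scale = site_scale([global_plate])                # site in the outer plate only
inner_scale = site_scale([global_plate, local_plate])   # site in both plates

print(global_scale, outer_scale, inner_scale)
```

The point is that a site nested in both plates needs a different factor from a site that sits only in the outer plate, so a single global rescaling of the loss is not enough for mixed nesting.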

Mini-batching, however, is often done _manually_ and _externally_ rather than via pyro.iarange. In such cases, the appropriate scale factors must also be applied manually via poutine.scale. This is consistent: manual mini-batching calls for manual scaling. However, most of the examples (DMM, VAE, ...) place little to no emphasis on this issue and neglect scaling altogether. Although convergence is rarely a problem when working with adaptive optimizers, neglecting the scale factors leads to wrong ELBO estimates.
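A small, Pyro-free sketch of why the scale factor matters for the estimate itself. It sums Gaussian log-likelihoods over a toy dataset and compares the full-data total with mini-batch totals, with and without the factor N / B (the value one would hand to poutine.scale when batching manually). The dataset, the sizes, and the helper `gaussian_log_prob` are assumptions made for illustration.

```python
import math
import random


def gaussian_log_prob(x, loc=0.0, scale=1.0):
    """Log density of a normal distribution evaluated at x."""
    return (-0.5 * math.log(2 * math.pi * scale ** 2)
            - (x - loc) ** 2 / (2 * scale ** 2))


random.seed(0)
N, B = 1000, 100  # hypothetical dataset size and mini-batch size
data = [random.gauss(0.0, 1.0) for _ in range(N)]

# Log-likelihood of the full dataset: the quantity the ELBO should contain.
full = sum(gaussian_log_prob(x) for x in data)

# Split the data into N / B equal mini-batches.
batches = [data[i:i + B] for i in range(0, N, B)]
unscaled = [sum(gaussian_log_prob(x) for x in batch) for batch in batches]
scaled = [(N / B) * value for value in unscaled]

# Averaging over all batches shows what each estimator targets.
mean_unscaled = sum(unscaled) / len(unscaled)
mean_scaled = sum(scaled) / len(scaled)

print(f"full data:     {full:.2f}")
print(f"unscaled mean: {mean_unscaled:.2f}")
print(f"scaled mean:   {mean_scaled:.2f}")
```

Without the factor, the likelihood term is too small by a factor of B / N, while any term that is not mini-batched (for example a global prior or KL term) keeps its full weight. The resulting ELBO estimate is therefore not just rescaled but wrongly balanced.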

  • [ ] Add a word of caution about scale factors to the examples, and/or use poutine.scale wherever they mini-batch manually, to set a good precedent for new users.
Labels: documentation, help wanted

All 6 comments

Good points. Most of our examples don't do nested subsampling, but maybe we could add a manually-batched version of our LDA example or something similar? If you have another example where this is relevant, we'd definitely welcome a PR.

@mbabadi agreed we could improve docs about subsampling. I'm inclined to recommend users use pyro.iarange(..., subsample=...) when any minibatching is done, as that clarifies the intention of the code. Do you know of any cases where minibatching cannot be done through pyro.iarange(..., subsample=...)?

> Do you know of any cases where minibatching cannot be done through pyro.iarange(..., subsample=...)?

I think one use case is running inference on the GPU with a large dataset, where calling data.cuda() on everything at once would take a lot of GPU memory. The torch data loaders work great for this, since they spin up a pool of workers that keep pulling off batches of data and transferring them to the GPU incrementally. We are using data loaders in our examples, but many of our datasets are probably small enough that they can be transferred directly in one shot.

@neerajprad The use case you suggest cannot be accomplished via iarange(..., subsample_size=...), but it can be accomplished via iarange(..., subsample=...) (that is the motivating use case behind the subsample kwarg).

> The use case you suggest cannot be accomplished via iarange(..., subsample_size=...), but it can be accomplished via iarange(..., subsample=...) (that is the motivating use case behind the subsample kwarg).

Ahh, my bad. In that case, we should probably just change our examples to use subsample=, which will do the correct scaling.

@fritzo @neerajprad I also cannot imagine what can _not_ be accomplished by iarange(..., subsample=...)! A callable subsampler can take care of both incremental data loading and, optionally, sending the minibatch to CUDA. It would be great if you could encourage the use of this pattern in the examples.
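A rough sketch of the index-first pattern discussed in this thread, in plain Python: choose the mini-batch indices first, load only those rows, then pass the indices on. `LazyRows` and `draw_minibatch` are hypothetical stand-ins, not Pyro or torch APIs. In Pyro, the indices would presumably be passed as subsample= together with size=source.size, so that the factor computed at the end is applied automatically; the exact type expected for subsample may depend on the Pyro version.

```python
import random


class LazyRows:
    """Hypothetical stand-in for a dataset too large to hold in memory."""

    def __init__(self, size):
        self.size = size
        self.rows_loaded = 0  # counts how many rows were actually fetched

    def load(self, indices):
        # A real implementation would read from disk or a database and
        # could move the result to the GPU here; this one fabricates rows.
        self.rows_loaded += len(indices)
        return [0.5 * i for i in indices]


def draw_minibatch(source, batch_size, rng):
    """Pick indices without replacement, then load only those rows."""
    indices = sorted(rng.sample(range(source.size), batch_size))
    rows = source.load(indices)
    return indices, rows


rng = random.Random(0)
source = LazyRows(size=1_000_000)
indices, rows = draw_minibatch(source, batch_size=128, rng=rng)

# The factor that subsampling-aware scaling would apply to this batch.
scale = source.size / len(indices)
print(len(rows), source.rows_loaded, scale)
```

Because the indices are known before any data is touched, only the selected rows are ever loaded, and the scale factor follows directly from the full size and the number of indices.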
