I've created this issue to discuss design choices for a library for normalizing flows building on the existing code for inverse autoregressive flows with MADE.
Here's the first couple of things I wanted to work on:
I was going to start by adding extra capabilities to the MADE class in pyro.nn.AutoRegressiveNN so you have more control over its architecture. At the moment it has a single hidden layer, but you may wish to have multiple layers, skip connections (like in the original paper), change the nonlinearity...
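To make the multi-layer idea concrete, here is a minimal sketch of how MADE-style masks could be built for any number of hidden layers. It uses only the standard library, and the names `made_masks` and `connectivity` are hypothetical helpers for illustration, not part of Pyro. Hidden degrees are assigned by simple cycling, which is one possible choice and an assumption of this sketch.

```python
def made_masks(input_dim, hidden_dims):
    """Build binary masks for a MADE with an arbitrary list of hidden widths.

    Each unit gets a "degree"; a connection is kept only if it cannot leak
    information from input i (or later inputs) into output i.
    """
    if input_dim < 2:
        raise ValueError("need at least two inputs for a non-trivial ordering")
    in_deg = list(range(1, input_dim + 1))  # inputs are numbered 1..D
    masks, prev = [], in_deg
    for width in hidden_dims:
        # Assumption: cycle hidden degrees through 1..D-1 (deterministic).
        cur = [k % (input_dim - 1) + 1 for k in range(width)]
        # Hidden unit may see units of equal or lower degree.
        masks.append([[int(c >= p) for p in prev] for c in cur])
        prev = cur
    # Output d may only see hidden units of strictly lower degree.
    masks.append([[int(o > p) for p in prev] for o in in_deg])
    return masks


def connectivity(masks):
    """Multiply the masks to count paths from each input to each output."""
    def matmul(a, b):
        return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
                for row in a]
    total = masks[0]
    for m in masks[1:]:
        total = matmul(m, total)
    return total


masks = made_masks(4, [8, 8])
conn = connectivity(masks)  # conn[i][j] > 0 means output i depends on input j
```

If the masks are correct, `conn` is strictly lower triangular, so output i depends only on inputs before i regardless of how many hidden layers are stacked.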
Then I thought that pyro.distributions.iaf should take the autoregressive network it builds upon as an initialization parameter (with a sensible default) rather than have it fixed. This way you can create an IAF with an arbitrary MADE or, hopefully in the future, another autoregressive network such as WaveNet.
Also, it isn't too hard to add the inversion operation for IAF for when it hasn't been cached from sampling and I've got the code working for that.
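As a rough illustration of what uncached inversion involves, the sketch below inverts an affine autoregressive transform one dimension at a time. `toy_arn` is a hypothetical stand-in for a MADE (its outputs for dimension i depend only on `x[:i]`), and none of these function names come from Pyro.

```python
import math


def toy_arn(x):
    """Hypothetical autoregressive network: outputs for dim i use only x[:i]."""
    means, log_scales = [], []
    for i in range(len(x)):
        context = sum(x[:i])
        means.append(0.5 * context)
        log_scales.append(math.tanh(context))
    return means, log_scales


def forward(x, arn):
    """One parallel pass: y_i = mean_i(x_<i) + exp(log_scale_i(x_<i)) * x_i."""
    means, log_scales = arn(x)
    return [m + math.exp(s) * xi for xi, m, s in zip(x, means, log_scales)]


def inverse(y, arn):
    """Sequential inversion: recover x_0, then x_1 given x_0, and so on."""
    x = [0.0] * len(y)
    for i in range(len(y)):
        # Entry i of the network output depends only on x[:i], which has
        # already been recovered, so the placeholder zeros do not matter.
        means, log_scales = arn(x)
        x[i] = (y[i] - means[i]) * math.exp(-log_scales[i])
    return x


x = [0.3, -1.2, 0.7, 2.0]
y = forward(x, toy_arn)
x_rec = inverse(y, toy_arn)
```

Note that the inverse calls the network once per dimension, which is why caching the sample-time input is attractive when it is available.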
I'll do a first pull request with just the flexible MADE implementation to make things more concrete. Is Pyro library code allowed to use NumPy?
Here's the pull request: #1262
I think my pull request is almost ready to be integrated. I have an improved IAF implementation ready plus MAF and NAF that I want to submit in my next ones.
With IAF, the current implementation uses Eq 14 from the paper, which the authors say is more numerically stable, rather than Eq 10. However, I've found that the Eq 14 version doesn't seem to be able to fit simple toy datasets. I think it may have to do with the fact that it only scales each variable by a number in the range (0, 1). The TensorFlow implementation uses Eq 10, so I decided to go with that.
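The difference can be sketched for a single scalar, assuming Eq 10 is the plain affine update and Eq 14 is the sigmoid-gated update, as the description above suggests. The function names here are made up for illustration and are not the Pyro implementation.

```python
import math


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def affine_update(z, mean, log_scale):
    """Assumed Eq 10 form: the scale exp(log_scale) can be any positive number."""
    return mean + math.exp(log_scale) * z


def gated_update(z, m, s):
    """Assumed Eq 14 form: a convex combination of z and m with gate in (0, 1)."""
    gate = sigmoid(s)
    return gate * z + (1.0 - gate) * m


# The derivative of each update with respect to z is its scale factor.
affine_scales = [math.exp(s) for s in (-2.0, 0.0, 2.0)]
gated_scales = [sigmoid(s) for s in (-2.0, 0.0, 2.0)]
```

Under these assumptions, the gated form can only contract each variable, whereas the affine form can also expand it, which is consistent with the fitting problem described above.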
With MAF, it's just the inverse of IAF, so I was thinking of writing another type of Transform that can wrap the IAF class and invert the code, just like the Invert bijector in TensorFlow.
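A minimal sketch of such a wrapper, in plain Python, might look like the following. `ScaleShift` and `Inverted` are hypothetical classes used only to show the idea; they are not the Pyro or TensorFlow API.

```python
import math


class ScaleShift:
    """Toy bijection y = shift + exp(log_scale) * x, standing in for a flow."""

    def __init__(self, shift, log_scale):
        self.shift = shift
        self.log_scale = log_scale

    def forward(self, x):
        return self.shift + math.exp(self.log_scale) * x

    def inverse(self, y):
        return (y - self.shift) * math.exp(-self.log_scale)

    def log_abs_det_jacobian(self, x, y):
        return self.log_scale


class Inverted:
    """Wrap a transform so that its forward and inverse directions are swapped."""

    def __init__(self, transform):
        self.transform = transform

    def forward(self, x):
        return self.transform.inverse(x)

    def inverse(self, y):
        return self.transform.forward(y)

    def log_abs_det_jacobian(self, x, y):
        # The wrapped transform maps y to x, so negate its log-determinant.
        return -self.transform.log_abs_det_jacobian(y, x)


base = ScaleShift(shift=1.0, log_scale=0.5)
inv = Inverted(base)
x = 2.0
y = inv.forward(x)
```

The appeal of this design is that the fast direction of the wrapped transform becomes the fast density-evaluation direction of the wrapper, without duplicating any of the flow's code.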
NAF appears to work much better than IAF and MAF in matching complex distributions. I've implemented the deep sigmoid flow (DSF) variant so far.
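For readers unfamiliar with DSF, here is a scalar sketch of the transformation as I understand it from the NAF paper: a convex mixture of sigmoids followed by a logit. The parameterization (softplus for positive slopes, softmax for the mixture weights) and all names are assumptions of this sketch, not the Pyro implementation. In NAF the three parameter vectors would come from the autoregressive network rather than being fixed.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def logit(p):
    return math.log(p) - math.log1p(-p)


def softplus(t):
    return math.log1p(math.exp(t))


def softmax(values):
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def deep_sigmoid_flow(x, raw_a, b, raw_w):
    """y = logit(sum_j w_j * sigmoid(a_j * x + b_j)) with a_j > 0, w on the simplex."""
    a = [softplus(r) for r in raw_a]  # positive slopes keep the map monotone
    w = softmax(raw_w)                # convex weights keep the mixture in (0, 1)
    mix = sum(wj * sigmoid(aj * x + bj) for wj, aj, bj in zip(w, a, b))
    return logit(mix)


raw_a, b, raw_w = [0.5, -1.0, 2.0], [0.0, 1.0, -2.0], [0.1, 0.2, -0.3]
xs = [-3.0, -1.0, 0.0, 1.0, 3.0]
ys = [deep_sigmoid_flow(x, raw_a, b, raw_w) for x in xs]
```

Because every slope is positive and the weights are non-negative, the map is strictly increasing in `x` and therefore invertible, which is what makes it usable as a flow.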
I was going to make some more changes to the MADE interface, but held off on that from my last pull request because it will break all the IAF code. I think it would be useful for the MADE to return a tuple of the parameters, rather than having to break up the output manually. So you could do for instance,
arn = AutoRegressiveNN(..., params=2)
mean, scale = arn(x)
for IAF, and for NAF,
arn = AutoRegressiveNN(..., params=3)
a, b, c = arn(x)
And I think the flows should take this network as a parameter so that they are decoupled.
@stefanwebb sounds good, looking forward to subsequent PRs
Here's my second pull request: #1311. When this one is ready, I can then submit my new normalizing flow implementations.
@fritzo @martinjankowiak how much latitude do I have for renaming classes that have already been committed?
For the 0.4 release I was thinking about the following improvement in the API:
- transforms.InverseAutoregressiveFlow and transforms.InverseAutoregressiveFlowStable into transforms.AffineAutoregressive with stable=True/False keyword argument.
- transforms.PlanarFlow to transforms.Planar, transforms.RadialFlow to transforms.Radial, transforms.PolynomialFlow to transforms.Polynomial, etc.
- transforms.RealNVP to transforms.AffineCoupling
- transforms.DeepELUFlow, transforms.DeepSigmoidFlow, transforms.DeepLeakyReLUFlow into transforms.NeuralAutoregressive with keyword arg for choice of activation function (this would make the interface consistent with Block-NAF).
- transforms.BlockNAFFlow to transforms.BlockAutoregressive
- transforms.BatchNormTransform to transforms.BatchNorm, transforms.PermuteTransform to transforms.Permute, etc.

The reasoning for this is that a flow is technically a sequence of different transformations, not just a single one. E.g. IAF is a combination of autoregressive affine transformations and permutation layers. And RealNVP also includes batch norm transformations. The naming should suggest that the transformations are general "layers" that can be applied in any combination. And by renaming them this way, the names more closely resemble what operation the transform performs and their relationship to each other.
I think it also doesn't make sense to have multiple classes for what is essentially the same transformation with just another activation function.
@stefanwebb We'd like to do a 0.4 release soon, so as to support today's PyTorch 1.2 release. Do you think you'll finish this reorg in the next couple days, or can we punt it to a subsequent release?
Hi Fritz, could we please postpone it to the subsequent release?
No worries @stefanwebb, we can do this refactoring any time.
Hi @stefanwebb, sorry if this is off-topic, but I have a question about using inverse autoregressive flows.
In particular, to speed things up, I am trying to use IAFs when doing inference with Hamiltonian Monte Carlo for a stochastic volatility model. However, I am not 100% sure they are being applied correctly. Here is a toy example; do I understand correctly that this is how IAFs should be applied when defining the model?
without IAFs:
import pyro
import pyro.distributions as dist

def model(tseries):
    sigma = pyro.sample('sigma', dist.InverseGamma(2.5, 0.025))
    y = pyro.sample('y', dist.Normal(0., sigma), obs=tseries)
    return y
with IAFs:
import numpy as np
import torch
import pyro
import pyro.distributions as dist
from pyro.nn import AutoRegressiveNN

def model(tseries):
    transform_sigma = dist.transforms.AffineAutoregressive(AutoRegressiveNN(1, [100]))
    base_dist_sigma = dist.InverseGamma(torch.Tensor(np.repeat(2.5, 1)),
                                        torch.Tensor(np.repeat(0.025, 1)))
    flow_dist_sigma = dist.TransformedDistribution(base_dist_sigma, transform_sigma)
    sigma = pyro.sample("sigma", flow_dist_sigma)
    y = pyro.sample('y', dist.Normal(0., sigma), obs=tseries)
    return y
I would be very grateful for your assistance.
Hi @ameshkovskiy, fantastic that you're interested in Normalizing Flows!
I'd be glad to answer your question but I think it would be good to post over in the forum: https://forum.pyro.ai/
Would you be able to kindly repost your question there, then I'll pop over and leave a reply?