Currently, HMC and autoguide do not give inference on parameters from pyro.param (autoguide only gives MAP point for these params). Should we support it or require users set weak priors such as Normal(0, 1000) to get posteriors for these parameters?
I don't think we should be doing that automatically. Users can always use lift if they want to put priors on their parameters.
So you mean that users should set weak priors (through lift or directly from their model) if they want loc and scale from AutoNormal (or samples from HMC) contain posterior information about their params (with flat priors)?
So you mean that users should set weak priors (through lift or directly from their model) if they want loc and scale from AutoNormal (or samples from HMC) contain posterior information about their params?
Yes. We could consider adding helpers that automatically generate weak prior Distributions for use with lift given a model, but I generally think it's better to be explicit about these things.
I somehow agree with you. But every time I use HMC for models with pyro.param statements, I just wish that we can get samples for them too. ^^! I leave this discussion for a while to see if there are other opinions. Then I'll close this. :)
What's wrong with using lift? You can even write a custom ImproperUniform Distribution (whose log_prob always returns 0) and generator with lift, similar to the way we create autoguides from traces:
class ImproperUniform(Delta):
def log_prob(self, value):
return value.new_tensor(0.)
def improper_priors_from_trace(trace):
priors = {}
for name, node in trace.nodes.items():
if node["type"] == "param":
tfm = get_transform(name, msg["value"]) # handle constraints
priors[name] = TransformedDistribution(ImproperUniform(msg["value"].unconstrained().clone()), tfm)
return priors
priors = improper_priors_from_trace(trace(model).get_trace(...))
improper_model = lift(model, priors)
Personally, I'd rather not push even more complexity inside of HMC initialization by putting all that behind a new keyword argument, and I'm not a fan of using improper priors, but maybe it's a worthwhile tradeoff. @neerajprad WDYT?
I am not sure how to interpret param statements for MCMC. If we would like to sample from them, why not just make them sample statements and specify a weak prior. I find the default behavior of treating them as constants least confusing (even better to just use tensors to specify the prior). As already noted, lift provides the users the flexibility to convert param statements in an existing model to sample statements, if they need to.
~I see merit in making the behavior consistent between autoguides and MCMC~ (EDIT: I don't see how we can do this for MCMC, so the current behavior seems reasonable) which I believe would provide ML estimate for params (from my understanding, correct me if I am mistaken). I also think that it is the least confusing behavior for autoguides in that param statements behave identical to how they would behave for a model in SVI. That said, my experience with different modeling scenarios is limited. @fehiepsi - do you have any particular use cases where you have found the current behavior lacking?
Thinking about it some more, I suppose it can be argued that sampling from the improper prior is the behavior most consistent with autoguides. Maybe we can make this the default, i.e. use @eb8680's snippet above, to transform all param statements in the model. If the user would like to preserve the current behavior, they can remove param statements from the model, and if they would like to specify their own prior they can use a lifted model with their own specified prior distributions instead.
I don't think we should make improper priors the default in MCMC, because improper priors aren't consistent with Pyro's default generative semantics. I also don't see how it's consistent with autoguide behavior - SVI optimizes parameters, it doesn't maintain a posterior approximation over them. We can provide the helpers above as a single model wrapper lift_improper or something, and users can always explicitly wrap their models before applying MCMC.
do you have any particular use cases where you have found the current behavior lacking?
I have bunch of models with flat priors (in Chapter 6 of the book Statistical Rethinking). However, I can write wrappers for my purpose without any difficulty. I just want to open a discussion to see if people wants the proposed behavior default. :)
@eb8680 @neerajprad Thanks for giving your opinions! I agree that we should not make it default. It can simplify small models but for large model with bunch of params (e.g. CNN + GP), getting posteriors for these params will waste a lot of resource. And in case we need, we can lift.
Most helpful comment
I have bunch of models with flat priors (in Chapter 6 of the book Statistical Rethinking). However, I can write wrappers for my purpose without any difficulty. I just want to open a discussion to see if people wants the proposed behavior default. :)
@eb8680 @neerajprad Thanks for giving your opinions! I agree that we should not make it default. It can simplify small models but for large model with bunch of params (e.g. CNN + GP), getting posteriors for these params will waste a lot of resource. And in case we need, we can lift.