The partially_pooled model in baseball.py seems to sporadically fail with the exception that the value argument is not within the support for the Pareto distribution. cc. @fehiepsi
Taking a look at this, but I am temporarily removing this from test_examples.py to unblock the build.
Traceback (most recent call last):
File "/home/travis/build/uber/pyro/pyro/poutine/trace_struct.py", line 128, in log_prob_sum
log_p = site["log_prob_sum"]
KeyError: 'log_prob_sum'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/travis/build/uber/pyro/examples/baseball.py", line 309, in <module>
main(args)
File "/home/travis/build/uber/pyro/examples/baseball.py", line 272, in main
.run(partially_pooled, at_bats, hits)
File "/home/travis/build/uber/pyro/pyro/infer/abstract_infer.py", line 84, in run
for tr, logit in self._traces(*args, **kwargs):
File "/home/travis/build/uber/pyro/pyro/infer/mcmc/mcmc.py", line 51, in _traces
for trace in self._gen_samples(self.num_samples, trace):
File "/home/travis/build/uber/pyro/pyro/infer/mcmc/mcmc.py", line 35, in _gen_samples
trace = self.kernel.sample(trace)
File "/home/travis/build/uber/pyro/pyro/infer/mcmc/nuts.py", line 276, in sample
direction, tree_depth, energy_current)
File "/home/travis/build/uber/pyro/pyro/infer/mcmc/nuts.py", line 159, in _build_tree
direction, tree_depth-1, energy_current)
File "/home/travis/build/uber/pyro/pyro/infer/mcmc/nuts.py", line 178, in _build_tree
direction, tree_depth-1, energy_current)
File "/home/travis/build/uber/pyro/pyro/infer/mcmc/nuts.py", line 178, in _build_tree
direction, tree_depth-1, energy_current)
File "/home/travis/build/uber/pyro/pyro/infer/mcmc/nuts.py", line 155, in _build_tree
return self._build_basetree(z, r, z_grads, log_slice, direction, energy_current)
File "/home/travis/build/uber/pyro/pyro/infer/mcmc/nuts.py", line 130, in _build_basetree
z, r, self._potential_energy, self._inverse_mass_matrix, step_size, z_grads=z_grads)
File "/home/travis/build/uber/pyro/pyro/ops/integrator.py", line 68, in single_step_velocity_verlet
z_grads, potential_energy = _potential_grad(potential_fn, z_next)
File "/home/travis/build/uber/pyro/pyro/ops/integrator.py", line 79, in _potential_grad
potential_energy = potential_fn(z)
File "/home/travis/build/uber/pyro/pyro/infer/mcmc/hmc.py", line 170, in _potential_energy
potential_energy = -self._compute_trace_log_prob(trace)
File "/home/travis/build/uber/pyro/pyro/infer/mcmc/hmc.py", line 154, in _compute_trace_log_prob
return self._trace_prob_evaluator.log_prob(model_trace)
File "/home/travis/build/uber/pyro/pyro/infer/mcmc/util.py", line 132, in log_prob
return model_trace.log_prob_sum()
File "/home/travis/build/uber/pyro/pyro/poutine/trace_struct.py", line 130, in log_prob_sum
log_p = site["fn"].log_prob(site["value"], *site["args"], **site["kwargs"])
File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/torch/distributions/transformed_distribution.py", line 86, in log_prob
log_prob += _sum_rightmost(self.base_dist.log_prob(y),
File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/torch/distributions/exponential.py", line 51, in log_prob
self._validate_sample(value)
File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/torch/distributions/distribution.py", line 221, in _validate_sample
raise ValueError('The value argument must be within the support')
@neerajprad Which args did you use? I don't get any problem with partially_pooled so far but have faced some problems with partially_pooled_with_logit. These problems seem not related to recent pull requests though.
With seed=0, num-samples=200, warmup-steps=200, I observed that partially_pooled_with_logit model throws error at setup (specifically validate_trace). When scale is initialized with high value, alpha will take high positive or small negative values, which in turns make the probs parameter of Binomial very near 1 or 0. So log_prob will be -inf in many cases, which invalidates the init trace. Of course, probs ~ 1 or 0 is unrealistic and we should initialize scale with a more reasonable value. Maybe we reset the initial trace when it is invalid?
Maybe we reset the initial trace when it is invalid?
Yeah, we should probably resample the initial trace (a few times) in that case, except that we need to be careful because the model itself might be ill-specified in which case throwing an error would make sense.
I am seeing this problem using pytorch==0.4 on python 3.6, using the default args. I will confirm shortly if this is still an issue with PyTorch master.
@neerajprad I did use pytorch=0.4 too (after facing errors of distributions in python 1.0rc). I didn't catch
what is the reason for your failure yet but I found a bug in NUTS when using mass_matrix: we should modify _is_turning method too. It is no longer a dot product.
we should modify _is_turning method too
Oh good catch! We would need to multiply by the inverse of the mass matrix, I guess.
I am trying to debug why we are getting non deterministic output when we run HMC / NUTS with a fixed seed. I think that needs to be fixed first, even before we go on to debug why the model fails on certain initializations.
The reason for the non-determinism is torch.dot in self._kinetic_energy, which can give results that are slightly off on different runs. See snippet below - while expected and torch.dot might differ, it should differ consistently, which is not the case below. This is fixed on master, but for the time being I will revert to using the element wise product + sum operation to remove any non-determinism from our tests.
In [56]: for i in range(10):
...: m = torch.load('m.pt')
...: r = torch.load('r.pt')
...: print("torch.dot", m.dot(r**2).item())
...: print("expected", (m * (r**2)).sum().item())
...:
torch.dot 24.512012481689453
expected 24.51201057434082
torch.dot 24.51201057434082
expected 24.51201057434082
torch.dot 24.512012481689453
expected 24.51201057434082
torch.dot 24.512012481689453
expected 24.51201057434082
torch.dot 24.512012481689453
expected 24.51201057434082
torch.dot 24.512012481689453
expected 24.51201057434082
torch.dot 24.512012481689453
expected 24.51201057434082
torch.dot 24.51201057434082
expected 24.51201057434082
torch.dot 24.512012481689453
expected 24.51201057434082
torch.dot 24.512012481689453
expected 24.51201057434082
Is this the cause of our docs stage sporadically failing on CI (example)?
Is this the cause of our docs stage sporadically failing on CI (example)?
The non determinism in HMC should be fixed now. This sounds like a different issue. cc. @fehiepsi ^^. Maybe we should just fix the rng seed in that doctest. I'll do that shortly.
I think it is rng seed problem too.
I am keeping this issue open because while the non-determinism is fixed, the original failure of the model on certain initializations persists - i.e. getting invalid values that don't lie in the support of the Pareto distribution.
@neerajprad did you face this problem at the warmup phase? I tried different seed and num_samples, and got a different error when sample r_dist: "The parameter scale has invalid values". step_size is so small at that time (1e-34???)
Let me try to isolate the seed that triggers this on my system - it is easy to see this with parallel chains.
"The parameter scale has invalid values". step_size is so small at that time
Any ideas why the step size is going so low - maybe we should have some reasonable limit on the lowest allowable step size, or at least throw a warning if it goes too low during the adaptation phase?
Ah, the reason is at inf for initial energy. :D This should be solved when we revise setup method to create new trace when initial trace is invalid.
I checked Stan, and they tried to generate valid initial trace with max 100 times.
I checked Stan, and they tried to generate valid initial trace with max 100 times.
Good to know! Yeah, we should definitely fix this. Another thing that I was thinking was that in many of these cases (not the inf ones though), it might actually be faster to have a lower acceptance probability with a larger step size, rather than making the step size smaller and smaller to get to the 0.8 target acceptance probability. At the very least, we should offer this knob to the users.
Yes, I think we somehow should let users change hidden parameters. About small step_size, from what I learned from Stan reference and pymc3 notebook: smaller step_size the better. They even encourage increasing acceptance probability to a value ~1. When adapting, I think that with small step_size, we will get better mass_matrix. When mass_matrix is ready, step_size will automatically increase I guess (at least during the end_buffer).
To me, mass_matrix plays the role of setting different step_size for each parameter (with small variance params, we don't want to move far from a position; with large variance params, we don't want to stay around one position). So if we are able to learn it during warmup phase, the sampling will be good.
When adapting, I think that with small step_size, we will get better mass_matrix. When mass_matrix is ready, step_size will automatically increase I guess (at least during the end_buffer).
Yeah, that's a good observation! I think I was only seeing these issues without mass matrix adaptation, but you are right that the mass matrix will likely help alleviate many of these issues by adapting the step size per parameter, so we will probably not have to run many steps before getting a u turn.
@neerajprad It seems that we can close this?
Closing this, as I have not seen this in a while. Will reopen if this continues to be an issue.