The following tests (list may increase) fail on PyTorch master, which seem unrelated to tensor casting issue in #944. I encountered these while fixing the tensor casting issues in #952. Easy way to replicate this would be to run bash scripts/install_pytorch.sh with commit 2df578a --> 1807bac in README.md*. I might skip these for the time being.
bayesian_regression.py - RuntimeError: you can only change requires_grad flags of leaf variables. cc. @jpchen ADVIMultivariateNormal in test_advi.py - advi covariance off. cc. @fritzogp/test_models.py - test_inference, test_inference_svgp, test_inference_deepGP. Lapack Error in potrf : the leading minor of order 2 is not positive definite, and assertion failure on variance. cc. @fehiepsi gp/test_likelihoods - same as above.gp.ipynb - Non-deterministic travis failures due to constraint checking in multivariate normal. See discussion below.EDIT - After removing the mvn wrapper, all but bayesian_regression.py and gp.ipynb need to be fixed.
* Separately I will modify the script to take in an optional arg for build commit.
I think these lines trigger the "requires_grad flags" issue:
The issue is that w_mu is that the type_as comes after the requires_grad assignment, so w_mu is no longer a leaf variable. The existing code is equivalent to:
w_mu_hidden = torch.randn(1, p, requires_grad=True) # leaf variable (optimizable)
w_mu = w_mu_hidden.type_as(data) # backprops to of w_mu_hidden via backward type conversion
A fix is:
w_mu = torch.randn(1, p).type_as(data)
w_mu.requires_grad = True
(We need a better way to avoid or detect these sorts of issues in PyTorch)
I've been seeing spurrious failures in gp.ipynb. Are the random seeds being fixed? @fehiepsi any ideas? https://travis-ci.org/uber/pyro/jobs/360155682
I've been seeing spurrious failures in gp.ipynb. Are the random seeds being fixed? @fehiepsi any ideas? https://travis-ci.org/uber/pyro/jobs/360155682
I restarted the run, but I noticed it too. The issue is that the constraint checking complains that the precision matrix has invalid values, but the constructor was called without the precision matrix in the first place. I think it may have to do with the precision_matrix being lazy loading. The other thing is that validate_args is reversing all the gains by actually eagerly initializing all of these properties.
The issue is that w_mu is that the type_as comes after the requires_grad assignment, so w_mu is no longer a leaf variable. The existing code is equivalent to:
Thanks for explaining, @colesbury. That makes it super clear what's happening! cc. @jpchen.
@fritzo - #956 temporarily disables smoke test on the gp tutorial, until we investigate and fix that upstream.
@fritzo @neerajprad I run the notebook locally and was able to reproduce the issue. That covariance matrix is 100x100 dimensional and positive definite (after checking with torch.symeig) but its inverse (precision matrix) is not positive definite. Its smallest eigenvalue is about -0.2, all other eigenvalues are positive. I made a notebook to observe the behavior here: https://gist.github.com/fehiepsi/e33d8a0ffbb52ca3a8f7d589c9dcb9ab
I don't know if it is better to use pre version instead of PRE. @SsnL what do you think about calculating the lazy precision_matrix based on scale_tril instead of covariance_matrix?
Edit: I think that it is better to use scale_tril version. In future, if we have grad for potri, it is also easier to change. So I made a pull request in pytorch.
Most helpful comment
@fritzo @neerajprad I run the notebook locally and was able to reproduce the issue. That covariance matrix is 100x100 dimensional and positive definite (after checking with
torch.symeig) but its inverse (precision matrix) is not positive definite. Its smallest eigenvalue is about -0.2, all other eigenvalues are positive. I made a notebook to observe the behavior here: https://gist.github.com/fehiepsi/e33d8a0ffbb52ca3a8f7d589c9dcb9abI don't know if it is better to use
preversion instead ofPRE. @SsnL what do you think about calculating the lazy precision_matrix based on scale_tril instead of covariance_matrix?Edit: I think that it is better to use scale_tril version. In future, if we have grad for
potri, it is also easier to change. So I made a pull request in pytorch.