Pyro: Tests failing on PyTorch master

Created on 29 Mar 2018  路  6Comments  路  Source: pyro-ppl/pyro

The following tests (list may increase) fail on PyTorch master, which seem unrelated to tensor casting issue in #944. I encountered these while fixing the tensor casting issues in #952. Easy way to replicate this would be to run bash scripts/install_pytorch.sh with commit 2df578a --> 1807bac in README.md*. I might skip these for the time being.

  • [x] bayesian_regression.py - RuntimeError: you can only change requires_grad flags of leaf variables. cc. @jpchen
  • [x] Tests using ADVIMultivariateNormal in test_advi.py - advi covariance off. cc. @fritzo
  • [x] Tests in gp/test_models.py - test_inference, test_inference_svgp, test_inference_deepGP. Lapack Error in potrf : the leading minor of order 2 is not positive definite, and assertion failure on variance. cc. @fehiepsi
  • [x] Tests in gp/test_likelihoods - same as above.
  • [x] gp.ipynb - Non-deterministic travis failures due to constraint checking in multivariate normal. See discussion below.

EDIT - After removing the mvn wrapper, all but bayesian_regression.py and gp.ipynb need to be fixed.

* Separately I will modify the script to take in an optional arg for build commit.

bug

Most helpful comment

@fritzo @neerajprad I run the notebook locally and was able to reproduce the issue. That covariance matrix is 100x100 dimensional and positive definite (after checking with torch.symeig) but its inverse (precision matrix) is not positive definite. Its smallest eigenvalue is about -0.2, all other eigenvalues are positive. I made a notebook to observe the behavior here: https://gist.github.com/fehiepsi/e33d8a0ffbb52ca3a8f7d589c9dcb9ab

I don't know if it is better to use pre version instead of PRE. @SsnL what do you think about calculating the lazy precision_matrix based on scale_tril instead of covariance_matrix?

Edit: I think that it is better to use scale_tril version. In future, if we have grad for potri, it is also easier to change. So I made a pull request in pytorch.

All 6 comments

I think these lines trigger the "requires_grad flags" issue:

https://github.com/uber/pyro/blob/de50a8ee5854c4f00f37cbd851b536b8fbfe2151/examples/bayesian_regression.py#L76

https://github.com/uber/pyro/blob/de50a8ee5854c4f00f37cbd851b536b8fbfe2151/examples/bayesian_regression.py#L79

The issue is that w_mu is that the type_as comes after the requires_grad assignment, so w_mu is no longer a leaf variable. The existing code is equivalent to:

w_mu_hidden = torch.randn(1, p, requires_grad=True)  # leaf variable (optimizable)
w_mu = w_mu_hidden.type_as(data)  # backprops to of w_mu_hidden via backward type conversion

A fix is:

w_mu = torch.randn(1, p).type_as(data)
w_mu.requires_grad = True

(We need a better way to avoid or detect these sorts of issues in PyTorch)

I've been seeing spurrious failures in gp.ipynb. Are the random seeds being fixed? @fehiepsi any ideas? https://travis-ci.org/uber/pyro/jobs/360155682

I've been seeing spurrious failures in gp.ipynb. Are the random seeds being fixed? @fehiepsi any ideas? https://travis-ci.org/uber/pyro/jobs/360155682

I restarted the run, but I noticed it too. The issue is that the constraint checking complains that the precision matrix has invalid values, but the constructor was called without the precision matrix in the first place. I think it may have to do with the precision_matrix being lazy loading. The other thing is that validate_args is reversing all the gains by actually eagerly initializing all of these properties.

The issue is that w_mu is that the type_as comes after the requires_grad assignment, so w_mu is no longer a leaf variable. The existing code is equivalent to:

Thanks for explaining, @colesbury. That makes it super clear what's happening! cc. @jpchen.

@fritzo - #956 temporarily disables smoke test on the gp tutorial, until we investigate and fix that upstream.

@fritzo @neerajprad I run the notebook locally and was able to reproduce the issue. That covariance matrix is 100x100 dimensional and positive definite (after checking with torch.symeig) but its inverse (precision matrix) is not positive definite. Its smallest eigenvalue is about -0.2, all other eigenvalues are positive. I made a notebook to observe the behavior here: https://gist.github.com/fehiepsi/e33d8a0ffbb52ca3a8f7d589c9dcb9ab

I don't know if it is better to use pre version instead of PRE. @SsnL what do you think about calculating the lazy precision_matrix based on scale_tril instead of covariance_matrix?

Edit: I think that it is better to use scale_tril version. In future, if we have grad for potri, it is also easier to change. So I made a pull request in pytorch.

Was this page helpful?
0 / 5 - 0 ratings