Pymc3: pm.gp.sample_gp returns nan

Created on 5 May 2017 · 20Comments · Source: pymc-devs/pymc3

@aseyboldt Thanks for the hard work in #2028, sampling is fine now. However, the random method in both MvNormal and gp is not working too well. For example, even running the simple GP notebook, gp.sample_gp returns only NaNs. I think it might be a safer option to use the scipy Multivariance to generate random number, as even if the covariance is not positive-semidefinite it still return random samples but with warning.

Source

junpenglao

All 20 comments

Ahh, just when I thought that was finally finished... I'll look into it tomorrow. Thanks for the report.

aseyboldt on 5 May 2017

👍1

Seems there are three issues here:

The shape of the output is wrong if the matrix is not spd. This is trivial to fix.
We return nans if the covariance is not positive definite. This one was actually intentional, but it might have been a bad idea. This should be easy to fix, too. We just need to use the mvnormal sampling function from scipy, which allows semidefinite covariances.
The covariances that are created when doing posterior predictive sampling are not positive definite. All but ~5 eigenvalues (100 in total) are zero (edit if I sample longer, a couple more eigenvalues are >0). I have no clue why this might be the case. @junpenglao @fonnesbeck any idea? Doesn't this mean that our posterior predictive distribution of the eigenspectrum is just completely wrong?

aseyboldt on 6 May 2017

@aseyboldt

We return nans if the covariance is not positive definite. This one was actually intentional, but it might have been a bad idea. This should be easy to fix, too. We just need to use the mvnormal sampling function from scipy, which allows semidefinite covariances.

I think this is a good idea during sampling but during random we should allow non definite cov matrix (same treatment as scipy). I think in the base level we should always have cov around in MvNormal, but computation is by default using chol.

The covariances that are created when doing posterior predictive sampling are not positive definite. All but ~5 eigenvalues (100 in total) are zero (edit if I sample longer, a couple more eigenvalues are >0). I have no clue why this might be the case. @junpenglao @fonnesbeck any idea? Doesn't this mean that our posterior predictive distribution of the eigenspectrum is just completely wrong?

I will have a look at the solutions in GPflow and Gpy

junpenglao on 6 May 2017

FYI, for posterior sampling:
GPflow add jitter to cov before Cholesky:
https://github.com/GPflow/GPflow/blob/master/GPflow/model.py#L384-L397

GPy used numpy.random.multivariate_normal
https://github.com/SheffieldML/GPy/blob/deploy/GPy/core/gp.py#L496-L500

junpenglao on 6 May 2017

@aseyboldt seems it is nothing wrong that the posterior covariance is not positive definite. Intuitively, when two observed points are very close to each other, the correspondent column/row in the cov matrix is nearly linear dependent (after all, they are almost identical), which breaks the Cholesky factorization.

junpenglao on 6 May 2017

@junpenglao I expected small eigenvalues, just not that small. But you are right, I just checked a couple of the covariance matrices from GP-covariances.ipynb. In many cases most eigenvalues are almost zero. I guess it makes sense, if you have a slowly changing gp and evaluate it on a fine grid you really only need a few values to specify what the whole curve looks like. Shouldn't this make it really hard for NUTS to sample from Gaussian processes though?
I'll use the version in scipy, it computes all eigenvalues and sets very small ones – even if they are negative – to zero. It's slower, but it works for semidefinite matrices.

aseyboldt on 6 May 2017

I think adding a small diagonal is the way to go here. The change to results is negligible, and as @junpenglao saw its how other gp libraries deal with this issue. @aseyboldt it does make it difficult for nuts to sample from for that reason.
GPflow reparameterizes the problem by having the random variable be v ~ Normal, and rotating it, Cov ~ chol_factor * v.

bwengals on 6 May 2017

@bwengals Adding a something to the diagonal seems fine for gp, but I'd rather not do this in MvNormal.random. We can use the scipy function there, speed shouldn't be that much of an issue for the posterior predictive samples, right? In the long run we should probably switch to a block ldlt decomposition (should be pretty stable for indefinite matrices), but this is only in a recent lapack and scipy doesn't expose it yet. There's a PR though, and maybe this will end up in 1.0.

aseyboldt on 7 May 2017

Oh sorry, I did mean a diagonal for sample_gp, not MvNormal. Are there other situations besides gps where this is coming up? I agree with your verdict though, using scipy for the random method would be a bit slower but probably not a big deal for posterior predictive samples. The logp and dlogp are where speed is critical. I think I read somewhere that stan uses the block ldlt decomp (I think you might have posted that)?

bwengals on 7 May 2017

@bwengals Can you check whether the problem is fixed now?

twiecki on 8 May 2017

Checked on my side, works super.

junpenglao on 8 May 2017

👍1

works for me too!

bwengals on 9 May 2017

Hey guys, sorry to re-open this thread, but I've been noticing that my GP samples are also all returning NaN values. Is there something I can do about this or should I continue waiting for other PRs to come in?

In case you want an example of this, I have it up here. The GP stuff is towards the bottom of the notebook. (I'm doing some crazy experimental stuff on this, but if you have input on the science, I'd love to hear!!)

Pardon my "noobness" on this, still picking things up and learning through examples.

ericmjl on 2 Jun 2017

I'll have a look tomorrow.

aseyboldt on 2 Jun 2017

❤1

@aseyboldt thanks a ton! If it helps, my environment is setup as such:

pymc3 installed from master
theano 0.9.0 installed via conda
scipy 0.19 installed via conda
numpy 1.12.1 installed via conda

They all reside together in a separate conda environment.

ericmjl on 2 Jun 2017

Hi @twiecki. gp.sample_gp still returns a lot of nan. But not all values are nan.
I have pymc3 of version 3.1.

TropComplique on 5 Jul 2017

Could you post an example that fails? Would be happy to take a look

bwengals on 5 Jul 2017

Also pip version of the package doesn't have chol_const in gp.sample_gp.

TropComplique on 5 Jul 2017

yup, it looks like that bug was fixed 4 days ago in #2373. I'd strongly recommend keeping it set to True. Do you get nans when chol_const = True?

bwengals on 5 Jul 2017

@bwengals Yeah, It worked. All I needed to do is to reinstall pymc3 by cloning the repository.

TropComplique on 5 Jul 2017

👍1

Was this page helpful?

0 / 5 - 0 ratings

Related issues

Memory Error: OrderedLogistic

aloctavodia · 19Comments

Multiprocessing fails when sampling multiple chains using multiple cores

JackCaster · 27Comments

Move SFGS, CSG and Elliptical Slice samplers to pymc3-experiments

twiecki · 22Comments

Release 3.9.3

twiecki · 23Comments

Vectors of multivariate variables

fonnesbeck · 88Comments