@aseyboldt Thanks for the hard work in #2028, sampling is fine now. However, the random method in both MvNormal and gp is not working too well. For example, even running the simple GP notebook, gp.sample_gp returns only NaNs. I think it might be a safer option to use the scipy Multivariance to generate random number, as even if the covariance is not positive-semidefinite it still return random samples but with warning.
Ahh, just when I thought that was finally finished... I'll look into it tomorrow. Thanks for the report.
Seems there are three issues here:
@aseyboldt
We return nans if the covariance is not positive definite. This one was actually intentional, but it might have been a bad idea. This should be easy to fix, too. We just need to use the mvnormal sampling function from scipy, which allows semidefinite covariances.
I think this is a good idea during sampling but during random we should allow non definite cov matrix (same treatment as scipy). I think in the base level we should always have cov around in MvNormal, but computation is by default using chol.
The covariances that are created when doing posterior predictive sampling are not positive definite. All but ~5 eigenvalues (100 in total) are zero (edit if I sample longer, a couple more eigenvalues are >0). I have no clue why this might be the case. @junpenglao @fonnesbeck any idea? Doesn't this mean that our posterior predictive distribution of the eigenspectrum is just completely wrong?
I will have a look at the solutions in GPflow and Gpy
FYI, for posterior sampling:
GPflow add jitter to cov before Cholesky:
https://github.com/GPflow/GPflow/blob/master/GPflow/model.py#L384-L397
GPy used numpy.random.multivariate_normal
https://github.com/SheffieldML/GPy/blob/deploy/GPy/core/gp.py#L496-L500
@aseyboldt seems it is nothing wrong that the posterior covariance is not positive definite. Intuitively, when two observed points are very close to each other, the correspondent column/row in the cov matrix is nearly linear dependent (after all, they are almost identical), which breaks the Cholesky factorization.
@junpenglao I expected small eigenvalues, just not that small. But you are right, I just checked a couple of the covariance matrices from GP-covariances.ipynb. In many cases most eigenvalues are almost zero. I guess it makes sense, if you have a slowly changing gp and evaluate it on a fine grid you really only need a few values to specify what the whole curve looks like. Shouldn't this make it really hard for NUTS to sample from Gaussian processes though?
I'll use the version in scipy, it computes all eigenvalues and sets very small ones – even if they are negative – to zero. It's slower, but it works for semidefinite matrices.
I think adding a small diagonal is the way to go here. The change to results is negligible, and as @junpenglao saw its how other gp libraries deal with this issue. @aseyboldt it does make it difficult for nuts to sample from for that reason.
GPflow reparameterizes the problem by having the random variable be v ~ Normal, and rotating it, Cov ~ chol_factor * v.
@bwengals Adding a something to the diagonal seems fine for gp, but I'd rather not do this in MvNormal.random. We can use the scipy function there, speed shouldn't be that much of an issue for the posterior predictive samples, right? In the long run we should probably switch to a block ldlt decomposition (should be pretty stable for indefinite matrices), but this is only in a recent lapack and scipy doesn't expose it yet. There's a PR though, and maybe this will end up in 1.0.
Oh sorry, I did mean a diagonal for sample_gp, not MvNormal. Are there other situations besides gps where this is coming up? I agree with your verdict though, using scipy for the random method would be a bit slower but probably not a big deal for posterior predictive samples. The logp and dlogp are where speed is critical. I think I read somewhere that stan uses the block ldlt decomp (I think you might have posted that)?
@bwengals Can you check whether the problem is fixed now?
Checked on my side, works super.
works for me too!
Hey guys, sorry to re-open this thread, but I've been noticing that my GP samples are also all returning NaN values. Is there something I can do about this or should I continue waiting for other PRs to come in?
In case you want an example of this, I have it up here. The GP stuff is towards the bottom of the notebook. (I'm doing some crazy experimental stuff on this, but if you have input on the science, I'd love to hear!!)
Pardon my "noobness" on this, still picking things up and learning through examples.
I'll have a look tomorrow.
@aseyboldt thanks a ton! If it helps, my environment is setup as such:
They all reside together in a separate conda environment.
Hi @twiecki. gp.sample_gp still returns a lot of nan. But not all values are nan.
I have pymc3 of version 3.1.
Could you post an example that fails? Would be happy to take a look
Also pip version of the package doesn't have chol_const in gp.sample_gp.
yup, it looks like that bug was fixed 4 days ago in #2373. I'd strongly recommend keeping it set to True. Do you get nans when chol_const = True?
@bwengals Yeah, It worked. All I needed to do is to reinstall pymc3 by cloning the repository.