Hi,
I have an issue running the jackStraw method:
> seurat.data = jackStraw(seurat.data, num.replicate=200, do.print=FALSE)
Error in matrix(x2[ind], nrow = nrow(x), ncol = ncol(x), byrow = TRUE) :
non-numeric matrix extent
I saved the seurat.data object and stuck it up here: https://dl.dropboxusercontent.com/u/2822886/seurat-error.RData
Any ideas about why this is failing? It has worked on other datasets, I'm not sure what is different about this one.
Could you please share these three components from your large .RData file? They are all I need to figure out what is going on.
pca.x <- seurat.data@pca.x
save(pca.x, file = <path>)
pca.rot <- seurat.data@pca.rot
save(pca.rot, file = <path>)
sca.dt <- seurat.data@scale.data
save(sca.dt, file = <path>)
Hi Yun,
Thanks for responding. Here is an RData object with just those three objects saved. It is still large (170 MB). https://dl.dropboxusercontent.com/u/2822886/seurat-small.RData
Thanks for sharing the file.
Solution
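Setting prop.freq = 0.016, e.g. jackStraw(seurat.data, num.replicate=200, do.print=FALSE, prop.freq=0.016), makes your call run. More precisely, any value works that yields at least 2 permuted genes; with your 131 variable genes that means prop.freq >= 2/131 (about 0.0153).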
Why the error occurs
The error occurs because you have not set the prop.freq parameter. This parameter determines the number of permuted "synthetic" null variables, denoted s in the original jackstraw paper. In the Seurat implementation, s = (number of variable genes) * prop.freq.
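As a quick check of the formula, here is a minimal R sketch. `n_permuted` is a hypothetical helper, not a Seurat function, and it assumes the product is truncated down to a whole number of genes, which is what the 131-gene case in this thread suggests.

```r
# Hypothetical helper (not part of Seurat): s = (number of variable genes) * prop.freq,
# assuming the product is truncated down to a whole number of genes.
n_permuted <- function(n_var_genes, prop_freq) {
  floor(n_var_genes * prop_freq)
}

n_permuted(131, 0.015)   # 1.965 -> 1 gene: too few, the call fails
n_permuted(131, 0.016)   # 2.096 -> 2 genes: the call runs
n_permuted(131, 0.0153)  # 2.0043 -> 2 genes: just above 2/131
```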
Going deeper
Your data is an extreme case for Seurat, which is helpful for improving the package. Thanks again for reporting the issue.
By default, prop.freq is 0.01. If your data has fewer than 200 variable genes, prop.freq is raised slightly, to 0.015, so that there are enough tests to get robust results. Your data falls into the latter case: it has 131 variable genes, so 131 * 0.015 = 1.965, which is truncated to 1 gene to be permuted. In this extreme case the p-values are as accurate as possible while the algorithm is least efficient (see the original jackstraw paper), and jackRandom has to run PCA on this single gene. Running PCA with only one data point makes no sense; the current Seurat implementation does not guard against this case and gives no warning, so the error comes directly from R.
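The R message itself is easier to read with a small reproduction. One plausible reading of the traceback (an assumption, not confirmed in this thread) is that when only one gene is selected, indexing the scaled data by that row with the default drop = TRUE returns a plain vector, so nrow() and ncol() return NULL and matrix() rejects them. The toy matrix below is made up for illustration.

```r
# Toy stand-in for scaled expression data: 4 genes x 5 cells (made-up values)
set.seed(1)
scaled <- matrix(rnorm(20), nrow = 4,
                 dimnames = list(paste0("gene", 1:4), NULL))

one_gene <- scaled["gene1", ]   # drop = TRUE by default: a plain vector
nrow(one_gene)                  # NULL

# Rebuilding a matrix from NULL dimensions reproduces the reported message
msg <- tryCatch(
  matrix(one_gene, nrow = nrow(one_gene), ncol = ncol(one_gene), byrow = TRUE),
  error = function(e) conditionMessage(e)
)
msg                             # "non-numeric matrix extent"

# drop = FALSE keeps the single gene as a 1 x 5 matrix
one_gene_mat <- scaled["gene1", , drop = FALSE]
dim(one_gene_mat)               # 1 5
```

Subsetting with drop = FALSE is one way such code could avoid the NULL dimensions; requiring at least 2 permuted genes is another.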
I have recently implemented a patch for the jackStraw function that ensures at least 2 genes are permuted if users do not set the prop.freq parameter; see https://github.com/Puriney/honfleuR
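The rule described for the patch can be sketched as follows. This is a hypothetical illustration of the behaviour described above, not the actual honfleuR code; the defaults (0.01, or 0.015 below 200 variable genes) come from this thread, and truncation to whole genes is assumed.

```r
# Hypothetical sketch of the described patch behaviour (not the honfleuR code).
n_permuted_patched <- function(n_var_genes, prop_freq = NULL) {
  if (is.null(prop_freq)) {
    # Defaults as described in this thread
    prop_freq <- if (n_var_genes < 200) 0.015 else 0.01
    # Patch: permute at least 2 genes when prop.freq is not set
    return(max(2, floor(n_var_genes * prop_freq)))
  }
  # Assumption: an explicitly supplied prop.freq is used as given
  floor(n_var_genes * prop_freq)
}

n_permuted_patched(131)   # 2 (raised from 1)
n_permuted_patched(1000)  # 10
```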
Is anyone else getting this error when running jackStrawPlot?
> jackStrawPlot(srt, PCs = 1:12)
Error: Unknown parameters: dist
It seems my error comes from line 2706 of seurat.R:
stat_qq(dist=qunif) should be stat_qq(distribution=qunif)
I get this error as well. I believe this is because the latest version of ggplot2 no longer accepts the dist parameter of stat_qq (it is now called distribution). If you want a quick fix, rolling back to ggplot2 2.0.0 works, but it would be nice to update the Seurat code as well.
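For reference, here is a minimal sketch of the renamed argument in isolation. The data frame of uniform p-values is made up and only stands in for jackStraw output; this is not how jackStrawPlot builds its plot.

```r
library(ggplot2)

set.seed(42)
pvals <- data.frame(pval = runif(100))  # made-up stand-in for jackStraw p-values

# QQ plot of p-values against the uniform distribution;
# newer ggplot2 expects `distribution =`, not `dist =`
qq_plot <- ggplot(pvals, aes(sample = pval)) +
  stat_qq(distribution = qunif) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed")
```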
Opened up a pull request for the fix.
library(devtools)
install_github("roryk/seurat")
will install the fixed version so you don't have to roll back ggplot.