Hi,
In Seurat 2.1.0, the t-SNE output changes if RunPCA() and RunTSNE() are executed more than once on the same object. The output (t-SNE loadings in object@dr$tsne) differs between the first time both functions are run on an object and subsequent runs on the same object, leading to different t-SNE plots with TSNEPlot().
I've tested this behavior on different installations and different operating systems.
Is there a reason for that? I find the behavior baffling and I don't know which output should be trusted: the one from the first run or the one from the following runs?
Thanks for your time.
For PCA, we're using the irlba package, which has some implicit randomness. It turns out the function we're using from that package doesn't automatically set a seed. This means that if you run it multiple times you can sometimes get PCA solutions in which the signs are flipped for some PCs, or it could potentially converge to slightly different answers. RunPCA has been updated to set a seed at the beginning to avoid this. You can either set a seed yourself before RunPCA or install from the develop branch (instructions here), where this has been fixed.
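As a minimal sketch of the "set a seed yourself" workaround, the snippet below calls irlba() directly rather than RunPCA(), on a small random matrix made up for illustration. It assumes the irlba package is installed; the seed value 42 is arbitrary.

```r
library(irlba)

# Toy data standing in for a scaled expression matrix (illustrative only).
set.seed(1)
x <- matrix(rnorm(100 * 50), nrow = 100, ncol = 50)

# Resetting the same seed before each call gives irlba the same
# random starting vector, so both runs return the same factors.
set.seed(42)
a <- irlba(x, nv = 2)

set.seed(42)
b <- irlba(x, nv = 2)

same_u <- identical(a$u, b$u)
same_d <- identical(a$d, b$d)
```

With a Seurat object, the equivalent would be calling set.seed() immediately before each RunPCA() call.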
Hello Andrew,
Thank you for the fix. However, this created a problem for me, because my tSNE plot is now completely different from the one I had with the master branch package. How can I revert the output to the original one while continuing to use the develop branch version of Seurat?
Thanks in advance.
Best,
Leon
Hi Leon,
You can install from a specific commit on the develop branch using the ref parameter to install_github. The last commit before this change was made was c116e41.
Hello Andrew,
Thanks for your quick reply. However, this will not solve the issue for future releases. Couldn't you add the seed as an argument to RunPCA()? Also, is there a way to define the seed to be the same as the "seed" used by default by the irlba package? Even if it has some randomness, when run multiple times (not iteratively, but rather after reinitializing the Seurat object), it always generates the same output. Therefore, there must be some stable parameter that allows it to reproduce the same output each time it's run for the first time.
Best,
Leon
Hmm, we could add it as an argument but I'm not sure what the default seed should be in order to be exactly consistent with the past results. I ran a quick test (restarting the R session in between several times) and still get the sign flipping behavior, confirming that irlba isn't setting any seed.
library(irlba)
x = matrix(1:10000, 100, 100)
irlba(x, 2)$u[1:5, 1:2]
Just flipping the sign though shouldn't change the tSNE so I'm kind of surprised your plot changed completely. Open to suggestions if you've got any.
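A quick way to see why a pure sign flip should be harmless: assuming t-SNE depends on the PCA scores only through pairwise Euclidean distances between cells, flipping the sign of a component leaves those distances unchanged. The sketch below uses base R's svd() on made-up data rather than Seurat or irlba.

```r
# Toy data (illustrative only): 20 "cells" by 10 "genes".
set.seed(1)
x <- matrix(rnorm(200), nrow = 20, ncol = 10)

# First two left singular vectors as stand-ins for PC scores.
scores <- svd(x)$u[, 1:2]

# Flip the sign of the first component only.
flipped <- scores %*% diag(c(-1, 1))

# Compare pairwise distances; as.vector() drops the "call" attribute,
# which would otherwise differ between the two dist objects.
distances_match <- isTRUE(all.equal(as.vector(dist(scores)),
                                    as.vector(dist(flipped))))
```

If the plot changes completely, something other than a sign flip is probably different between the runs, for example slightly different converged values.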
The behavior Leon is experiencing is exactly the one I tried to describe. Once RunPCA has been executed on an object, every later run gives a result that differs from the first output but is always the same.
Hello Andrew,
I am also surprised that flipping the sign alters the tSNE output. I am currently running a test with multiple seed values that I'm setting before executing RunPCA(). What I see is that some seed values give exactly the same results (regarding the tSNE plots), while others result in unique plots. Yet none replicated my original result (although some seem to be qualitatively similar). The seed.use argument of RunTSNE() is set to 1, and the perplexity to 30.
I would be in favor of adding a seed argument to the RunPCA() function. If the argument is NULL (the default value), no seed is set. Otherwise, the seed is set to the value provided to the argument.
RunPCA <- function(..., seed.use = NULL) {
  # ... (code before the PCA computation elided)
  # Only touch the RNG state when the caller asked for a seed.
  if (!is.null(seed.use)) {
    set.seed(seed = seed.use)
  }
  # ... (rest of RunPCA elided)
}
I agree that this does not solve the issue. However, every time I ran my script after reinitiating the Seurat object, I always had the same result (that is, the first result, and not the results that emerge after iteratively re-executing RunPCA()) and, therefore, would like to keep that output unchanged. I am going to try to investigate further to see if I can find a better solution for this. I am also open to any suggestions you and others might have to solve this issue.
Best,
Leon
Hi again,
It seems that RunTSNE() always sets the same seed. The values of .Random.seed after the execution of RunTSNE() are always the same; I tested this by changing the seed between executions via set.seed(). That's why the results become reproducible after the first execution of RunTSNE() (not RunPCA()). So it has nothing to do with irlba() setting a seed, as you already pointed out.
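The mechanism described above can be sketched in base R. The two functions are hypothetical stand-ins, not Seurat code: fake_tsne() sets a fixed seed internally (as RunTSNE() appears to do with seed.use = 1), and fake_pca() draws random numbers without seeding (as the unseeded irlba call does).

```r
# Hypothetical stand-in for RunTSNE(): seeds the global RNG, then uses it.
fake_tsne <- function(seed.use = 1) {
  set.seed(seed.use)
  invisible(runif(3))
}

# Hypothetical stand-in for the unseeded PCA step: consumes the global RNG.
fake_pca <- function() rnorm(3)

first <- fake_pca()      # depends on whatever state the session started in
fake_tsne()
state_a <- .Random.seed  # RNG state left behind by the seeded step

second <- fake_pca()     # starts from the state left by fake_tsne()
fake_tsne()
state_b <- .Random.seed

third <- fake_pca()      # starts from that same state again

states_equal <- identical(state_a, state_b)  # TRUE
later_runs_equal <- identical(second, third) # TRUE
```

Under these assumptions, only the first PCA run sees an RNG state that the seeded step has not yet fixed, which would explain why it is the odd one out.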
@gresteban: The issue might actually be related to the irlba package, because whenever I set a seed before executing RunPCA(), the tSNE output is always consistent, albeit different from the original output obtained without setting a seed. Therefore, the RunTSNE() function (and the underlying Rtsne package) is sensitive to (minor) changes in the output of irlba.
I like the option of having a seed argument as you described, but we should probably have it set some seed by default (rather than NULL). That way, people who need to reproduce previous tSNE plots would be able to do so by setting seed.use = NULL, and we avoid the reproducibility issue going forward.
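A minimal sketch of that design, using a hypothetical helper rather than the real RunPCA(): a non-NULL default makes repeated calls reproducible, while seed.use = NULL leaves the RNG state untouched so the caller's own seed (or the lack of one) decides the result. The default value 42 is an arbitrary choice for illustration.

```r
# Hypothetical helper, not Seurat's RunPCA(): returns the kind of random
# starting vector an iterative solver might draw.
random_start <- function(n, seed.use = 42) {
  # A NULL seed means "do not touch the RNG state".
  if (!is.null(seed.use)) {
    set.seed(seed.use)
  }
  rnorm(n)
}

# Default seed: every call returns the same vector.
default_reproducible <- identical(random_start(5), random_start(5))

# seed.use = NULL: the result follows whatever seed the caller set.
set.seed(1)
a <- random_start(5, seed.use = NULL)
set.seed(1)
b <- random_start(5, seed.use = NULL)
caller_controls_seed <- identical(a, b)
```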
Hello Andrew,
That works fine for me too. As long as I can avoid setting a seed, one way or another (i.e. be it by default or by passing it to the argument), it would be great. However, I think that the documentation should explicitly state that if NULL is passed to seed.use, no seed is set.
Thanks again for your help.
Best,
Leon
Hi Andrew,
You can use object@calc.params$RunTSNE to check the seed used in your previous calculation.
Best,
Lei
Hi Andrew,
I faced the same problem and tried to install the old version of Seurat. When I type devtools::install_github(repo = 'satijalab/seurat', ref = 'c116e41'), it gives me the following error:
Error: HTTP error 422.
No commit found for SHA: c116e41
Rate limit remaining: 57/60
Rate limit reset at: 2019-02-22 20:07:06 UTC
It seems c116e41 doesn't exist any more?
Best,
Lei