Seurat: re_clustering in seurat v3

Created on 13 May 2019 · 8 Comments · Source: satijalab/seurat

Dear Seurat team,
Thanks for the latest version of Seurat. I started using Seurat v3 two weeks ago and I'm having some problems with subsetting and reclustering. The first clustering works pretty well; I'm following the "Integrating stimulated vs. control PBMC datasets" tutorial to integrate 11 samples. My problem starts with the subsetting and reclustering: I don't know how to properly calculate the genes to be used in the reclustering.

I'm using three approaches, but I'm not sure which one is the right one.

Approach 1: Processing as a single sample. This approach works, but I'm not sure if it is right.

Subset_Cells <- SubsetData(Sample_integrated, subset.name = "seurat_clusters", accept.value = c(2, 5, 8, 9, 10, 11, 14, 15, 16))
Subset_Cells <- ScaleData(Subset_Cells) # I always re-scale after subsetting.
Subset_Cells <- FindVariableFeatures(Subset_Cells, selection.method = "vst", nfeatures = 2000)
Warning messages:
1: In FindVariableFeatures.Assay(object = assay.data, selection.method = selection.method, :
selection.method set to 'vst' but count slot is empty; will use data slot instead
2: In eval(predvars, data, env) : NaNs produced
3: In hvf.info$variance.expected[not.const] <- 10^fit$fitted :
number of items to replace is not a multiple of replacement length

all.genes <- rownames(Subset_Cells)
Subset_Cells <- ScaleData(Subset_Cells, features = all.genes, vars.to.regress = c("nCount_RNA", "percent.mt", "S.Score", "G2M.Score"))
Subset_Cells <- RunPCA(Subset_Cells, features = VariableFeatures(object = Subset_Cells))
Subset_Cells <- JackStraw(Subset_Cells, num.replicate = 100)
Subset_Cells <- ScoreJackStraw(Subset_Cells, dims = 1:20)
Subset_Cells <- FindNeighbors(Subset_Cells, dims = 1:10)
Subset_Cells <- FindClusters(Subset_Cells, resolution = 0.5)
Subset_Cells <- RunTSNE(Subset_Cells, dims = 1:10)

Approach 2: This approach works, but I'm not sure if it is right.

Subset_Cells <- SubsetData(Sample_integrated, subset.name = "seurat_clusters", accept.value = c(2, 5, 8, 9, 10, 11, 14, 15, 16))
Subset_Cells.list <- SplitObject(Subset_Cells, split.by = "sample")

for (i in seq_along(Subset_Cells.list)) {
  Subset_Cells.list[[i]] <- NormalizeData(Subset_Cells.list[[i]], verbose = FALSE)
  Subset_Cells.list[[i]] <- FindVariableFeatures(Subset_Cells.list[[i]], selection.method = "vst",
                                                 nfeatures = 2000, verbose = FALSE)
}

After running that script, I get the following error, but the code continues running:
(Error: Cannot add a different number of cells than already present)

reference.list <- Subset_Cells.list[c("Sample1", "Sample2", "Sample3", "Sample4", "Sample5")]
Integrated <- FindIntegrationAnchors(object.list = reference.list, dims = 1:20)
Integrated <- IntegrateData(anchorset = Integrated, dims = 1:20)
Integrated <- ScaleData(Integrated, verbose = FALSE, vars.to.regress = c("nCount_RNA", "percent.mt", "S.Score", "G2M.Score"))
Integrated <- RunPCA(Integrated, npcs = 30, verbose = FALSE)
Integrated <- RunTSNE(Integrated, reduction = "pca", dims = 1:20)
Integrated <- FindNeighbors(Integrated, reduction = "pca", dims = 1:20)
Integrated <- FindClusters(Integrated, resolution = 0.6)
This script works, but I'm not sure if it is right.

Approach 3 (subset by sample): This script doesn't work.

Subset_Cells <- SubsetData(Sample_integrated, subset.name = "seurat_clusters", accept.value = c(2, 5, 8, 9, 10, 11, 14, 15, 16))

Subset samples and recalculate variable genes by sample:

sample1 <- SubsetData(Subset_Cells, subset.name = "sample", accept.value = "Patient_1")
sample1 <- ScaleData(sample1)
sample1 <- NormalizeData(sample1, verbose = FALSE)
sample1 <- FindVariableFeatures(sample1, selection.method = "vst", nfeatures = 2000)

After running NormalizeData and FindVariableFeatures, I get the following error and warnings:
(Error: Cannot add a different number of cells than already present)
Warning messages:
1: In FindVariableFeatures.Assay(object = assay.data, selection.method = selection.method, :
selection.method set to 'vst' but count slot is empty; will use data slot instead
2: In eval(predvars, data, env) : NaNs produced
3: In hvf.info$variance.expected[not.const] <- 10^fit$fitted :
number of items to replace is not a multiple of replacement length

I get the same error and warnings with every sample that I subset. Then, when I try to run FindIntegrationAnchors, I get the following error:
(Error in nn2(data = cn.data2[nn.cells2, ], query = cn.data1[nn.cells1, :
Cannot find more nearest neighbours than there are points)

Thanks,

Hugo


HugoGonzalezVelozo

👍1

Most helpful comment

We do not support the identification of variable features on integrated data. If you want to subset and recluster using a new set of variable genes, you need to switch the default assay of the subsetted object to the 'RNA' assay.

satijalab on 17 May 2019

👍2
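
A minimal sketch of what that advice might look like in practice, assuming a Seurat v3 object whose 'RNA' assay still holds the raw counts and whose metadata has a `seurat_clusters` column. The helper name `recluster_on_rna` and its parameter defaults are hypothetical and are not taken from the Seurat team's reply.

```r
library(Seurat)

# Hypothetical helper (not part of Seurat): re-cluster a subset of an
# integrated object using variable features recomputed on the 'RNA' assay.
recluster_on_rna <- function(integrated, clusters, dims = 1:10, resolution = 0.5) {
  # Select cells by their cluster label in the metadata.
  keep <- colnames(integrated)[integrated$seurat_clusters %in% clusters]
  sub <- subset(integrated, cells = keep)

  # Switch away from the 'integrated' assay before finding variable features.
  DefaultAssay(sub) <- "RNA"

  sub <- NormalizeData(sub, verbose = FALSE)
  sub <- FindVariableFeatures(sub, selection.method = "vst", nfeatures = 2000)
  sub <- ScaleData(sub, verbose = FALSE)
  sub <- RunPCA(sub, features = VariableFeatures(sub), verbose = FALSE)
  sub <- FindNeighbors(sub, dims = dims)
  sub <- FindClusters(sub, resolution = resolution)
  RunTSNE(sub, dims = dims)
}
```

With the object from the question, a call might be `recluster_on_rna(Sample_integrated, clusters = c(2, 5, 8, 9, 10, 11, 14, 15, 16))`. Because this path works on the 'RNA' assay only, it does not correct for differences between samples.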

All 8 comments

satijalab on 17 May 2019

👍2

Thanks, can you give an example of how to do it properly? I have a sample that has multiple cell types, such as epithelial cells, immune cells, and endothelial cells. What I want to do is subset and re-cluster just the immune cells.

Thanks.

HugoGonzalezVelozo on 17 May 2019

So is it not recommended to subset off of an integrated object and then re-run FindVariableFeatures? Instead, is it recommended to run clustering, subset the cells of interest, and then run integration off the subsetted object? Just want to be sure I understand what was stated above. Thanks!

dwucsf on 9 Jun 2019

Yes, although it is still a matter of debate. After integration you can subset and re-cluster; alternatively, you can merge your samples and run default clustering, then subset a group of cells of interest and run integration. I prefer the second strategy; it works better. I'm also at UCSF if you want to chat in person; you will find me in the UCSF directory.

Hugo

HugoGonzalezVelozo on 13 Jun 2019

When I ran subset on the integrated data and re-clustered, the cells clustered by orig.ident instead of by similarity, so there must be a batch effect. But I had also run ScaleData. How can I solve this problem? Thanks!

hehedidi on 24 Jun 2019

Thanks, can you give an example of how to do it properly? I have a sample that has multiple cell types, such as epithelial cells, immune cells, and endothelial cells. What I want to do is subset and re-cluster just the immune cells.

Thanks.

@HugoGonzalezVelozo

Thank you for asking the question - I want to do exactly the same thing.
Have you solved the problem?

I found that when I run
DefaultAssay(subset_object) <- "RNA"
then take Approach 2, it does not give the error message.
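
Put together, that sequence might look like the sketch below. It assumes a Seurat v3 object with a `sample` metadata column, as in the original question; the helper name `reintegrate_subset` and its defaults are hypothetical, and this is an illustration rather than a confirmed recommendation.

```r
library(Seurat)

# Hypothetical helper (not part of Seurat): switch the subset to the 'RNA'
# assay, then split by sample and integrate again, as in Approach 2.
reintegrate_subset <- function(subset_object, split.by = "sample", dims = 1:20) {
  DefaultAssay(subset_object) <- "RNA"

  # Normalize and find variable features independently for each sample.
  object.list <- SplitObject(subset_object, split.by = split.by)
  object.list <- lapply(object.list, function(obj) {
    obj <- NormalizeData(obj, verbose = FALSE)
    FindVariableFeatures(obj, selection.method = "vst", nfeatures = 2000, verbose = FALSE)
  })

  # Re-run integration on the subsetted cells only.
  anchors <- FindIntegrationAnchors(object.list = object.list, dims = dims)
  integrated <- IntegrateData(anchorset = anchors, dims = dims)

  # Cluster on the newly integrated assay.
  DefaultAssay(integrated) <- "integrated"
  integrated <- ScaleData(integrated, verbose = FALSE)
  integrated <- RunPCA(integrated, npcs = 30, verbose = FALSE)
  integrated <- FindNeighbors(integrated, reduction = "pca", dims = dims)
  FindClusters(integrated, resolution = 0.6)
}
```

One thing to watch for: if a sample contributes very few cells after subsetting, FindIntegrationAnchors may fail with the "Cannot find more nearest neighbours than there are points" error quoted in the question. Lowering its `k.filter` argument, or dropping very small samples, is a workaround that may help.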

Considering @satijalab's reply, I thought this was the way to go.
I would be grateful for any thoughts/advice.

Best,
Shoko