I am not sure whether this is a proper question for the repo, as it is not strictly "technical". Apologies if I should ask somewhere else.
Basically, I find it hard to tune parameters to get "clean" clusters for our data using the default Seurat pipeline (selecting HVGs --> PCA --> clustering --> tSNE).
After I get clusters, I still use a set of "marker genes" to identify cell populations, and that is sometimes hard for some clusters (especially subclusters).
I wonder why no one seems to perform "supervised" clustering, e.g. clustering immune cells on a fixed set of genes (say 100 or so)?
Thanks a lot,
Shuoguo
Supervised clustering is indeed possible. In my experience, I have used the "pc.genes" parameter to pass a subset of genes at the PCA stage, and then carried the resulting PCs downstream into the tSNE. This way, the PCs reflect the subset of genes that are of interest.
You can also use the "genes.use" option while running tSNE.
Anybody from the Seurat team, please let us know if this is a good way to do this, or if there is a better way.
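The idea of restricting PCA to a chosen gene set can be illustrated outside Seurat. The following is a minimal base R sketch, not the Seurat implementation: it swaps in `prcomp()` and `kmeans()` for Seurat's PCA and clustering steps, and the matrix `expr`, the gene names, and the two cell groups are all made up for illustration.

```r
set.seed(1)

# Hypothetical toy data: 6 genes x 40 cells (values are simulated, not real).
genes <- c("markerA", "markerB", "markerC", "other1", "other2", "other3")
expr <- matrix(rnorm(6 * 40), nrow = 6,
               dimnames = list(genes, paste0("cell", 1:40)))

# Shift the three marker genes in cells 21-40 to mimic a second population.
expr[1:3, 21:40] <- expr[1:3, 21:40] + 5

# "Supervised" step: keep only the genes of interest before the PCA.
marker.genes <- c("markerA", "markerB", "markerC")
pca <- prcomp(t(expr[marker.genes, ]), center = TRUE, scale. = TRUE)

# Cluster the cells on the first two PCs (k-means stands in for Seurat's clustering).
clusters <- kmeans(pca$x[, 1:2], centers = 2)$cluster
table(clusters)
```

Because only the marker genes enter the PCA, the PCs and the resulting clusters depend on those genes alone; whether that is desirable depends on how well the chosen gene set is trusted.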
I used the "pc.genes" parameter to input a subset of genes at the PCA stage:
gene.list <- as.character("Pdfr","dimm","Oa2","sNPF")
all(gene.list %in% rownames(seurat@data))
object <- RunPCA(object = seurat, pc.genes = gene.list, pcs.compute = 4,
                 do.print = TRUE, pcs.print = 1:5,
                 genes.print = 5)
but I got the error shown below:
Error in apply(X = data.use[genes.use, ], MARGIN = 1, FUN = var) :
dim(X) must have a positive length
The same happens if I use the "genes.use" option while running tSNE. How can we fix this issue? Thank you in advance.
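One likely cause, offered as a sketch rather than a confirmed diagnosis: `as.character("Pdfr","dimm","Oa2","sNPF")` converts only its first argument, so `gene.list` ends up holding a single gene. Indexing a matrix by one row name drops it to a plain vector, and `apply()` then fails with this message. The toy matrix `expr` below is made up for illustration and stands in for the expression data; the example uses base R only, not Seurat.

```r
# Toy expression matrix (hypothetical values): 4 genes x 3 cells.
expr <- matrix(1:12, nrow = 4,
               dimnames = list(c("Pdfr", "dimm", "Oa2", "sNPF"),
                               c("cell1", "cell2", "cell3")))

# as.character() converts only its first argument; the other strings are ignored.
bad.list <- as.character("Pdfr", "dimm", "Oa2", "sNPF")
length(bad.list)   # 1

# A single-row subset drops to a vector, so apply() has no dimensions to work with.
res <- tryCatch(apply(X = expr[bad.list, ], MARGIN = 1, FUN = var),
                error = function(e) conditionMessage(e))

# c() builds the intended four-element character vector.
gene.list <- c("Pdfr", "dimm", "Oa2", "sNPF")
gene.var <- apply(X = expr[gene.list, ], MARGIN = 1, FUN = var)
```

Note that `all(gene.list %in% rownames(...))` still returns TRUE for the one-element list, so that check does not reveal the problem. Also, four genes allow at most four PCs, so `pcs.print = 1:5` would likely need to be reduced as well.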