Seurat: Defining cell types with canonical markers

Created on 23 Nov 2017  ·  7 Comments  ·  Source: satijalab/seurat

Hi,

I am trying to use a list of genes as canonical markers for several cell types that are present in the data. However, these markers are not in the top-10 gene list for the clusters in the FindAllMarkers output. The cells of these types (each type has 4-5 signature genes) are dispersed and do not segregate with the clustering. Is there a way to use this signature gene list to identify the cell types and then use that information to label the tSNE?

Thanks,

Bibaswan

All 7 comments

Hello Bibaswan,

I guess you should consider a supervised clustering approach in your case. To do that, you can create a character vector containing those genes and pass it to the pc.genes argument of the RunPCA() function, before running FindClusters() and RunTSNE().

Best,
Leon

Thanks Leon. I should have 9 cell types, each with its own signature genes. Should I pass all the genes to the pc.genes argument together, or in separate character vectors?

The easiest is to simply create a single character vector with all the genes. Otherwise you can combine the individual vectors when passing them to pc.genes, using c(vector_1, vector_2, ...). The idea is to use all these genes for dimensionality reduction and clustering. Also, make sure to adjust the pcs.compute argument in RunPCA() depending on the number of genes you are passing to the function.
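As a rough illustration of combining the per-cell-type vectors, here is a minimal base R sketch. The gene symbols, the variable names and the `available_genes` vector are placeholders assumed for the example and do not come from this thread; with a real object the available gene names would come from the row names of its expression matrix.

```r
# Hypothetical signature lists; gene symbols are placeholders, not from this thread.
t_cell_markers <- c("CD3D", "CD3E", "CD2")
b_cell_markers <- c("CD79A", "MS4A1", "CD3D")  # "CD3D" repeated on purpose

# Combine the per-cell-type vectors and drop duplicated genes.
marker_genes <- unique(c(t_cell_markers, b_cell_markers))

# Stand-in for the gene names in the dataset (assumed here; with a real
# object you would take them from the row names of its expression matrix).
available_genes <- c("CD3D", "CD3E", "CD79A", "MS4A1", "GAPDH")

# Keep only markers that are actually present in the data,
# and record the ones that are not.
genes_for_pca <- intersect(marker_genes, available_genes)
missing_genes <- setdiff(marker_genes, available_genes)

print(genes_for_pca)
print(missing_genes)
```

The resulting `genes_for_pca` vector is the kind of single character vector that would be passed to pc.genes. Checking `missing_genes` first is a cheap way to catch misspelled or absent gene names before they reach a downstream function.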

Best,
Leon

Hi Leon,

Thanks again. Just to clarify, how does the pcs.compute vary with the number of genes in the pc.genes?

Bibaswan

Hello Bibaswan,

If you input only 10 genes for PCA dimensionality reduction, you cannot compute 20 PCs, as PCA is used to reduce the dimension of your input data. That is, the number of input genes should be higher than the number of PCs to be computed.
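The limit can be seen with a small base R sketch that uses `prcomp()` on random data. This is not Seurat's own PCA code; the matrix sizes, the variable names and the `n_genes - 1` cap are assumptions chosen to mirror the rule of thumb above.

```r
set.seed(1)

# Toy expression matrix: 50 cells (rows) by 10 genes (columns), random values.
n_cells <- 50
n_genes <- 10
expr <- matrix(rnorm(n_cells * n_genes), nrow = n_cells, ncol = n_genes)

# PCA on 10 genes cannot yield more than 10 components.
pca <- prcomp(expr, center = TRUE, scale. = TRUE)
n_pcs_available <- ncol(pca$x)
print(n_pcs_available)

# Cap a requested number of PCs so it stays below the number of input genes
# (assumption: strictly fewer PCs than genes, as suggested above).
requested_pcs <- 20
pcs_to_compute <- min(requested_pcs, n_genes - 1)
print(pcs_to_compute)
```

With 10 input genes, a request for 20 PCs is reduced to 9 here, which is the kind of value one would then give to pcs.compute.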

Best,
Leon

Okay understood. Thanks a lot.

Hi!!
Thanks for asking this question ghoshal.
I am actually trying a supervised clustering approach to identify subpopulations.

I created a single character vector with all the canonical markers, then ran RunPCA, FindClusters and RunTSNE, supplying that gene list as genes.use (for FindClusters and RunTSNE) or pc.genes (for RunPCA).

Then, when I run the FindMarkers function, I get:
_Error in intI(i, n = d[1], dn[[1]], give.dn = FALSE) : invalid character indexing_ with a vector list

or get this:
_Error in data.use[genes.use, cells.1, drop = F] : invalid or not-yet-implemented 'Matrix' subsetting_
with a .txt of the same list of genes.

What should I do to run the FindMarkers function?
thanks all!
