Seurat: Count number of cells expressing a gene

Created on 19 Mar 2018  路  16Comments  路  Source: satijalab/seurat

Hi Seurat team,

I was wondering if you could show me how can I calculate the number of cells expressing the given genes.

For example, I have the violin plot for 3 different genes below. I wanted to add the information of the number of cells expressing those genes for each cluster.

https://imgur.com/a/WzMjq

How can I do that? Are there any function in seurat which can do this?

Thank you very much.

Most helpful comment

@brianpenghe @brunetc
I wrote a small script based on functions in Seurat V3.0. It calculates percentage of total cells expressing a gene (raw counts > 0) by metadata groups. Let me know if you have suggestions.
example:

PrctCellExpringGene(seurat_object ,genes =c("geneA","geneB"), group.by = "sample.ID")
Markers Cell_proportion
MS1.1 geneA 0.022727273
MS1.2 geneB 0.045337995
MS2.1 geneA 0.000000000
MS2.2 geneB 0.030033951
MS3.1 geneA 0.001821125
MS3.2 geneB 0.035410765

# updated 1/31/2020 to accommodate V3.1
# updated 2/4/2020 to output "NA" for genes not detected in certain subgroups

PrctCellExpringGene <- function(object, genes, group.by = "all"){
    if(group.by == "all"){
        prct = unlist(lapply(genes,calc_helper, object=object))
        result = data.frame(Markers = genes, Cell_proportion = prct)
        return(result)
    }

    else{        
        list = SplitObject(object, group.by)
        factors = names(list)

        results = lapply(list, PrctCellExpringGene, genes=genes)
        for(i in 1:length(factors)){
        results[[i]]$Feature = factors[i]
        }
        combined = do.call("rbind", results)
        return(combined)
    }
}

calc_helper <- function(object,genes){
    counts = object[['RNA']]@counts
    ncells = ncol(counts)
    if(genes %in% row.names(counts)){
    sum(counts[genes,]>0)/ncells
    }else{return(NA)}
}

All 16 comments

We don't include a native function for this in Seurat, though it would be easy to do this on your own if you were interested. You could pull out your gene of interest (for example object@data["Elk1",]), and count the non-zero entries.

I see.
Yes, I am doing that but to count them by clusters it's quite the hassle.
That's fine then, I will stick to my current method.
Thank you for your reply.

Hi, I've been trying to find a way to get the percentage (number) of cells expressing specific genes. Should I retrieve it from the features list? Is there a native function to get these numbers in Seurat v3.0?
Thanks!

Hi, I've been trying to find a way to get the percentage (number) of cells expressing specific genes. Should I retrieve it from the features list? Is there a native function to get these numbers in Seurat v3.0?
Thanks!

I was using their FindMarkers function with a loose threshold to calculate pct.
The only issue is it may not give you all the genes you want

@brianpenghe @brunetc
I wrote a small script based on functions in Seurat V3.0. It calculates percentage of total cells expressing a gene (raw counts > 0) by metadata groups. Let me know if you have suggestions.
example:

PrctCellExpringGene(seurat_object ,genes =c("geneA","geneB"), group.by = "sample.ID")
Markers Cell_proportion
MS1.1 geneA 0.022727273
MS1.2 geneB 0.045337995
MS2.1 geneA 0.000000000
MS2.2 geneB 0.030033951
MS3.1 geneA 0.001821125
MS3.2 geneB 0.035410765

# updated 1/31/2020 to accommodate V3.1
# updated 2/4/2020 to output "NA" for genes not detected in certain subgroups

PrctCellExpringGene <- function(object, genes, group.by = "all"){
    if(group.by == "all"){
        prct = unlist(lapply(genes,calc_helper, object=object))
        result = data.frame(Markers = genes, Cell_proportion = prct)
        return(result)
    }

    else{        
        list = SplitObject(object, group.by)
        factors = names(list)

        results = lapply(list, PrctCellExpringGene, genes=genes)
        for(i in 1:length(factors)){
        results[[i]]$Feature = factors[i]
        }
        combined = do.call("rbind", results)
        return(combined)
    }
}

calc_helper <- function(object,genes){
    counts = object[['RNA']]@counts
    ncells = ncol(counts)
    if(genes %in% row.names(counts)){
    sum(counts[genes,]>0)/ncells
    }else{return(NA)}
}

You can retrieve the number of cells expressing a gene my_gene from a Seurat object my_object in this way:

sum(GetAssayData(object = my_object, slot = "data")[my_gene,]>0)

The percentage of cells expressing that gene would be:

sum(GetAssayData(object = my_object, slot = "data")[my_gene,]>0)/nrow([email protected])

The object [email protected] can be used to eventually extract cells (rownames) with a specific Identity or belonging to a specific cluster if you performed that analysis before, etc. After you have obtained a subset of cells of interest (my_cells) you can obtain the same as above by passing them as colnames:

sum(GetAssayData(object = my_object, slot = "data")[my_gene,my_cells]>0)
sum(GetAssayData(object = my_object, slot = "data")[my_gene,my_cells]>0)/length(my_cells)

One way to get counts of cells with values > 0 for each cluster for a gene of interest is to use the data generated by calling VlnPlot.

For example:

library(dplyr)
p <- VlnPlot(object = my_object, features =c("my_gene"))
p$data %>% group_by(ident) %>% summarize(counts = sum(my_gene, na.rm = TRUE))

While not an elegant solution, does the trick for at least one gene of interest at time across clusters.

FWIW, the function posted by @Ryan-Zhu worked great for me to get total number of cells per "orig.ident" condition for my gene of interest.

Hi, I've been trying to find a way to get the percentage (number) of cells expressing specific genes. Should I retrieve it from the features list? Is there a native function to get these numbers in Seurat v3.0?
Thanks!

I was using their FindMarkers function with a loose threshold to calculate pct.
The only issue is it may not give you all the genes you want

@brianpenghe This is quite a clever trick. I am wondering why you think that it may not give all the genes. I tried it with entirely permissible thresholds: min.pct = 0, logfc.threshold = 0 and got data for 16k genes. However, total number of genes in the two groups which were compared is 20k.
Note: The find marker task was performed on a sub cluster which may not have expression values for all the genes present in the original un-clustered data.

@Ryan-Zhu I like the function you wrote. It will be much better if we can group.by="seurat_clusters"
Thanks!
Yale

@Ryan-Zhu I like the function you wrote. It will be much better if we can group.by="seurat_clusters"
Thanks!
Yale

group.by works with any columns that are present in your metadata. So if your YourObject$seurat_clusters is not NULL you can definitely do group.by="seurat_clusters".

@Ryan-Zhu Thanks for your quick reply. When we run the function using "seurat_clusters" we receive the following error. Even if we run Idents(Object) <- Object$seurat_clusters prior to running:

`Error in object[[group.by]] == x :
comparison of these types is not implemented
In addition: Warning message:
In cells %||% colnames(x = object) :

Error in object[[group.by]] == x :
comparison of these types is not implemented`

As verification we have "seurat_clusters" as a meta data column:

`> table(Object$seurat_clusters)

1 2 3 4 5 6 7 8 9 10 11 12 13
7928 7055 5367 5189 5007 4371 3921 1507 1322 709 672 571 511 `

Any suggestion?

Thanks so much!

Yale

@happar73 The error basically says object[[group.by]] and x are not the same datatypes and are not comparable. Please try my updated version.

@Ryan-Zhu The updated one works. Thanks for your quick reply and thanks for your great help. I really appreciate it.

Many thanks,
Yale

@Ryan-Zhu Thanks so much for this I work with Yale. This has saved us a great deal of work and it is a great function.

Is it possible to create a version that will report 2 meta data tables? For instance we have metadata column for "stim" (VEH, DRUG1, DRUG2, DRUG3, DRUG4). Does the function have the ability to report a table of cluster + stim?

@cpcook1 You can first split your object by "stim" and then run the function on the sub-objects individually.

stim.list <- SplitObject(YourObject, "stim")
results <- lapply(stim.list, PrctCellExpringGene, genes=genes, group.by="seurat_clusters")

@Ryan-Zhu
Hi,
How can I get the percent of cells expressing two genes (double positive cells for example KRT18+/APOE+)?

Was this page helpful?
0 / 5 - 0 ratings

Related issues

GHAStVHenry picture GHAStVHenry  路  3Comments

sarahwajid picture sarahwajid  路  3Comments

farhanma picture farhanma  路  3Comments

tmccra2 picture tmccra2  路  3Comments

Biogitte picture Biogitte  路  3Comments