Hi everyone!
I am using some small gene signatures (3-5 genes) to identify populations in my dataset using FeaturePlot(). I would like to get a plot like this one https://www.nature.com/articles/s42003-020-0922-4/figures/1 (Figure 1c), but so far I have not managed to sort it out.
I have already checked the Seurat visualization vignette, the option for 2 genes mentioned in #1343 (not suitable for more than 2 genes), and the average (mean) expression approach mentioned in #528. This last option would be fine, but I get a lot of noise in clusters that are irrelevant to my signature. For example, I have 3 very specific markers that are expressed only in my cluster of interest, but the 4th one is expressed at high levels in that cluster and at low levels in the rest of the clusters. When I plot the average, I get a messy FeaturePlot where all the clusters look similar.
Is there a way to obtain this kind of plot using Seurat, or do I need another package? Thank you in advance!
Hi,
Not a member of the dev team, but hopefully I can be helpful. I would try module scores and see whether they accomplish what you are trying to do. AddModuleScore can be used to see whether expression of a given gene set is enriched relative to a set of randomly selected control genes (drawn from matching expression bins). This might help to clean up the plot, as it sounds like enrichment of the whole gene set would be cell type specific, whereas one particular gene might also be expressed in other cell types.
Best,
Sam
A basic workflow would be something like:
cell_typeA_marker_gene_list <- list(c("Gene1", "Gene2", "Gene3", "Gene4"))
object <- AddModuleScore(object = object, features = cell_typeA_marker_gene_list, name = "cell_typeA_score")
FeaturePlot(object = object, features = "cell_typeA_score1")
Note that the gene set for AddModuleScore must be supplied as a list (wrap the character vector in list() as above); otherwise a score will be created for each gene rather than for the gene set as a whole.
Also note that you must add "1" to the end of whatever name you provided in AddModuleScore when you plot, because the function automatically appends the number of the gene set to the name of the new metadata column that stores the score.
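To make the naming rule above concrete, here is a minimal sketch that scores two signatures in one call. It assumes the pbmc_small demo object that ships with Seurat, and the gene sets, the "sig_score" name and the ctrl = 5 setting are illustrative choices of mine, not anything from the original question. Swap in your own object and markers.

```r
library(Seurat)

# pbmc_small is the small demo object bundled with Seurat (assumption:
# replace it with your own Seurat object).
obj <- pbmc_small

# Hypothetical signatures; intersect() keeps only genes present in the object.
signatures <- list(
  b_cell   = intersect(c("MS4A1", "CD79A", "CD79B"), rownames(obj)),
  platelet = intersect(c("PPBP", "PF4", "TUBB1"), rownames(obj))
)

# One score per list element. ctrl = 5 is used only because the demo object
# has very few genes; the default is 100 control genes per signature gene.
obj <- AddModuleScore(
  object = obj,
  features = signatures,
  name = "sig_score",
  ctrl = 5
)

# Columns are named <name><index>: sig_score1, sig_score2.
score_cols <- paste0("sig_score", seq_along(signatures))
head(obj[[score_cols]])

# One panel per signature score.
FeaturePlot(object = obj, features = score_cols)
```

The list names (b_cell, platelet) are only there for readability; the metadata columns are numbered in list order, so it helps to keep track of which index belongs to which signature.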
Thank you very much @samuel-marsh ! I think you solved my problem. Could this approach be used for larger gene signatures? What are the limitations that I should be aware of?
Hi,
You can definitely use it for larger gene sets, but I think a set that is too big could be an issue for a couple of reasons. For one, given the sparsity of single cell data, a very large gene set may not appear enriched via module score even when it is biologically. The inverse is of course also true: if your gene set is too small, the number of cells with enriched scores may be too high. If you want to know more about the method, I suggest reading the original paper and methods that the function is derived from (Tirosh et al., 2016; see the manual entry for more details).
Best,
Sam