Seurat: Single cell RNA, DGE by markers, minimum pct, expression cut-off (Seurat),

Created on 26 Jul 2019 · 13Comments · Source: satijalab/seurat

@satijalab , would appreciate your response on the issues below. I have searched the GitHub page and bioinformatics blogs for responses but none were able to solve my issues:

I have 4 single cell RNA Seq samples (S1 to S4). :
I combined data after subletting to get rid of mitochondrial samples etc using:

immune.anchors <- FindIntegrationAnchors(object.list = list(S1, S2, S3, S4), dims = 1:20) immune.combined <- IntegrateData(anchorset = immune.anchors, dims = 1:20)

Then, I have the following issues:

1) If I want to run DGE for S1 vs S2 and S1 vs S3, and S1 vs S4 based on a MARKER, not cluster, how can I achieve that? Let's assume the marker of interest is CD8A.

CD8A<-WhichCells(object=immune.combined, expression = CD8A > 1) #random parameters
Based on the command above, I am able to find all the cells with CD8A expression >1. But, is >1 is the correct # to use? how is this selected usually?

Also, after finding CD8A population, how can I compare the DGE of CD8A in S1 vs S2, S1 vs S3, and S1 vs S4?

The typical command of FindMarkers uses ident.1 and ident.2. I am unable to identify those based on CD8A as in ident1 and ident2 it is requesting the already defined clusters.

2) How can I identify cell with multiple parameter: example: CD3+ve and CD8A +ve? I can again use WhichCells command, but how do I set up my cut-offs?

3) When looking at DGE using FindMarkers for 2 defined clusters, I got:

                   p_val  avg_logFC pct.1 pct.2  p_val_adj
CD22         1.788744e-06 -1.0944121 0.000 1.000 0.03170190
BANK1        1.788744e-06 -2.3095188 0.000 1.000 0.03170190
LAPTM5       1.788744e-06 -2.4666245 0.000 1.000 0.03170190
HLA-DPB1     1.788744e-06 -3.2045240 0.000 1.000 0.03170190
RPS9         4.160599e-06 -0.9319087 0.944 1.000 0.07373830
CD74         4.160599e-06 -1.1963907 1.000 1.000 0.07373830

the pct.1 is 0 for the first 4 genes vs 1 for pct.2. Does this mean that this is an inaccurate DGE? does 0 mean it is not expressed/detectted at all due to experimental error? Or is this corrected for that and hence it is truly not expressed?

Source

cBioPort

Most helpful comment

@aditisk since your feature names contain - you need to wrap the name in backticks, for example:

Idents(cd8_pbl, WhichCells(object = cd8_pbl, expression = `PD-1-TotalSeqC` > 1, slot = 'data')) <- 'PD1.pos'

timoast on 26 May 2020

👍2

All 13 comments

To perform differential expression on cells grouped by the expression of a gene, first create a new set of cell identities based on the expression of the gene/s. For example:

Idents(pbmc_small, WhichCells(object = pbmc_small, expression = CD8A > 1, slot = 'data')) <- 'cd8.pos'
Idents(pbmc_small, WhichCells(object = pbmc_small, expression = CD8A <= 1, slot = 'data')) <- 'cd8.neg'

You can then find differentially expressed genes between CD8+/- cells in the usual way:

genes <- FindMarkers(pbmc_small, ident.1 = 'cd8.pos', ident.2 = 'cd8.neg')

To identify cells using multiple expression parameters you would just specify all the parameters in WhichCells as shown above, eg CD8A > 1 & CD4 < 1. Unfortunately we cannot advise how to choose expression level cutoffs.

From the FindMarkers documentation:

pct.1: The percentage of cells where the gene is detected in the first group
pct.2: The percentage of cells where the gene is detected in the second group

So if pct.1 is 0, then no cells expressed that gene in the first group.

timoast on 26 Jul 2019

👍1

Thank you for the quick reply. The above approach/command will compare CD8positive vs Cd8negative across all my data which is combined using:

` immune.anchors <- FindIntegrationAnchors(object.list = list(S1, S2, S3, S4), dims = 1:20)'
combined <- IntegrateData(anchorset = immune.anchors, dims = 1:20)'

This data constitute of 4 samples.

How can I compare CD8pos from S1 to CD8pos from S2?

Should I be using the following:

`Idents(S1, WhichCells(object = S1 expression = CD8A > 1, slot = 'data')) <- 'cd8.posS1'
Idents(S2, WhichCells(object = S2, expression = CD8A <= 1, slot = 'data')) <- 'cd8.posS2'

What I am not certain of is whether using the original S1 instead of combined data may change results. Can I just subset not just by marker, but also by the sample # from the idents? That I couldn't figure out.

I was reading documentation on "slot" from WhichCells command and it was not clear to me what other options than "data" to use

Last, thanks for clarifying that there are no expression parameters that are set. however, can I see what is the distribution of expression of some of the genes? Like for CD8A, I want to see what's the range and how the distribution is across the different cells and samples.

thanks again!

cBioPort on 26 Jul 2019

You can combine the identities to include both the CD8 expression and the sample. For example, if your sample ID was stored as "sample" in the metadata:

Idents(pbmc_small, WhichCells(object = pbmc_small, expression = CD8A > 1, slot = 'data')) <- 'cd8.pos'
Idents(pbmc_small, WhichCells(object = pbmc_small, expression = CD8A <= 1, slot = 'data')) <- 'cd8.neg'
Idents(pbmc_small) <- paste0(Idents(pbmc_small), pbmc_small$sample)

You can then find DE between CD8+ between sample 1 and 2 using:

FindMarkers(pbmc_small, ident.1 = 'cd8.pos.S1', ident.2 = 'cd8.pos.S2')

The slots available are counts, data, or scale data. Differential expression is typically performed on the normalized data (stored in the data slot).

If you want to check the distribution of the expression of genes, you can plot them using VlnPlot

timoast on 2 Aug 2019

Is it possible to subset your data, similar to the cd8+/- shown above and do a pca/tsne analysis on this subset? My goals are to extract other transcripts present in a cell type, and assess whether there are "sub-clusters" in that population of cells.

My first attempt only returns a tsne plot with either a ".pos" or ".neg" label.

genes <- FindMarkers(EB1, ident.1 = 'Nanos2.pos', ident.2 = 'Nanos2.neg')
s <- RunTSNE(EB1, idents="Nanos2.pos")
DimPlot(s, reduction = "tsne")

stephanyfoster on 17 Aug 2019

Hi @timoast, will the cell identity creation you mentioned above work with antibodies in the ADT dataset ? I'm trying to make groups based on antibody expression rather than gene expression so I replaced CD8A in the expression argument to "adt_CD8A" but it gives me an error:

Error in FetchData(object = object, vars = unique(x = expr.char[vars.use]), :
None of the requested variables were found:

Thanks in advance for your help.

aditisk on 24 May 2020

So you kidding? all features should be in rownames(yourseuratobject)

tuqiang2014 on 24 May 2020

@tuqiang2014 none of my ADT antibody names are in rownames(seuratobject). My Seurat object has 2 assays as shown below:

$RNA
Assay data with 33545 features for 34875 cells
Top 10 variable features:
CXCL13, CCL4, CCL4L2, IL22, S100A8, IL17A, LYZ, GNLY, CCL3, S100A9

$ADT
Assay data with 6 features for 34875 cells
First 6 features:
CD4-TotalSeqC, CD8-TotalSeqC, PD-1-TotalSeqC, TIM3-TotalSeqC, CD39-TotalSeqC,
CD103-TotalSeqC

How can I alter the WhichCells() command to work with these features ? I couldn't find an argument to specify which assay to use, maybe I am missing something.

Thank you.

aditisk on 24 May 2020

you may mean choice a assays to analyze. before run WhichCells() ,you can see ?Seurat::DefaultAssay,and run DefaultAssay(seuratobject) <- "ADT" to select assays . Hope that this was useful.

tuqiang2014 on 25 May 2020

I set the default assay to ADT and my rownames now show the antibody names. Despite of this I am still getting the same error.

rownames(cd8_pbl)
[1] "CD4-TotalSeqC" "CD8-TotalSeqC" "PD-1-TotalSeqC" "TIM3-TotalSeqC"
[5] "CD39-TotalSeqC" "CD103-TotalSeqC"

Idents(cd8_pbl, WhichCells(object = cd8_pbl, expression = PD-1-TotalSeqC > 1, slot = 'data')) <- 'PD1.pos'
Error in FetchData(object = object, vars = unique(x = expr.char[vars.use]), :
None of the requested variables were found:

aditisk on 25 May 2020

I look at you use slot = 'data',So you run ScaleData() before?

InstallData('cbmc') 
cbmc.SeuratData::cbmc -> cbmc
DefaultAssay(cbmc) <- "ADT"
cbmc <- cbmc %>% NormalizeData() %>% ScaleData()
Idents(cbmc,WhichCells(cbmc,expression = CD11c >1 ,slot='data')) <- "myIdents"

tuqiang2014 on 25 May 2020

@aditisk since your feature names contain - you need to wrap the name in backticks, for example:

Idents(cd8_pbl, WhichCells(object = cd8_pbl, expression = `PD-1-TotalSeqC` > 1, slot = 'data')) <- 'PD1.pos'

timoast on 26 May 2020

👍2

@timoast thanks a lot, it works now.

@tuqiang2014 thanks for your help too.

aditisk on 26 May 2020

@aditisk since your feature names contain - you need to wrap the name in backticks, for example:
Idents(cd8_pbl, WhichCells(object = cd8_pbl, expression = `PD-1-TotalSeqC` > 1, slot = 'data')) <- 'PD1.pos'

Does not work out for me:

> mysample
An object of class Seurat 
16982 features across 8361 samples within 1 assay 
Active assay: RNA (16982 features, 2000 variable features)
> str(WhichCells(mysample, expression = `mygene` > 1000))
Error in CellsByIdentities(object = object, cells = cells) : 
  Cannot find cells provided
> str(WhichCells(mysample, expression = `mygene> 1000`))
Error in eval_tidy(expr = expr, data = data.subset) : 
  object 'mygene > 1000' not found
> str(WhichCells(mysample, expression = mygene > 1000))
Error in CellsByIdentities(object = object, cells = cells) : 
  Cannot find cells provided