Seurat: Cell Ranger “Aggr” read-depth normalization vs. Seurat NormalizeData

Created on 1 Aug 2018 · 1 Comment · Source: satijalab/seurat

Hi,

I have multiple 10X scRNA-seq libraries to combine for Seurat analysis, and I was wondering which approach is best for normalizing differences in sequencing read depth between libraries.

Option A: use Cell Ranger's "aggr", which subsamples reads from higher-depth libraries until all libraries have an equal number of confidently mapped reads per cell.
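To make the cost of option A concrete, here is a minimal Python sketch of the arithmetic behind equalizing depth. It assumes every library is subsampled down to the depth of the shallowest one, as described above. The library names and read depths are made-up illustration values, and `keep_fractions` is a hypothetical helper, not part of Cell Ranger.

```python
# Illustrative only: fraction of each library's reads that survives when all
# libraries are subsampled to the depth of the shallowest library.
# The depths below are invented numbers, not real data.

def keep_fractions(mapped_reads_per_cell):
    """Map each library to the fraction of its reads kept after equalization."""
    target = min(mapped_reads_per_cell.values())  # shallowest library sets the target
    return {lib: target / depth for lib, depth in mapped_reads_per_cell.items()}

# Hypothetical mean confidently mapped reads per cell for three libraries.
depths = {"lib1": 80_000, "lib2": 40_000, "lib3": 20_000}
fractions = keep_fractions(depths)

for lib, frac in fractions.items():
    print(f"{lib}: keep {frac:.0%} of reads")
```

In this made-up case the deepest library keeps only a quarter of its reads, which is the kind of loss the question is concerned about.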

Option B: use Seurat's NormalizeData, which (if I understand correctly) normalizes the expression of each gene within a cell by the total expression within that cell. On top of that, regressing out the number of UMIs per cell can further reduce any depth-dependent effect that was not removed by NormalizeData.
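The per-cell normalization in option B can be sketched in a few lines. This is a Python illustration of the arithmetic only, not Seurat's R implementation. It assumes the default log-normalization: divide each gene's count by the cell's total, multiply by a scale factor of 10,000, then take the natural log of one plus that value. The function name `log_normalize` and the toy counts are hypothetical.

```python
import math

def log_normalize(counts, scale_factor=10_000):
    """Log-normalize one cell: log1p(count / cell_total * scale_factor)."""
    total = sum(counts)
    return [math.log1p(c / total * scale_factor) for c in counts]

# Two toy cells with the same gene proportions but a 10x difference in depth.
shallow_cell = [1, 3, 6]
deep_cell = [10, 30, 60]

shallow_norm = log_normalize(shallow_cell)
deep_norm = log_normalize(deep_cell)
```

Because only the proportions within each cell enter the formula, the two toy cells end up with the same normalized values despite the difference in depth.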

My questions are:

1) Is it generally sufficient to apply option B without option A? I would prefer to avoid option A if possible because there is a tremendous loss of reads after subsampling.

2) Aside from the problem of losing reads, is it ok to combine options A and B? I assume the two approaches are complementary in theory, but please let me know if they cannot be applied together.

Thank you!

CMC-LI

Most helpful comment

They can certainly be applied together, but we do not generally recommend option A because, as you suggest, it can discard a lot of data.

We are moving towards support for an alternative preprocessing strategy, based on regularized negative binomial regression - which aims to correct for depth-dependent biases without sacrificing biological distinctions between cell types. You can read more (and try things out) here:
https://github.com/ChristophH/sctransform
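As a rough illustration of the idea, the Python sketch below computes a Pearson residual for one gene in one cell under a negative binomial model whose expected count depends on the cell's total UMI count. It assumes a model of the form log(mu) = beta0 + beta1 * log10(total UMIs) with variance mu + mu^2 / theta. The parameter values are invented for illustration, `nb_pearson_residual` is a hypothetical helper, and the regularization step (stabilizing parameters across genes) is not shown. This is not the sctransform code.

```python
import math

def nb_pearson_residual(count, cell_umi, beta0, beta1, theta):
    """Pearson residual of a UMI count under a depth-dependent NB model."""
    mu = math.exp(beta0 + beta1 * math.log10(cell_umi))  # expected count at this depth
    sigma = math.sqrt(mu + mu * mu / theta)               # NB standard deviation
    return (count - mu) / sigma

# Invented parameters for a single gene (assumptions, not fitted values).
beta0, beta1, theta = -6.0, 2.3, 10.0

# The same raw count of 5 in a shallow cell and in a deep cell.
res_shallow = nb_pearson_residual(5, 1_000, beta0, beta1, theta)
res_deep = nb_pearson_residual(5, 10_000, beta0, beta1, theta)
```

With these made-up parameters, a count of 5 is above expectation in the shallow cell and below expectation in the deep cell, so the residuals differ in sign even though the raw counts are identical.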

We note that most results remain similar to those from standard log-normalization, but the new approach does show improvement and is built on a statistical framework that does not assume equal molecular content per cell.

satijalab on 3 Aug 2018

👍6
