Seurat: Question: Scran-normalized data into Seurat

Created on 2 Sep 2018  ·  8 Comments  ·  Source: satijalab/seurat

I am analyzing a single-cell RNA-seq dataset in which one of the cell types of interest has notably lower RNA content than the other cell types, and this difference is biologically important. To study differences in gene expression (approximately) while accounting for this, we were hoping to normalize our data to our ERCC spike-ins.

As an approach, I am trying to use scran, which provides size factors based on ERCCs. I want to take this normalized matrix and load it into Seurat. Suppose I have this normalized matrix norm. Is it sufficient to do pbmc@data@x = as.numeric(log(norm + 1))? Or am I misunderstanding how the normalized data is stored in Seurat?

Analysis Question

All 8 comments

Hi @skannan4,

You have to replace your object@data slot with the desired gene expression matrix as follows:

pbmc@data = log(x = norm + 1)

Two details worth considering:

  1. After doing this, you will lose the data normalized through Seurat. If you want to keep it, store it in object@misc before replacing the slot, as follows:
pbmc@misc[["seurat_data"]] <- as.matrix(x = pbmc@data)
  2. Make sure that the scran output is not already log transformed before you compute the log values.

Best,
Leon

Also, if the scran normalized data is log transformed, make sure that the values are in natural log, and not log2. Seurat assumes that the normalized data is log transformed using natural log (some functions in Seurat will convert the data using expm1 for some calculations).

Best,
Leon
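
To illustrate the natural log versus log2 point, here is a minimal sketch in base R. The values are made up, and the sketch assumes a pseudocount of 1 on both scales. It shows that log2 values can be rescaled to natural log by multiplying by log(2), and that expm1 only recovers the normalized values from the natural-log version.

```r
# Made-up normalized (non-log) values for three genes in one cell
norm_vals <- c(0, 3, 15)

# What a log2-based pipeline would return: log2(x + 1)
log2_vals <- log2(norm_vals + 1)

# Rescale to natural log: log(x + 1) = log2(x + 1) * log(2)
ln_vals <- log2_vals * log(2)

# expm1() undoes a natural-log transform with pseudocount 1
recovered_ln   <- expm1(ln_vals)    # matches norm_vals
recovered_log2 <- expm1(log2_vals)  # does not match norm_vals
```

The same rescaling applies element-wise to a whole matrix, so it can be done once before the matrix is stored in the Seurat object.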

Hi,

I am writing to seek your help with using a TMM-normalized matrix as input to create a Seurat object.

I get an error when following the example in the code below to use the zinbwave function:
https://github.com/drisso/zinbwave/issues/17

se <- SummarizedExperiment(as.matrix(input)) # put in the colData() part of the object at least batch
zinb <- zinbwave(se[[email protected],],K=10, epsilon=1000)
Error in .local(Y, ...) : 
  The input matrix should contain only whole numbers.

Can you please help with this error?

Thanks
Sharvari

Quoting Leon: "Also, if the scran normalized data is log transformed, make sure that the values are in natural log, and not log2. Seurat assumes that the normalized data is log transformed using natural log (some functions in Seurat will convert the data using expm1 for some calculations)."

Thank you for this information. I would like to know which Seurat functions use expm1.

Thank you in advance!

Best,
Pernille

Hi @PernilleYR, a lookup of the function name reveals where it is used:

https://github.com/satijalab/seurat/search?q=expm1&type=Code

src/data_manipulation.cpp: FastCov(), FastExpMean(), FastLogVMR()
R/utilities.R: ExpVar(), ExpSD(), ExpMean(), AverageExpression(), AddSmoothedScore()
R/differential_expression.R: FindMarkers()
R/plotting.R: DotPlot()

So it seems to be used quite widely. Also be aware of indirect calls from functions that are not listed.

Best wishes
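
To see why the log base matters for functions like these, here is a small sketch in base R. The helper exp_mean is hypothetical and only mirrors the general idea of averaging in non-log space and returning to log space; it is not Seurat's own code, and the values are made up.

```r
# Hypothetical helper: average in non-log space, then go back to log space
exp_mean <- function(x) log1p(mean(expm1(x)))

# Made-up normalized (non-log) values for one gene across three cells
norm_vals <- c(1, 3, 7)
true_log_mean <- log1p(mean(norm_vals))

# Natural-log input gives the intended result
from_ln <- exp_mean(log1p(norm_vals))

# log2 input is silently misread as natural log and inflates the mean
from_log2 <- exp_mean(log2(norm_vals + 1))
```

No error or warning is raised in the log2 case, which is why the base is worth checking before the matrix goes into the object.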

Hi @skannan4, based on @leonfodoulian's snippets this works for me (data is the Seurat object and sce is the SingleCellExperiment object):

Normalize with scran

sce <- SingleCellExperiment(assays = list(counts = as.matrix(x = data@data))) # read data from Seurat
clusters = quickCluster(sce, min.size=100)
sce = computeSumFactors(sce, cluster=clusters)
sce = normalize(sce, return_log = FALSE) # without(!) log transform

Normalize with Seurat (backup elsewhere) and replace with scran normalization

data = NormalizeData(object = data, normalization.method = "LogNormalize", scale.factor = 10000)
data@misc[["seurat_norm_data"]] = as.matrix(x = data@data) # backup Seurat's norm data
data@data = log(x = assay(sce, "normcounts") + 1)
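
For readers who want to see what the last two steps amount to numerically, here is a base R sketch under stated assumptions: the counts and size factors are made up (in the workflow above the size factors come from scran), and the normalized counts are taken to be the counts divided by each cell's size factor.

```r
# Made-up raw counts: 3 genes (rows) x 4 cells (columns), filled column-wise
counts <- matrix(c(10, 0, 5,
                   20, 2, 10,
                    5, 0, 2,
                   40, 4, 20),
                 nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4)))

# Hypothetical per-cell size factors
size_factors <- c(cell1 = 0.5, cell2 = 1, cell3 = 0.25, cell4 = 2)

# Divide each cell (column) by its size factor; dimnames are preserved
normcounts <- sweep(counts, MARGIN = 2, STATS = size_factors, FUN = "/")

# Natural log with a pseudocount of 1; log1p(x) equals log(x + 1)
lognorm <- log1p(normcounts)
```

Keeping the gene and cell names intact matters here, because the replacement matrix has to line up with the rows and columns already in the Seurat object.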

I am trying to do the same thing as @skannan4, but there seems to be a problem with classes. class(norm) gives

[1] "dgCMatrix"
attr(,"package")
[1] "Matrix"

but when I try to replace the original matrix with pbmc[["RNA"]]@data = log(x = norm + 1) I get an error stating

Error in (function (cl, name, valueClass) :
assignment of an object of class “dgeMatrix” is not valid for @‘data’ in an object of class “Assay”; is(value, "AnyMatrix") is not TRUE

which is confusing, because is(norm, "AnyMatrix") gives me TRUE. I did a quick Google search, and apparently BioC is built on S4, where the sparse matrix is dgeMatrix, and Seurat is built on S3, where the sparse matrix is dgCMatrix. I tried converting the scran output matrix back and forth with

library(Matrix)
norm = as.matrix(norm)
norm = Matrix(norm, sparse=TRUE)

but no success.

Does anybody have an idea for a workaround? @tilofreiwald @andrewwbutler @sharvari14 @satijalab

Hi @marcmuellerETHZ, I did not experience this error myself but you could try two things:

  • make sure your row/column names are preserved during your conversions
  • explicitly cast to a dgCMatrix before you update the Seurat object: M1 <- as(m, "dgCMatrix")
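
One possible explanation for the class error, offered as an assumption rather than something confirmed in this thread: dgCMatrix (sparse) and dgeMatrix (dense) are both S4 classes from the Matrix package, and adding 1 to a sparse matrix turns every zero into a non-zero, so the result of norm + 1 is dense. The sketch below uses a made-up matrix and only the Matrix package. It shows the class change and that log1p, which maps 0 to 0, keeps the result sparse. Whether a given Seurat version accepts the result was not checked here.

```r
library(Matrix)

# Small made-up sparse matrix standing in for the scran output
norm <- sparseMatrix(i = c(1, 2, 3, 1), j = c(1, 2, 3, 4), x = c(2, 5, 9, 1),
                     dims = c(3, 4),
                     dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4)))
class(norm)        # dgCMatrix

# Adding a non-zero scalar fills in every zero, so the result is dense
dense_log <- log(norm + 1)
class(dense_log)   # dgeMatrix

# log1p() leaves zeros at zero, so the result stays sparse
sparse_log <- log1p(norm)
class(sparse_log)  # dgCMatrix
```

Both results hold the same values and the same row and column names. Only the storage class differs, so log1p(norm) may be a drop-in replacement for log(x = norm + 1) in the assignment above.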