Seurat: using object@data versus object@scale.data

Created on 24 Apr 2017 · 3 Comments · Source: satijalab/seurat

With RegressOut():

Seurat stores the z-scored residuals of these models in the scale.data slot, and they are used for dimensionality reduction and clustering.

Although PCA() and ICA() use object@scale.data, it looks like RunTSNE() uses object@data. Am I reading the code wrong, or should that really be the case? Shouldn't both use the same values?

In general, it seems that a lot of functions use object@data instead of object@scale.data. For example, FindMarkers() and AverageExpression(). Shouldn't most downstream functions use the scaled data?

igordot

👍3

Most helpful comment

Hi, I don't understand why using object@scale.data for FindMarkers is inappropriate. Two cell groups from different libraries could have very different sequencing depths. Why is object@data used by default?

lixin4306ren on 19 Jun 2018

👍6

All 3 comments

In general, we use object@scale.data for functions that identify structure in the data, such as dimensionality reduction, as this will tend to give lowly and highly expressed genes equal weight. Values in object@scale.data can therefore be negative, while values in object@data are >=0.
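As a rough illustration of the point about equal weight, here is a minimal sketch of per-gene z-scoring. It is not Seurat's code: the `zscore` helper and the expression values are hypothetical, and the population standard deviation is used only to keep the arithmetic exact.

```python
from statistics import mean, pstdev

def zscore(values):
    """Center one gene's values on 0 and scale them to unit standard deviation."""
    mu = mean(values)
    sd = pstdev(values)  # assumption: population SD, chosen for simple arithmetic
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]

# Hypothetical log-normalized expression for two genes across four cells.
low_gene = [0.0, 0.0, 0.5, 0.5]    # lowly expressed
high_gene = [4.0, 4.0, 6.0, 6.0]   # highly expressed

low_scaled = zscore(low_gene)
high_scaled = zscore(high_gene)

print(low_scaled)   # [-1.0, -1.0, 1.0, 1.0]
print(high_scaled)  # [-1.0, -1.0, 1.0, 1.0]
```

After scaling, both genes contribute the same pattern with the same magnitude, and half of the values are negative even though every input was >= 0.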

For FindMarkers and AverageExpression, we want to either discover DE genes or compute in silico cluster averages, so using object@scale.data would be inappropriate.
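One possible reading of this, sketched with hypothetical numbers and helper names (`zscore`, `cluster_means`) rather than Seurat's implementation: once a gene is z-scored, cluster averages only describe position relative to that gene's overall mean, so the expression level itself is no longer recoverable.

```python
from statistics import mean, pstdev

def zscore(values):
    """Center one gene's values on 0 and scale them to unit standard deviation."""
    mu = mean(values)
    sd = pstdev(values)  # assumption: population SD, chosen for simple arithmetic
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]

def cluster_means(values, labels):
    """Average one gene's values within each cluster label."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    return {label: mean(vals) for label, vals in groups.items()}

# Hypothetical log-normalized values for one gene in six cells from two clusters.
expr = [2.0, 2.0, 2.0, 4.0, 4.0, 4.0]
clusters = ["A", "A", "A", "B", "B", "B"]

raw_means = cluster_means(expr, clusters)
scaled_means = cluster_means(zscore(expr), clusters)

print(raw_means)     # {'A': 2.0, 'B': 4.0}
print(scaled_means)  # {'A': -1.0, 'B': 1.0}
```

Any gene with the same two-level pattern would give averages of -1.0 and 1.0 after scaling, whatever its actual expression level or difference between clusters, which is why the unscaled values are the more natural input for averages and fold changes.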

You are right that RunTSNE should have the option to run on scale.data (in most cases we don't compute tSNE on gene expression values, so this is a moot point). We will fix in an upcoming release.

satijalab on 12 May 2017

👍1

I'd like to hear the answer to lixin4306ren's question as well.

Along the same lines, what is your recommendation on which data type to use when one wants to apply quantitative filtering based on gene expression? For instance, I'd like to gate on CD3+CD4+ cells, and to do that I extracted a data.frame of expression values with the FetchData() function. Within this data frame, I ran boolean tests on raw.data to add new metadata to the appropriate cells (e.g. if CD3 > 0 and CD4 > 0, annotate the cell as "CD4+ T-cell").

Would it be more appropriate to use scale.data here?
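To make the difference concrete, here is a small sketch with made-up counts and a hypothetical `zscore` helper (not the FetchData() output itself). On unscaled values, `> 0` means the gene was detected in the cell; on z-scored values, `> 0` means the cell is above that gene's mean, which is a different gate.

```python
from statistics import mean, pstdev

def zscore(values):
    """Center one gene's values on 0 and scale them to unit standard deviation."""
    mu = mean(values)
    sd = pstdev(values)  # assumption: population SD, chosen for simple arithmetic
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]

# Hypothetical counts for CD3 and CD4 in five cells.
cd3 = [0, 2, 3, 1, 0]
cd4 = [0, 1, 4, 0, 2]

# Gate on unscaled values: both genes detected in the cell.
raw_gate = [a > 0 and b > 0 for a, b in zip(cd3, cd4)]

# Same test on z-scored values: both genes above their own mean.
scaled_gate = [a > 0 and b > 0 for a, b in zip(zscore(cd3), zscore(cd4))]

print(raw_gate)     # [False, True, True, False, False]
print(scaled_gate)  # [False, False, True, False, False]
```

In this toy case the second cell expresses both genes but falls below the CD4 mean, so the scaled gate drops it.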