Seurat: How to remove genes from Seurat object after scaling data

Created on 10 May 2018 · 3 Comments · Source: satijalab/seurat

Hi all,
I have ERCC controls in my UMI dataset, and I used them for cell filtering and also as a variable to regress out when scaling the data (alongside nUMI and the mito gene percentage). First of all, is this the right approach, or is it unnecessary? If the ERCCs are not useful for this, I do not really know what these controls are for.
Separately, I continued the downstream analysis with all genes (including ERCCs), but the ERCCs are showing up as cluster markers in some populations, which I do not want. How can I remove them from the Seurat object after scaling? If they were not useful for filtering and scaling, I know how to remove them from the initial matrix, but if they were... Can I use the SubsetData function, passing a vector of ERCC names as the subset.name argument? It gives me an error, and I think SubsetData is for subsetting cells, right?

# Script I used to calculate the initial ERCC fraction per cell
ERCC.WT <- grep(pattern = "^ERCC-", x = rownames(x = data.WT@data), value = TRUE)
percent.ERCC.WT <- Matrix::colSums(data.WT@raw.data[ERCC.WT, ])/Matrix::colSums(data.WT@raw.data)
data.WT <- AddMetaData(object = data.WT, metadata = percent.ERCC.WT, col.name = "percent.ERCC")

# After normalizing and scaling the data, I tried:
data.WT <- NormalizeData(data.WT)
data.WT <- ScaleData(data.WT, vars.to.regress = c("nUMI", "percent.mito", "percent.ERCC"))
data.WT.subset <- SubsetData(data.WT, subset.name = ERCC.WT)

Error in WhichCells(object = object, ident = ident.use, ident.remove = ident.remove,  : 
  subset.name must be a single parameter

Another question has occurred to me: should I also remove the mito genes in case they mark some clusters too? I understand that their expression is a sign of low-quality (broken) cells, but I would not want to manipulate the data.

Many thanks in advance!
Marina

Analysis Question

Most helpful comment

Hello Marina,

Genes cannot be removed from an existing Seurat object, and I am not sure whether this functionality will ever be implemented in one of Seurat's functions (see issue #274 for more details). I would suggest calculating the ERCC abundances on the raw count matrix before creating your Seurat object. You can then create the object from a subset of your raw data (i.e. without the ERCC genes) and add these values to the object@meta.data slot under any name, as follows:

# Calculate ERCC abundances on the raw counts before creating a Seurat object
ERCC.WT.index <- grep(pattern = "^ERCC-", x = rownames(count.data), value = FALSE) # Select row indices and not ERCC names 
percent.ERCC.WT <- Matrix::colSums(count.data[ERCC.WT.index, ])/Matrix::colSums(count.data)

# Remove ERCC from count.data
count.data <- count.data[-ERCC.WT.index, ]

# Create Seurat object, and add percent.ERCC.WT to the @meta.data slot in the percent.ERCC column
data.WT <- CreateSeuratObject(raw.data = count.data, meta.data = data.frame(percent.ERCC = percent.ERCC.WT))
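
The pre-filtering step above can be tried on a toy matrix without Seurat installed. The following is only a sketch under stated assumptions: the gene names, cell names, and counts are made up, and it swaps the integer index for a logical mask, because a negative index built from an empty grep() result selects no rows at all instead of keeping every row.

```r
library(Matrix)

# Toy count matrix (genes x cells); all names and values are made up.
count.data <- Matrix(
  c(5, 0, 3,
    2, 4, 0,
    1, 1, 2,
    0, 3, 1),
  nrow = 4, ncol = 3, byrow = TRUE, sparse = TRUE,
  dimnames = list(c("GeneA", "GeneB", "ERCC-00002", "ERCC-00003"),
                  c("cell1", "cell2", "cell3"))
)

# Logical mask marking the ERCC spike-in rows.
# With no matches, !is.ercc is all TRUE, so every gene is kept.
is.ercc <- grepl(pattern = "^ERCC-", x = rownames(count.data))

# Fraction of each cell's counts that come from ERCC spike-ins
percent.ERCC <- Matrix::colSums(count.data[is.ercc, , drop = FALSE]) /
  Matrix::colSums(count.data)

# Count matrix without the spike-in rows
count.data.noERCC <- count.data[!is.ercc, , drop = FALSE]

# Per-cell metadata; row names are set explicitly to the cell names
meta <- data.frame(percent.ERCC = as.numeric(percent.ERCC),
                   row.names = colnames(count.data))
```

Here count.data.noERCC and meta play the roles of the raw.data and meta.data arguments in the CreateSeuratObject call above.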

Hope this helps!

Best,
Leon

All 3 comments

Hello Leon,
Thank you very much for your suggestion. Just to clarify: I assume the point of adding the ERCC abundances to the metadata is to be able to compute metrics and filter cells based on ERCC content, while the ERCC genes themselves are removed from the count matrix so that they are not used in any further analysis, right?

Best,
Marina

Exactly!
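
As a rough illustration of filtering on the stored ERCC fraction, here is a minimal base R sketch. The counts, the per-cell fractions, and the 0.3 cutoff are all hypothetical; a real threshold would come from your own QC plots.

```r
# Toy counts with spike-ins already removed (genes x cells); values are made up.
counts <- matrix(c(5, 0, 3,
                   2, 4, 0),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("GeneA", "GeneB"),
                                 c("cell1", "cell2", "cell3")))

# Hypothetical per-cell ERCC fractions, as they would sit in the metadata
percent.ERCC <- c(cell1 = 0.125, cell2 = 0.5, cell3 = 0.5)

# Assumed cutoff, for illustration only
max.ercc.fraction <- 0.3

# Keep cells whose spike-in fraction is at or below the cutoff
cells.keep <- names(percent.ERCC)[percent.ERCC <= max.ercc.fraction]
counts.filtered <- counts[, cells.keep, drop = FALSE]
```

Within Seurat 2.x the same cutoff would normally be applied to the percent.ERCC metadata column through its cell-filtering functions (for example FilterCells, or SubsetData with a single subset.name); check the documentation of your installed version for the exact arguments.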
