Hi all,
I have ERCC controls in my UMI dataset, and I used them for cell filtering and also as a variable to regress out when scaling the data (in addition to nUMI and mito genes). First of all, is this the right approach, or is it unnecessary? If ERCCs are not useful for this, I do not really understand what these controls are for.
On the other hand, I ran the downstream analysis with all genes (including ERCCs), but they show up as cluster markers in some populations and I do not want this to happen. How can I remove them from the Seurat object after scaling? If they are not useful for filtering and scaling, I know how to remove them from the initial matrix, but if they are... Can I use the SubsetData function, passing a vector of ERCC names as the subset.name argument? It gives me an error, and I think SubsetData is for subsetting cells, right?
#This is the script I used to calculate initial % of ERCC
ERCC.WT <- grep(pattern = "^ERCC-", x = rownames(x = data.WT@data), value = TRUE)
percent.ERCC.WT <- Matrix::colSums(data.WT@raw.data[ERCC.WT, ])/Matrix::colSums(data.WT@raw.data)
data.WT <- AddMetaData(object = data.WT, metadata = percent.ERCC.WT, col.name = "percent.ERCC")
#Then I normalized and scaled the data, and tried:
data.WT <- NormalizeData(data.WT)
data.WT <- ScaleData(data.WT, vars.to.regress = c("nUMI", "percent.mito", "percent.ERCC"))
data.WT.subset <- SubsetData(data.WT, subset.name = ERCC.WT)
Error in WhichCells(object = object, ident = ident.use, ident.remove = ident.remove, :
subset.name must be a single parameter
Another question: should I also remove mito genes in case they mark some clusters too? I understand that their expression is a sign of low-quality (broken) cells, but I would not want to manipulate the data.
Many thanks in advance!
Marina
Hello Marina,
Removing genes from an existing Seurat object is not currently possible, and I am not sure whether this functionality will ever be implemented in one of Seurat's functions (see issue #274 for more details). I would suggest calculating the ERCC abundances on the raw count matrix, before creating your Seurat object. You can then create the object from a subset of your raw data (i.e. without the ERCC genes) and, at the same time, add these values to the object@meta.data slot under any name, as follows:
# Calculate ERCC abundances on the raw counts before creating a Seurat object
ERCC.WT.index <- grep(pattern = "^ERCC-", x = rownames(count.data), value = FALSE) # Select row indices and not ERCC names
percent.ERCC.WT <- Matrix::colSums(count.data[ERCC.WT.index, ])/Matrix::colSums(count.data)
# Remove ERCC from count.data
count.data <- count.data[-ERCC.WT.index, ]
# Create Seurat object, and add percent.ERCC.WT to data.WT@meta.data in the percent.ERCC column
data.WT <- CreateSeuratObject(raw.data = count.data, meta.data = data.frame(percent.ERCC = percent.ERCC.WT))
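To check the steps above on something small, here is a minimal sketch in base R on a made-up count matrix (no Seurat needed). The matrix values, gene names and cell names are hypothetical. The guard for the case where no ERCC rows are found is an addition of this sketch: in R, negative indexing with an empty index vector returns zero rows rather than all rows.

```r
# Toy count matrix: 3 endogenous genes + 2 ERCC spike-ins, 3 cells (all values made up)
count.data <- matrix(
  c(10,  0,  5,
     4,  8,  1,
     6,  2,  4,
     5, 10,  0,
     5,  0, 10),
  nrow = 5, byrow = TRUE,
  dimnames = list(c("GeneA", "GeneB", "GeneC", "ERCC-00002", "ERCC-00003"),
                  c("cell1", "cell2", "cell3"))
)

# Row indices of the ERCC spike-ins
ERCC.index <- grep(pattern = "^ERCC-", x = rownames(count.data))

# Fraction of each cell's counts that come from ERCC (computed before removal)
percent.ERCC <- colSums(count.data[ERCC.index, , drop = FALSE]) / colSums(count.data)

# Drop the ERCC rows; skip the negative indexing if nothing matched
if (length(ERCC.index) > 0) {
  count.noERCC <- count.data[-ERCC.index, , drop = FALSE]
} else {
  count.noERCC <- count.data
}

# Metadata table whose row names are the cell names, ready to pass as meta.data
meta <- data.frame(percent.ERCC = percent.ERCC)
```

Note that, despite the name, percent.ERCC holds a fraction between 0 and 1 rather than a percentage.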
Hope this helps!
Best,
Leon
Hello Leon,
Thank you very much for your suggestion. Just to clarify: I take it the point of adding the ERCC abundances to the metadata is to be able to compute metrics and filter cells based on ERCC content, while the ERCC genes themselves are removed from the count matrix so that they are not used in any further analysis, right?
Best,
Marina
Exactly!
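As an illustration of filtering on the stored ERCC fraction, here is a minimal base R sketch. The metadata values, the cell names and the 0.5 cutoff are all hypothetical; a sensible cutoff depends on the dataset and should be chosen by looking at the distribution of percent.ERCC across cells.

```r
# Hypothetical per-cell metadata, with cell names as row names
meta <- data.frame(percent.ERCC = c(0.05, 0.60, 0.10),
                   row.names = c("cell1", "cell2", "cell3"))

# Hypothetical cutoff: discard cells where ERCC makes up half or more of the counts
max.ERCC <- 0.5

# Names of the cells that pass the filter
cells.keep <- rownames(meta)[meta$percent.ERCC < max.ERCC]
```

The same comparison can be applied to the percent.ERCC column stored in the Seurat object's metadata.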