Having tried to create a Seurat object with CreateSeuratObject, I found that the identity class for each cell is affected by underscores (_) in the library names.
It turns out those classes are set in the object@ident slot by splitting each cell name on a delimiter and keeping one field.
It is possible to play around with the names.field or names.delim parameters, but if the names contain a variable number of underscores, this won't give me the right grouping:
mycolnames <- c("lib-1_ACTG","lib-2_ATCC","lib-2_redone_ATTT")
factor(x = unlist(x = lapply(
X = mycolnames,
FUN = ExtractField,
field = 1,
delim = "_"
)))
[1] lib-1 lib-2 lib-2
Levels: lib-1 lib-2
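To make the failure mode concrete, here is a minimal base-R sketch of field-based extraction. It is only an approximation of what the output above suggests, not Seurat's actual implementation, and `extract_field` is a hypothetical helper name.

```r
# Hypothetical stand-in for field-based extraction (not Seurat's code):
# split each name on the delimiter and keep a single field.
extract_field <- function(x, field = 1, delim = "_") {
  vapply(strsplit(x, delim, fixed = TRUE),
         function(parts) parts[field],
         character(1))
}

mycolnames <- c("lib-1_ACTG", "lib-2_ATCC", "lib-2_redone_ATTT")

# Field 1 merges the redone library into lib-2: "lib-1" "lib-2" "lib-2"
first_field <- extract_field(mycolnames, field = 1)

# The number of fields differs per name (2, 2, 3), so the barcode is not
# always at the same position and no single field index fits all names.
n_fields <- lengths(strsplit(mycolnames, "_", fixed = TRUE))
```

Because the barcode is field 2 for the first two names and field 3 for the last one, a fixed field index cannot separate library from barcode here.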
Maybe that's by design, but since I use underscores a lot in my file names, it would be convenient to have an option to ignore them.
A possible workaround is to overwrite the grouping using a regular expression that separates only by the last underscore in the names:
object@ident <- factor(stringr::str_replace(colnames(object@data),"_[^_]+$",""))
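The same last-underscore split can be checked outside Seurat. The sketch below uses base R's `sub()` instead of stringr on the example names from above; `strip_last_field` is a hypothetical helper name, not a Seurat function.

```r
# Hypothetical helper (not part of Seurat): remove the last underscore
# and everything after it, leaving the library name intact.
strip_last_field <- function(x) sub("_[^_]+$", "", x)

mycolnames <- c("lib-1_ACTG", "lib-2_ATCC", "lib-2_redone_ATTT")
groups <- factor(strip_last_field(mycolnames))

# The redone library now gets its own level.
nlevels(groups)  # 3
```

Anchoring the pattern with `$` is what restricts the match to the final underscore, so underscores inside the library name are left alone.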
Just wanted to share this in case someone has the same issue.
Hi @seb-mueller,
You probably already know what I am about to write, but looking at your code prompted me to mention it in case it is useful. To overwrite the identity of your cells, I would suggest you first create a column in your object@meta.data slot storing all the identities of your cells, and then pass the name of that column to the id argument of the Seurat::SetAllIdent() function. This keeps those identities stored somewhere for later use. Otherwise, if the object@ident slot gets overwritten (e.g. through Seurat::FindClusters()), you will simply lose this information.
# Create Seurat object
>object <- CreateSeuratObject(raw.data = data)
# Randomly sample 'a', 'b' and 'c' as new identities for the cells
>new.ident <- factor(x = sample(x = letters[1:3], size = ncol(object@data), replace = TRUE))
# Replace 'object@ident' with the new identities
>object@ident <- new.ident
>head(object@ident)
[1] b a b c c c
Levels: a b c
# Check identities of cells in 'object@meta.data'
>head(x = object@meta.data, 1) # 'orig.ident' stores the original identities of the cells
nGene nUMI orig.ident
LFHT2_ROW01_01 8873 171329.9 LFHT2
As you can see, overwriting object@ident will not add this information to object@meta.data.
# Store 'new.ident' in 'object@meta.data$my.ident'
>object@meta.data$my.ident <- new.ident
>head(x = object@meta.data, 1)
nGene nUMI orig.ident my.ident
LFHT2_ROW01_01 8873 171329.9 LFHT2 a
# Set the identities of the cells to the levels stored in 'object@meta.data$my.ident'
>object <- SetAllIdent(object = object, id = "my.ident")
>head(object@ident)
LFHT2_ROW01_01 LFHT2_ROW01_02 LFHT2_ROW01_03 LFHT2_ROW01_04
a a b b
LFHT2_ROW01_05 LFHT2_ROW01_06
b c
Levels: a b c
Another way of doing it would be to use the Seurat::StashIdent() function after having overwritten the object@ident slot, the way you did it.
# Replace 'object@ident' with new identities
>object@ident <- new.ident
# Stash the cell identities to the 'my.ident' column in 'object@meta.data'
>object <- StashIdent(object = object, save.name = "my.ident")
>head(x = object@meta.data, 1)
nGene nUMI orig.ident my.ident
LFHT2_ROW01_01 8873 171329.9 LFHT2 a
Best,
Leon
Thanks a lot for that, @leonfodoulian!
In fact I wasn't aware of SetAllIdent, so this is exactly what I was looking for.
I've wrapped the information up into a minimal workflow in case anyone else runs into a similar issue.
library(dplyr)
library(stringr) # provides str_replace
colnames(mymatrix)
# [1] "lib-1_ACTG" "lib-2_ATCC" "lib-2_redone_ATTT"
metaData <- data.frame(cellNames = colnames(mymatrix)) %>%
mutate(samples = factor(str_replace(cellNames,"_[^_]*$",""))) %>%
mutate(barcode = factor(str_replace(cellNames,".+_","")))
rownames(metaData) <- metaData$cellNames
print(metaData)
# cellNames samples barcode
# lib-1_ACTG lib-1_ACTG lib-1 ACTG
# lib-2_ATCC lib-2_ATCC lib-2 ATCC
# lib-2_redone_ATTT lib-2_redone_ATTT lib-2_redone ATTT
object <- CreateSeuratObject(raw.data = mymatrix, meta.data = metaData)
object <- SetAllIdent(object = object, id = "samples")
object@meta.data$orig.ident <- object@meta.data$samples # orig.ident has to be overwritten for some reason as well
Best,
Seb
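For readers who prefer to avoid the dplyr and stringr dependencies, the metadata table from the workflow above can be sketched in base R. This is an illustrative variant under the same naming assumptions (the barcode is whatever follows the last underscore); `cell_names` and `meta` are hypothetical names, and the result would be passed to `meta.data` in the same way as `metaData`.

```r
# Example cell names taken from the thread.
cell_names <- c("lib-1_ACTG", "lib-2_ATCC", "lib-2_redone_ATTT")

meta <- data.frame(
  samples = factor(sub("_[^_]*$", "", cell_names)),  # text before the last underscore
  barcode = sub("^.*_", "", cell_names),             # text after the last underscore
  row.names = cell_names,                            # row names mirror the matrix column names
  stringsAsFactors = FALSE
)

# Sanity check: sample and barcode should recombine into the original name.
stopifnot(all(paste(meta$samples, meta$barcode, sep = "_") == rownames(meta)))
```

The recombination check is a cheap way to confirm that the two regular expressions split each name at the same underscore.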