Seurat: gene name to Ensembl ID

Created on 23 Jul 2019 · 3 Comments · Source: satijalab/seurat

Thanks for your good software. I have a question about it.
How can I extract Ensembl IDs from a Seurat object? I found that one gene name can correspond to two IDs. For example, ATXN7 corresponds to both ENSG00000285258 and ENSG00000163635, but the Seurat object contains ATXN7 and ATXN7.1, so I don't know which ID corresponds to ATXN7 and which to ATXN7.1.

All 3 comments

Hi, the function Read10X has an argument, gene.column, that switches between gene names and Ensembl IDs when reading in 10x Genomics datasets. With Read10X_h5, you can use the use.names parameter for the same purpose.
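
As a rough sketch of the two options above (not a tested recipe), the wrappers below read a 10x dataset with Ensembl IDs as row names. The wrapper names and the `data_dir` and `h5_file` arguments are placeholders invented for illustration. It assumes that the first column of features.tsv holds the Ensembl IDs, as in the example output in this thread.

```r
# Sketch only: the function names and the `data_dir` / `h5_file` arguments are
# hypothetical; pass the paths of your own dataset.

read_10x_with_ensembl_ids <- function(data_dir) {
  # gene.column = 1 takes row names from the first column of features.tsv
  # (assumed here to be the Ensembl IDs); the default, gene.column = 2,
  # takes them from the gene-name column.
  Seurat::Read10X(data.dir = data_dir, gene.column = 1)
}

read_10x_h5_with_ensembl_ids <- function(h5_file) {
  # use.names = FALSE labels rows with feature IDs rather than gene names.
  Seurat::Read10X_h5(filename = h5_file, use.names = FALSE)
}
```

Reading the data this way avoids the ATXN7 / ATXN7.1 ambiguity altogether, because Ensembl IDs are already unique.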

Alternatively, you can work out which gene name in the Seurat object corresponds to which Ensembl ID by loading the features.tsv file for your dataset and running make.unique on the gene names. For example:

> genes <- read.table("/home/stuartt/data/10x_scrna/pbmc10k_v3/filtered_feature_bc_matrix/features.tsv", stringsAsFactors = FALSE)
> genes[genes$V2 == "ATXN7", ]
                  V1    V2   V3         V4
6094 ENSG00000285258 ATXN7 Gene Expression
6095 ENSG00000163635 ATXN7 Gene Expression
> genes$V2 <- make.unique(genes$V2)
> head(genes)
               V1          V2   V3         V4
1 ENSG00000243485 MIR1302-2HG Gene Expression
2 ENSG00000237613     FAM138A Gene Expression
3 ENSG00000186092       OR4F5 Gene Expression
4 ENSG00000238009  AL627309.1 Gene Expression
5 ENSG00000239945  AL627309.3 Gene Expression
6 ENSG00000239906  AL627309.2 Gene Expression
> genes[genes$V2 == "ATXN7.1", ]
                  V1      V2   V3         V4
6095 ENSG00000163635 ATXN7.1 Gene Expression
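
The same idea can be turned into a reusable lookup table. The sketch below uses a small in-memory stand-in for features.tsv so that it runs on its own. The column names, the `name_to_ensembl` vector and the `query` names are all made up for illustration. It assumes the row names of the Seurat object were produced by make.unique on the gene-name column, as described above.

```r
# Toy stand-in for features.tsv (Ensembl ID, gene name). With a real dataset
# you might instead load the file, e.g. with
# read.delim("features.tsv", header = FALSE, stringsAsFactors = FALSE).
features <- data.frame(
  ensembl_id = c("ENSG00000243485", "ENSG00000285258", "ENSG00000163635"),
  gene_name  = c("MIR1302-2HG", "ATXN7", "ATXN7"),
  stringsAsFactors = FALSE
)

# De-duplicate the gene names the same way as in the example above, so that
# they line up with the row names of the Seurat object.
features$seurat_name <- make.unique(features$gene_name)

# Named character vector: de-duplicated gene name -> Ensembl ID.
name_to_ensembl <- setNames(features$ensembl_id, features$seurat_name)

# Hypothetical query; in practice this could be rownames() of the object.
query <- c("ATXN7", "ATXN7.1", "NOT_A_GENE")
ensembl <- unname(name_to_ensembl[query])

# Names missing from the table come back as NA.
print(data.frame(query = query, ensembl_id = ensembl))
```

A named vector keeps the lookup to a single indexing step, and names that are not in features.tsv show up as NA rather than being dropped silently.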

@timoast I have a similar question. I have a merged Seurat object and want to convert the gene names to Ensembl gene IDs. However, because the gene names have been modified, I can't simply use biomaRt, for example, as it does not recognize some of them. Do you have any suggestions? Thank you.

I second the question from @kaizen89. I have the same issue.
