Scout: OMIM-AUTO missing crucial genes

Created on 2 Oct 2020 · 10 Comments · Source: Clinical-Genomics/scout

I cannot make sense of how OMIM-AUTO is created.

With the latest OMIM update (https://www.omim.org/statistics/geneMap), there should be 4333 genes with a matching phenotype and mutation. The OMIM-AUTO gene panel created through scout update omim contains 4030 genes. However I try to filter the OMIM files from the API, I keep coming to the same conclusion: there should be 4333 genes.

For instance, RAD51D is missing in OMIM-AUTO. It clearly has a matching phenotype on OMIM. It does also exist in the scout database, for both hg19 and hg38.

Please help a confused soul

question

All 10 comments

Hi! We haven't updated genes, diseases and OMIM-AUTO panel on our production server yet, but I'll give it a try on my local instance!

Yes, I also end up with an OMIM panel of 4030 genes. I'll check what happens in the case of RAD51D!

:+1: Hm, RAD51D qualifies rather as a susceptibility gene, so I can imagine there may be some funky setting somewhere on it not clearly indicating it as a morbid one. At one point we didn't include them at all, and especially not the pharmacophenos, but now I think the filter is simply for "established" and "provisional". Let's see!

Do you want to take over this one @dnil ?

Well, not really: I would rather you keep using my key - that's more profitable.. 😜

But, I think this may be exactly that issue: RAD51D is indeed classified "only" as a susceptibility gene, which we have chosen not to include in the OMIM-AUTO panel. Anyone is naturally welcome to include it on their own panels - it is loaded as a gene, and a phenotype.

The ones we load onto OMIM-AUTO are the established and provisional ("?"-entries), not the "{}" risk/susceptibility or "[]" non-disease entries of OMIM.

We have this status map that effectively gets checked when genes are added to OMIM-AUTO:

OMIM_STATUS_MAP = {"[": "nondisease", "{": "susceptibility", "?": "provisional"}
...
phenotype_status = OMIM_STATUS_MAP.get(phenotype_info[0], "established")
...
def get_omim_panel_genes(genemap2_lines, mim2gene_lines, alias_genes):
...
status_to_add = set(["established", "provisional"])
...
if not phenotype_info.get("status") in status_to_add:
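To make the excerpt above easier to follow, here is a minimal, self-contained sketch of the same idea: the first character of a phenotype description decides its status, and only "established" and "provisional" pass the filter. The helper names `phenotype_status` and `qualifies_for_panel` and the example descriptions are made up for illustration; this is not the actual scout implementation, only the status map and the status set are taken from the excerpt.

```python
# Status map as quoted in the excerpt above.
OMIM_STATUS_MAP = {"[": "nondisease", "{": "susceptibility", "?": "provisional"}

# Statuses that qualify a phenotype for the panel, as in the excerpt.
STATUS_TO_ADD = {"established", "provisional"}


def phenotype_status(description):
    """Classify a phenotype description by its first character.

    Hypothetical helper: anything without a known prefix counts as "established".
    """
    description = description.strip()
    if not description:
        return None
    return OMIM_STATUS_MAP.get(description[0], "established")


def qualifies_for_panel(description):
    """Return True if the phenotype status is one of the statuses to add."""
    return phenotype_status(description) in STATUS_TO_ADD


if __name__ == "__main__":
    # Illustrative descriptions, not copied from OMIM.
    for text in ["Some disease", "?Some provisional disease",
                 "{Some cancer, susceptibility to}", "[Some non-disease trait]"]:
        print(text, "->", phenotype_status(text), qualifies_for_panel(text))
```

With this kind of filter, a gene whose only phenotypes are wrapped in "{}" never gets a qualifying status.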

Might that explain your count difference @ViktorHy?

I think it just might:

Screenshot 2020-10-02 at 15 58 26
Screenshot 2020-10-02 at 15 58 04

While there is bound to be a lot of overlap due to N-to-1, 1-to-M and N-to-M connections between disease and gene, the ballpark of some 4300 genes with established+susceptibility on the top and some 500 susceptibility genes may well fit, right?

In particular, RAD51D does not seem to have an established disease causation logged, so it will not be included, whereas e.g. BRCA1 has one for Fanconi - while it is also "just" {susceptibility} with regard to BRC - and so it is included. I hope this brings some balm to the wounds of the confused soul. Feel very free to suggest e.g. a toggle that includes susceptibility genes, or to make a new OMIM-S-AUTO or similar panel that includes them, if you want them.
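The gene-level effect, and the toggle suggested above, could be sketched roughly as follows. A gene is selected if at least one of its phenotypes has a wanted status. The function `panel_genes`, its `include_susceptibility` parameter and the phenotype strings are assumptions made up for this illustration; they are not part of scout, and the strings are not copied from OMIM.

```python
OMIM_STATUS_MAP = {"[": "nondisease", "{": "susceptibility", "?": "provisional"}


def panel_genes(gene_phenotypes, include_susceptibility=False):
    """Select genes that have at least one phenotype with a wanted status.

    gene_phenotypes maps a gene symbol to a list of phenotype descriptions.
    include_susceptibility is a hypothetical toggle, not an existing scout option.
    """
    wanted = {"established", "provisional"}
    if include_susceptibility:
        wanted.add("susceptibility")

    selected = set()
    for gene, descriptions in gene_phenotypes.items():
        # Collect the status of every non-empty phenotype description.
        statuses = {
            OMIM_STATUS_MAP.get(text[0], "established")
            for text in descriptions
            if text
        }
        if statuses & wanted:
            selected.add(gene)
    return selected


# Illustrative input mirroring the two genes discussed in this thread.
EXAMPLE = {
    "BRCA1": ["Fanconi anemia (illustrative)", "{Breast cancer, susceptibility (illustrative)}"],
    "RAD51D": ["{Breast-ovarian cancer, susceptibility (illustrative)}"],
}

if __name__ == "__main__":
    print(sorted(panel_genes(EXAMPLE)))
    print(sorted(panel_genes(EXAMPLE, include_susceptibility=True)))
```

Under these assumptions, BRCA1 is selected by default because one of its phenotypes is established, while RAD51D only appears once the susceptibility toggle is switched on.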

Thanks! This is indeed what separates them. I need to check with our clinicians to see if OMIM-S-AUTO would be of interest.
