https://scout.scilifelab.se/cust003/15041/sv/variants/a74106f85d532d2a7aa97225c1fdbab1
no OMIM info in Scout

the gene is in OMIM
https://www.omim.org/entry/606810
Thank you! I confirm it is missing on this variant and a couple of others in the same case. Several others are fine, however, both in this case and in other recent cases. Note that this is a fairly old case with a recent rerun, so reupload issues may be relevant.
I'm trying to figure out at which stage, and why, the OMIM info gets lost for some genes.
For instance, we know that correct OMIM data gets parsed for genes such as CDC45 (https://scout.scilifelab.se/genes/1739) but not for PRODH (https://scout.scilifelab.se/genes/9453)
The gene parsing process looks like this:
scout.load_hgnc_genes calls the link genes function:
https://github.com/Clinical-Genomics/scout/blob/63c0193539e19cad75293859256d927813d255e5/scout/load/hgnc_gene.py#L132
That function is responsible for annotating the OMIM phenotypes, specifically here:
https://github.com/Clinical-Genomics/scout/blob/63c0193539e19cad75293859256d927813d255e5/scout/utils/link.py#L226
I've been printing debug messages, and it looks like the gene contains a "phenotype" key with the correct values in this latter function. But when control returns to scout.load_hgnc_genes and it loops through the progressbar, the info is somehow lost and doesn't reach build_hgnc_gene (line 154).
I've been spending quite some time on this, and now OMIM has locked me out for making too many requests. I could use some help here!
@moonso when you have some time could you take a look at this? This weird behavior of the PRODH gene is still a mystery to me
Meh, weird! Mystery is the right word! So far:
PRODH|ENSG00000100033|Transcript|ENST00000357068
There might be something weird with the OMIM entries, esp some confusion with PRODH2; let me check that.
One worrying detail is that the gene has two phenotype entries in OMIM: one is the one we are looking for, and the other is an association, a `{pheno}`, which was not correctly parsed in some waaaay old version. The latter is correctly ditched and not loaded to the db, but I can't shake the feeling that it may also be connected.
No, can't reproduce the PRODH2/PRODH OMIM number thing. Leaving that.
Just for the record, it is indeed a thing:



We need to sharpen the routines that deal with mapping disorder to hgnc_id to gene_symbol when old aliases point to new genes.
When the dictionary of aliases is created, it contains these items:
'HSPOX2': {'true': None, 'ids': {9453}},
'PIG6': {'true': None, 'ids': {14524, 9453}},
'PRODH1': {'true': None, 'ids': {9453}},
'PRODH': {'true': 9453, 'ids': {9453}},
'PRODH2': {'true': None, 'ids': {9453, 17325}, 'true_id': 17325},
'TP53I6': {'true': None, 'ids': {9453}},
Shouldn't
https://github.com/Clinical-Genomics/scout/blob/696104178d6aaa5649b4ab1f6f176eaad3450cbc/scout/utils/link.py#L51
assign 'true' rather than 'true_id' then?
(testing that)
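
For anyone following along, here is a minimal sketch of why the key name matters. It is not the code in `scout/utils/link.py`; `build_alias_map`, `resolve` and the input field names are made up for illustration, and the two genes are reduced to what the dictionary above shows. The assumption is that downstream lookups only read the `'true'` key, so an id stored under `'true_id'` is never seen.

```python
def build_alias_map(genes):
    """Map every symbol (official or alias) to the HGNC ids that use it.

    Hypothetical helper: mirrors the {'true': ..., 'ids': {...}} layout
    shown in the thread, not Scout's actual implementation.
    """
    alias_map = {}
    for gene in genes:
        hgnc_id = gene["hgnc_id"]
        official = gene["hgnc_symbol"]
        for symbol in {official, *gene.get("aliases", [])}:
            entry = alias_map.setdefault(symbol, {"true": None, "ids": set()})
            entry["ids"].add(hgnc_id)
            if symbol == official:
                # Store the official id under 'true', the key that lookups read.
                # Writing it to 'true_id' instead leaves 'true' as None.
                entry["true"] = hgnc_id
    return alias_map


def resolve(alias_map, symbol):
    """Return the official id if one exists, else every id sharing the alias."""
    entry = alias_map.get(symbol)
    if entry is None:
        return set()
    if entry["true"] is not None:
        return {entry["true"]}
    return set(entry["ids"])


# Illustrative input, reduced from the dictionary in the comment above.
genes = [
    {"hgnc_id": 9453, "hgnc_symbol": "PRODH",
     "aliases": ["HSPOX2", "PIG6", "PRODH1", "PRODH2", "TP53I6"]},
    {"hgnc_id": 17325, "hgnc_symbol": "PRODH2", "aliases": []},
]
alias_map = build_alias_map(genes)
```

With the official id stored under `'true'`, `resolve(alias_map, "PRODH2")` gives only 17325, while a pure alias such as `PRODH1` still falls back to 9453.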
😅

Ahhh, the relief!
PR upcoming..