Scout: Always load ClinVar likely pathogenic / pathogenic variants, regardless of rank score

Created on 13 Sep 2019 · 19 Comments · Source: Clinical-Genomics/scout

genmod developers (and clinical customers) would prefer that we always load all variants annotated as potentially pathogenic, for scrutiny. This frees up the rank model for further development, and makes sure that founder variants, variants with reduced penetrance and other atypical pathogenic variants are not overlooked at load time.

Intermediate QualityOfLife


dnil

Most helpful comment

Why don't we start with a whitelist of:

{
'Likely_pathogenic',
'Pathogenic',
'Pathogenic/Likely_pathogenic',
'Conflicting_interpretations_of_pathogenicity'
}

but make sure that entries like "Pathogenic,_other" are split on "," and therefore always loaded.
If we find in the future that we need another term, e.g. "drug response", let's add terms one by one.

We can certainly keep clinvar keys in the rank model, and get the rest into Scout with a sane rank score
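A minimal Python sketch of the whitelist-with-splitting rule above. It assumes the CLNSIG value is available as a plain string and that ClinVar joins multiple significances with ",_" (as in the counts listed further down this thread). The names `ALWAYS_LOAD`, `clnsig_terms` and `always_load` are hypothetical and are not Scout's actual code.

```python
from typing import List

# Whitelist proposed in this comment (hypothetical constant name).
ALWAYS_LOAD = {
    "Likely_pathogenic",
    "Pathogenic",
    "Pathogenic/Likely_pathogenic",
    "Conflicting_interpretations_of_pathogenicity",
}


def clnsig_terms(clnsig: str) -> List[str]:
    """Split a CLNSIG value such as 'Pathogenic,_other' into its terms.

    Multiple significances appear as 'A,_B', so we split on ','
    and strip the leading underscore from each part.
    """
    return [part.lstrip("_") for part in clnsig.split(",") if part]


def always_load(clnsig: str) -> bool:
    """Return True if any term of the CLNSIG value is whitelisted."""
    return any(term in ALWAYS_LOAD for term in clnsig_terms(clnsig))
```

With this rule `always_load("Pathogenic,_other")` and `always_load("Pathogenic/Likely_pathogenic,_drug_response")` are True, while `always_load("Likely_benign,_other")` is False. Adding a term later, such as "drug_response", is a one-line change to the set.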

henrikstranneheim on 8 Oct 2019


All 19 comments

Please list here which of the ClinVar terms should always be loaded.
There are quite a few to choose from...

CLNSIG=Likely_pathogenic,_drug_response => 7
CLNSIG=other => 1701
CLNSIG=other,_risk_factor => 1
CLNSIG=Affects,_risk_factor => 2
CLNSIG=association_not_found => 2
CLNSIG=Affects,_association => 1
CLNSIG=Pathogenic,_Affects => 10
CLNSIG=Likely_benign,_drug_response => 1
CLNSIG=Conflicting_interpretations_of_pathogenicity,_drug_response => 1
CLNSIG=not_provided => 9122
CLNSIG=drug_response,_risk_factor => 1
CLNSIG=protective => 33
CLNSIG=Benign,_risk_factor => 3
CLNSIG=Conflicting_interpretations_of_pathogenicity,_risk_factor => 7
CLNSIG=Affects => 120
CLNSIG=risk_factor => 449
CLNSIG=Uncertain_significance,_association => 1
CLNSIG=drug_response,_protective,_risk_factor => 1
CLNSIG=Pathogenic/Likely_pathogenic,_other => 8
CLNSIG=association => 193
CLNSIG=Pathogenic,_association,_protective => 1
CLNSIG=Pathogenic,_association => 3
CLNSIG=Likely_benign => 84414
CLNSIG=Benign/Likely_benign,_other => 10
CLNSIG=Pathogenic,_drug_response => 18
CLNSIG=Uncertain_significance => 217073
CLNSIG=Likely_pathogenic,_other => 2
CLNSIG=Uncertain_significance,_drug_response => 11
CLNSIG=Benign,_other => 7
CLNSIG=Likely_benign,_risk_factor => 1
CLNSIG=Pathogenic/Likely_pathogenic,_drug_response => 1
CLNSIG=Pathogenic,_risk_factor => 29
CLNSIG=Benign/Likely_benign => 12006
CLNSIG=Conflicting_interpretations_of_pathogenicity,_Affects => 1
CLNSIG=Benign/Likely_benign,_risk_factor => 1
CLNSIG=Conflicting_interpretations_of_pathogenicity => 19550
CLNSIG=association,_risk_factor => 3
CLNSIG=Likely_benign,_other => 18
CLNSIG=Conflicting_interpretations_of_pathogenicity,_protective => 1
CLNSIG=Pathogenic/Likely_pathogenic,_risk_factor => 6
CLNSIG=Pathogenic,_other => 101
CLNSIG=Likely_pathogenic,_risk_factor => 15
CLNSIG=drug_response,_other => 1
CLNSIG=association,_protective => 1
CLNSIG=Benign,_association => 1
CLNSIG=Conflicting_interpretations_of_pathogenicity,_other => 6
CLNSIG=Pathogenic/Likely_pathogenic => 4544
CLNSIG=Likely_pathogenic => 27517
CLNSIG=Uncertain_significance,_other => 4
CLNSIG=Pathogenic => 63118
CLNSIG=Benign,_drug_response => 3
CLNSIG=protective,_risk_factor => 4
CLNSIG=drug_response => 234
CLNSIG=Uncertain_significance,_risk_factor => 3
CLNSIG=Pathogenic,_protective => 6
CLNSIG=Benign => 37377

moonso on 7 Oct 2019

All except the terms: benign, likely benign or combinations thereof
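As a rough sketch of what this rule implies, the hypothetical helper below loads a variant unless any of its CLNSIG terms is benign or likely benign. It then applies the rule to a handful of the ClinVar counts listed above. The subset is chosen for illustration only and is not the full table.

```python
from typing import Dict, List

# Hypothetical blacklist: any of these terms blocks the unconditional load.
BENIGN_TERMS = {"Benign", "Likely_benign", "Benign/Likely_benign"}


def clnsig_terms(clnsig: str) -> List[str]:
    """Split e.g. 'Benign,_other' into ['Benign', 'other']."""
    return [part.lstrip("_") for part in clnsig.split(",") if part]


def load_unless_benign(clnsig: str) -> bool:
    """Return True unless at least one term is benign or likely benign."""
    return not any(term in BENIGN_TERMS for term in clnsig_terms(clnsig))


# A few CLNSIG counts copied from the table above.
sample_counts: Dict[str, int] = {
    "other": 1701,
    "not_provided": 9122,
    "Uncertain_significance": 217073,
    "Pathogenic": 63118,
    "Benign": 37377,
    "Likely_benign": 84414,
}

loaded = sum(n for value, n in sample_counts.items() if load_unless_benign(value))
skipped = sum(n for value, n in sample_counts.items() if not load_unless_benign(value))
print(f"loaded: {loaded}, skipped: {skipped}")
```

On this subset, 291014 ClinVar entries pass the rule and 121791 are skipped. Uncertain_significance alone accounts for 217073 of the passing entries.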

henrikstranneheim on 7 Oct 2019

So according to that suggestion we should load variants with the following terms (for example):

CLNSIG=other => 1701
CLNSIG=other,_risk_factor => 1
CLNSIG=Affects,_risk_factor => 2
CLNSIG=association_not_found => 2
CLNSIG=Affects,_association => 1
CLNSIG=Conflicting_interpretations_of_pathogenicity,_drug_response => 1
CLNSIG=not_provided => 9122
CLNSIG=drug_response,_risk_factor => 1
CLNSIG=protective => 33
CLNSIG=Conflicting_interpretations_of_pathogenicity,_risk_factor => 7
CLNSIG=Affects => 120
CLNSIG=risk_factor => 449
CLNSIG=Uncertain_significance,_association => 1
CLNSIG=association => 193
CLNSIG=Conflicting_interpretations_of_pathogenicity,_Affects => 1
CLNSIG=Uncertain_significance,_other => 4
CLNSIG=protective,_risk_factor => 4
CLNSIG=drug_response => 234
CLNSIG=Uncertain_significance,_risk_factor => 3

Are you sure about that? Feels like we will flood the database with variants then...

moonso on 7 Oct 2019

These mostly do seem insignificant! What are the “other” and “not_provided”?

dnil on 7 Oct 2019

Exactly, I think we need to think this through a little bit and take it term by term

moonso on 7 Oct 2019

Is it possible to calculate, for a single case, how many variants we have in each of the different categories?
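One way to answer this is to tally the CLNSIG values in a case's annotated VCF. The sketch below assumes the VCF carries ClinVar's CLNSIG key in the INFO column. The function names, and the file path in the usage note, are hypothetical.

```python
import gzip
from collections import Counter
from typing import Iterable


def clnsig_counts(lines: Iterable[str]) -> Counter:
    """Count CLNSIG values across VCF data lines, skipping header lines."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a well-formed VCF data line
        for entry in fields[7].split(";"):  # column 8 is INFO
            key, _, value = entry.partition("=")
            if key == "CLNSIG":
                counts[value] += 1
    return counts


def clnsig_counts_from_file(path: str) -> Counter:
    """Open a plain or (b)gzipped VCF and count its CLNSIG values."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return clnsig_counts(handle)
```

For example, `for value, n in clnsig_counts_from_file("case.vcf.gz").most_common(): print(f"CLNSIG={value} => {n}")` prints a per-case table in the same format as the list above.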

vwirta on 7 Oct 2019

I'm not sure yet but I think that the numbers could be the key. Look at them instead of the strings

moonso on 7 Oct 2019

I don't know, @vwirta, but in any case I don't think variants with the clinical significances listed above will be informative. They do not add to the interpretation.

moonso on 7 Oct 2019

I agree @moonso, I do not think we should load all of these categories.
We need to judge these one by one and then document what we decided.

vwirta on 7 Oct 2019

You may well be right. My somewhat hasty, eager-to-get-out-the-door judgement was that there were so few variants in each of the scrap categories that we could load them just to be safe and deal with the categories later. And send a pedagogical note to the ClinVar maintainers with these numbers, to let them know that they are making life difficult for developers and clinicians using their resource.

dnil on 7 Oct 2019

I would vote for using these:

{
 'Likely_pathogenic',
 'Pathogenic',
 'Pathogenic/Likely_pathogenic'
}

moonso on 7 Oct 2019

Explanation of terms here. Most of these would not be interesting to us

moonso on 7 Oct 2019

{
 'Likely_pathogenic',
 'Pathogenic',
 'Pathogenic/Likely_pathogenic'
}

I would probably add:

Pathogenic,_other
Likely_pathogenic,_other
Pathogenic/Likely_pathogenic,_other

Just to be on the safe side

northwestwitch on 8 Oct 2019

A VUS may be a VUS only because it is awaiting just one more patient with the phenotype before it can be called. And what about the conflicting ones? Some are just old calls that we now understand to be benign (or that are at least deemed unknown again), but others go the other way. Curating those would be great, but it is not our immediate task. I don't see loading them hurting anyone, whereas not loading them might, if only subtly. Although a filter redesign looms ahead...

dnil on 8 Oct 2019

I would probably agree with Henrik and just load everything except Benign, Likely benign and maybe "not_provided".

VUSes have potentially relevant information in the ClinVar submissions.

"Conflicting_interpretations_of_pathogenicity" is common in, for instance, BRCA, since the BRCA community does ClinVar submissions slightly differently.

And things like "drug response" and "risk_factor" could potentially be interesting for non-RD applications, and I don't think Scout should alienate those use cases. Also, these terms are fairly rare and would only add around a hundred variants.

However, I suppose this all depends on the rationale for this change. If the purpose is to load relevant ClinVar variants so the clinicians can go through all of them quickly, one would probably only want to load Pathogenic and Likely pathogenic.

bjhall on 8 Oct 2019

Haha ok, I'm working on this one and need a decision. I would really not like to add ALL VUSes to the database regardless of rank score. I cannot see how one would work with a VUS with a low rank score; in what way would that add to the analysis? There will be tons of variants to consider before even encountering those. I think they will just end up in the catacombs of the database...

moonso on 8 Oct 2019

Did we decide to remove ClinVar from the ranking model? If we keep it, then we can afford to always load (in this PR) only a small number of variants, the pathogenic and likely pathogenic ones, and leave out all the rest. Those would be ranked anyway, so if they deserve to be uploaded, they will be.
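The combined rule suggested here can be sketched as follows: P/LP ClinVar variants bypass the rank score, and everything else is loaded only if it clears the rank-score cut-off. `RANK_SCORE_THRESHOLD` and its value, together with the function names, are hypothetical and chosen for illustration only.

```python
from typing import List, Optional

# Only P/LP bypass the rank score, as suggested above.
ALWAYS_LOAD = {"Likely_pathogenic", "Pathogenic", "Pathogenic/Likely_pathogenic"}

# Hypothetical cut-off, for illustration only.
RANK_SCORE_THRESHOLD = 5.0


def clnsig_terms(clnsig: str) -> List[str]:
    """Split e.g. 'Pathogenic,_other' into ['Pathogenic', 'other']."""
    return [part.lstrip("_") for part in clnsig.split(",") if part]


def should_load(
    rank_score: float,
    clnsig: Optional[str] = None,
    threshold: float = RANK_SCORE_THRESHOLD,
) -> bool:
    """Load P/LP ClinVar variants unconditionally; all others by rank score."""
    if clnsig and any(term in ALWAYS_LOAD for term in clnsig_terms(clnsig)):
        return True
    return rank_score >= threshold
```

Keeping ClinVar in the rank model means a VUS or conflicting variant can still reach the database through `rank_score`, so the whitelist can stay short.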

northwestwitch on 8 Oct 2019

Why don't we start with a whitelist of:

{
'Likely_pathogenic',
'Pathogenic',
'Pathogenic/Likely_pathogenic',
'Conflicting_interpretations_of_pathogenicity'
}

We can certainly keep clinvar keys in the rank model, and get the rest into Scout with a sane rank score

henrikstranneheim on 8 Oct 2019


I like that way of thinking

moonso on 8 Oct 2019

