genmod developers (and clinical customers) would prefer that we always load all variants annotated in ClinVar as potentially pathogenic, for scrutiny. This gives us more freedom when developing the rank model and makes sure that no founder variants, reduced-penetrance variants, or otherwise odd pathogenic variants are overlooked on load.
Please list here which ClinVar terms should always be loaded.
There are some to choose from...
CLNSIG=Likely_pathogenic,_drug_response => 7
CLNSIG=other => 1701
CLNSIG=other,_risk_factor => 1
CLNSIG=Affects,_risk_factor => 2
CLNSIG=association_not_found => 2
CLNSIG=Affects,_association => 1
CLNSIG=Pathogenic,_Affects => 10
CLNSIG=Likely_benign,_drug_response => 1
CLNSIG=Conflicting_interpretations_of_pathogenicity,_drug_response => 1
CLNSIG=not_provided => 9122
CLNSIG=drug_response,_risk_factor => 1
CLNSIG=protective => 33
CLNSIG=Benign,_risk_factor => 3
CLNSIG=Conflicting_interpretations_of_pathogenicity,_risk_factor => 7
CLNSIG=Affects => 120
CLNSIG=risk_factor => 449
CLNSIG=Uncertain_significance,_association => 1
CLNSIG=drug_response,_protective,_risk_factor => 1
CLNSIG=Pathogenic/Likely_pathogenic,_other => 8
CLNSIG=association => 193
CLNSIG=Pathogenic,_association,_protective => 1
CLNSIG=Pathogenic,_association => 3
CLNSIG=Likely_benign => 84414
CLNSIG=Benign/Likely_benign,_other => 10
CLNSIG=Pathogenic,_drug_response => 18
CLNSIG=Uncertain_significance => 217073
CLNSIG=Likely_pathogenic,_other => 2
CLNSIG=Uncertain_significance,_drug_response => 11
CLNSIG=Benign,_other => 7
CLNSIG=Likely_benign,_risk_factor => 1
CLNSIG=Pathogenic/Likely_pathogenic,_drug_response => 1
CLNSIG=Pathogenic,_risk_factor => 29
CLNSIG=Benign/Likely_benign => 12006
CLNSIG=Conflicting_interpretations_of_pathogenicity,_Affects => 1
CLNSIG=Benign/Likely_benign,_risk_factor => 1
CLNSIG=Conflicting_interpretations_of_pathogenicity => 19550
CLNSIG=association,_risk_factor => 3
CLNSIG=Likely_benign,_other => 18
CLNSIG=Conflicting_interpretations_of_pathogenicity,_protective => 1
CLNSIG=Pathogenic/Likely_pathogenic,_risk_factor => 6
CLNSIG=Pathogenic,_other => 101
CLNSIG=Likely_pathogenic,_risk_factor => 15
CLNSIG=drug_response,_other => 1
CLNSIG=association,_protective => 1
CLNSIG=Benign,_association => 1
CLNSIG=Conflicting_interpretations_of_pathogenicity,_other => 6
CLNSIG=Pathogenic/Likely_pathogenic => 4544
CLNSIG=Likely_pathogenic => 27517
CLNSIG=Uncertain_significance,_other => 4
CLNSIG=Pathogenic => 63118
CLNSIG=Benign,_drug_response => 3
CLNSIG=protective,_risk_factor => 4
CLNSIG=drug_response => 234
CLNSIG=Uncertain_significance,_risk_factor => 3
CLNSIG=Pathogenic,_protective => 6
CLNSIG=Benign => 37377
All except the terms Benign, Likely benign, or combinations thereof.
So according to that suggestion, we would load variants with the following terms (for example):
CLNSIG=other => 1701
CLNSIG=other,_risk_factor => 1
CLNSIG=Affects,_risk_factor => 2
CLNSIG=association_not_found => 2
CLNSIG=Affects,_association => 1
CLNSIG=Conflicting_interpretations_of_pathogenicity,_drug_response => 1
CLNSIG=not_provided => 9122
CLNSIG=drug_response,_risk_factor => 1
CLNSIG=protective => 33
CLNSIG=Conflicting_interpretations_of_pathogenicity,_risk_factor => 7
CLNSIG=Affects => 120
CLNSIG=risk_factor => 449
CLNSIG=Uncertain_significance,_association => 1
CLNSIG=association => 193
CLNSIG=Conflicting_interpretations_of_pathogenicity,_Affects => 1
CLNSIG=Uncertain_significance,_other => 4
CLNSIG=protective,_risk_factor => 4
CLNSIG=drug_response => 234
CLNSIG=Uncertain_significance,_risk_factor => 3
Are you sure about that? It feels like we would flood the database with variants then...
These mostly do seem insignificant! What are the “other” and “not_provided” terms?
Exactly, I think we need to think this through a little bit and take it term by term.
Is it possible to calculate, for a single case, how many variants we have in each of the different categories?
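For the per-case question, a minimal sketch in Python (the language genmod and Scout are written in) could tally the CLNSIG values in one annotated case VCF. It assumes the ClinVar significance sits in an INFO key named CLNSIG, as in the counts above, and parses the VCF as plain text rather than through a VCF library; `clnsig_counts` is a hypothetical helper, not part of either tool.

```python
import gzip
from collections import Counter


def clnsig_counts(vcf_path):
    """Count the CLNSIG values among the variants of one case VCF.

    Assumes the VCF (plain or .gz) has been annotated with ClinVar so that
    the INFO column carries a CLNSIG=<value> entry; variants without one
    are counted under "(none)".
    """
    opener = gzip.open if vcf_path.endswith(".gz") else open
    counts = Counter()
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip meta-information, header and blank lines
            fields = line.rstrip("\n").split("\t")
            info = fields[7]  # INFO is the eighth VCF column
            value = "(none)"
            for entry in info.split(";"):
                key, _, val = entry.partition("=")
                if key == "CLNSIG":
                    value = val
                    break
            counts[value] += 1
    return counts
```

Printing `for value, n in clnsig_counts("case.vcf.gz").most_common(): print(f"CLNSIG={value} => {n}")` (file name hypothetical) would give a per-case list in the same format as the ClinVar-wide counts above, with combined values such as "Pathogenic,_other" counted as their own category.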
I'm not sure yet, but I think the numbers could be the key. Look at them instead of the strings.
I don't know, @vwirta, but in any case I don't think variants with the clinical significances described above will be informative. They do not add to the interpretation.
I agree @moonso, I do not think we should load all of these categories.
We need to judge these one by one and then document what we decided.
You may well be right. My somewhat hasty, eager-to-get-out-the-door judgement was that there were so few variants in each of the scrap categories that we could load them just to be safe and deal with the categories later, and send a pedagogical note to the ClinVar maintainers with these numbers to let them know that they are making life difficult for developers and clinicians using their resource.
I would vote for using these:
{
'Likely_pathogenic',
'Pathogenic',
'Pathogenic/Likely_pathogenic'
}
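Written as a Python check, the three-term set above is an exact-match whitelist. A minimal sketch (the name `always_load` is hypothetical, not Scout's API) shows one consequence: combined values such as "Pathogenic,_other" do not match the set as a whole.

```python
# The three-term whitelist proposed above.
CLINVAR_WHITELIST = {
    'Likely_pathogenic',
    'Pathogenic',
    'Pathogenic/Likely_pathogenic',
}


def always_load(clnsig):
    """Exact match: load only if the whole CLNSIG value is in the whitelist."""
    return clnsig in CLINVAR_WHITELIST


print(always_load('Pathogenic'))         # True
print(always_load('Pathogenic,_other'))  # False: the combined value is not a member
```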
Explanation of terms here. Most of these would not be interesting to us.
{ 'Likely_pathogenic', 'Pathogenic', 'Pathogenic/Likely_pathogenic' }
I would probably add:
Pathogenic,_other
Likely_pathogenic,_other
Pathogenic/Likely_pathogenic,_other
Just to be on the safe side.
A VUS may be a VUS only because it is waiting for just one more patient with the phenotype before it can be called. And what about the conflicting ones? Some are just old calls that we now understand to be benign (or at least deem uncertain again), but others go the other way. Curating those would be great, but it is not our immediate task. I don’t see loading them hurting anyone, whereas not loading them might, in subtle ways? Although a filter redesign looms ahead...
I would probably agree with Henrik and just load everything except Benign, Likely benign, and maybe "not_provided".
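The blacklist alternative can be sketched the same way. This sketch makes two assumptions: a combined value is treated as benign-like if any comma-separated part is Benign, Likely_benign or Benign/Likely_benign (so "Benign,_risk_factor" is skipped, as in the example list earlier in the thread), and dropping "not_provided" is a switch. `always_load` and `skip_not_provided` are hypothetical names.

```python
# Parts that mark a CLNSIG value as benign-like under this proposal.
EXCLUDED_TERMS = {'Benign', 'Likely_benign', 'Benign/Likely_benign'}


def clnsig_components(clnsig):
    """Split e.g. 'Benign,_risk_factor' into {'Benign', 'risk_factor'}."""
    return {part.lstrip('_') for part in clnsig.split(',')}


def always_load(clnsig, skip_not_provided=True):
    """Blacklist rule: load every CLNSIG value except benign-like ones.

    Assumption: a combined value is skipped if ANY part is benign or likely
    benign; a bare 'not_provided' is optionally skipped as well.
    """
    parts = clnsig_components(clnsig)
    if parts & EXCLUDED_TERMS:
        return False
    if skip_not_provided and parts == {'not_provided'}:
        return False
    return True
```

The "any part" choice is the conservative reading of "combinations thereof"; switching to "all parts" would instead keep values like "Benign,_risk_factor" for the risk-factor annotation.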
VUSes have potentially relevant information in the ClinVar submissions.
"Conflicting_interpretations_of_pathogenicity" are common in, for instance, BRCA, since the BRCA community does ClinVar submissions slightly differently.
And things like "drug response" and "risk_factor" could potentially be interesting for non-rare-disease (non-RD) applications, and I don't think Scout should alienate those use cases? Also, these terms are fairly rare and would only add around a hundred variants.
However, I suppose this all depends on the rationale for this change. If the purpose is to load relevant ClinVar variants so the clinicians can go through all of them quickly, one would probably only want to load Pathogenic and Likely pathogenic.
Haha, OK, I'm working on this one and need a decision. I would really prefer not to add ALL VUSes to the database regardless of rank score. I cannot see how one would work with a VUS with a low rank score; in what way would that add to the analysis? There will be tons of variants to consider before even encountering those. I think they will just end up in the catacombs of the database...
Did we decide to remove ClinVar from the rank model? If we keep it, then we can afford to always load (this PR) just a small number of variants (the pathogenic and likely pathogenic ones) and leave out all the rest. Those would be ranked anyway, so if one of them needs to be loaded, it will be.
Why don't we start with a whitelist of:
{
'Likely_pathogenic',
'Pathogenic',
'Pathogenic/Likely_pathogenic',
'Conflicting_interpretations_of_pathogenicity'
}
but make sure that entries like "Pathogenic,_other" are split on "," and therefore always loaded.
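A minimal sketch of the split-based whitelist, assuming combined values look like the ones listed above (parts joined by ",_"). Splitting on "," leaves a leading underscore on every part after the first, so it is stripped before the lookup; "Pathogenic/Likely_pathogenic" is kept whole because it is itself a whitelisted term. The function name is hypothetical.

```python
# The four-term whitelist proposed above.
CLINVAR_WHITELIST = {
    'Likely_pathogenic',
    'Pathogenic',
    'Pathogenic/Likely_pathogenic',
    'Conflicting_interpretations_of_pathogenicity',
}


def always_load(clnsig):
    """Load if any comma-separated part of the CLNSIG value is whitelisted.

    Values such as 'Pathogenic,_other' split into 'Pathogenic' and '_other';
    the leading underscore is removed before checking the whitelist.
    """
    return any(part.lstrip('_') in CLINVAR_WHITELIST for part in clnsig.split(','))
```

Adding a term later, e.g. "drug_response", is then a one-line change to the set.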
If we find in the future that we need another term, e.g. "drug_response", let's add them one by one.
We can certainly keep the ClinVar keys in the rank model and get the rest into Scout with a sane rank score.
I like that way of thinking.
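Combining the two ideas, a variant would be loaded either because its ClinVar significance is whitelisted or because its rank score clears the usual threshold. In this sketch the `Variant` class, `should_load`, and the threshold value are hypothetical placeholders, not Scout's actual data model or configuration.

```python
from dataclasses import dataclass
from typing import Optional

# Whitelist from the proposal above.
CLINVAR_WHITELIST = {
    'Likely_pathogenic',
    'Pathogenic',
    'Pathogenic/Likely_pathogenic',
    'Conflicting_interpretations_of_pathogenicity',
}
RANK_SCORE_THRESHOLD = 5  # hypothetical placeholder, not a Scout default


@dataclass
class Variant:
    rank_score: float
    clnsig: Optional[str] = None  # raw CLNSIG value, if the variant is annotated


def should_load(variant, threshold=RANK_SCORE_THRESHOLD):
    """Load whitelisted ClinVar variants unconditionally, the rest by rank score."""
    if variant.clnsig:
        parts = (part.lstrip('_') for part in variant.clnsig.split(','))
        if any(part in CLINVAR_WHITELIST for part in parts):
            return True
    return variant.rank_score >= threshold
```

The ClinVar keys can stay in the rank model either way; the whitelist only guarantees that those variants are never dropped at load time.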