Scout: Persistent information between reruns

Created on 4 Sep 2020 · 10 Comments · Source: Clinical-Genomics/scout

A re-run can be triggered by a customer request, resequencing, a bioinformatic workflow update, or just a good old re-upload of data.

To begin with, I think it'd be useful to preserve some information from the previous upload instead of completely wiping it. Here is the wishlist I can think of:

Are pinned/causative variants still there? Do they still have the same:
1. AF
2. AD/DP
3. GT/GQ
What is the aggregated difference between past case and this one? i.e. compare previous run to this run:
1. Total VAF
2. Total DP or AD
3. Tally of pinned/causative variants. Are they the same with regard to the metrics in the first list above?
4. Difference in difference of values for VAF, DP, AD, or some other metrics.
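
The per-variant part of this wishlist could be sketched roughly as below. This is only an illustration: the `diff_pinned` function, the variant identifiers, and the metric keys are hypothetical and do not reflect Scout's actual variant schema.

```python
from typing import Dict, Optional

# Hypothetical per-variant metrics, keyed by an identifier such as "chrom_pos_ref_alt".
Metrics = Dict[str, float]


def diff_pinned(
    previous: Dict[str, Metrics], current: Dict[str, Metrics]
) -> Dict[str, Optional[Metrics]]:
    """For each variant pinned in the previous run, return the change in every
    metric present in both runs, or None if the variant is missing from the new run."""
    report: Dict[str, Optional[Metrics]] = {}
    for variant_id, old in previous.items():
        new = current.get(variant_id)
        if new is None:
            # The pinned variant was not called in the re-run.
            report[variant_id] = None
            continue
        report[variant_id] = {key: new[key] - old[key] for key in old if key in new}
    return report


# Made-up example values, not real data.
previous_run = {
    "1_1000_A_T": {"AF": 0.25, "DP": 400},
    "2_2000_C_G": {"AF": 0.5, "DP": 300},
}
current_run = {"1_1000_A_T": {"AF": 0.5, "DP": 500}}
report = diff_pinned(previous_run, current_run)
```

Here `report` holds the AF and DP deltas for the variant found in both runs and `None` for the one that disappeared, which answers the "are pinned/causative variants still there?" question directly.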

(I'll add more to this if I can think of more)

Overall I think it would be of great value to do this. I'm planning this in accordance with the Vogue package, where we are validating each of our TGA or WES runs for cancer samples.

This is probably not so fun for WGS runs and RD in general; there are too many variants, scores, etc. to take care of. But maybe it is more fun :-)

Yay or nay?

enhancement Future ideas Intermediate


hassanfa

All 10 comments

Good idea! Thinking about how to create it..
Doesn't have to be complicated. For instance, instead of wiping out the old variants, if the variant was found previously, insert a key/value like this into the new ones:

previous runs: [
 {
   run_date: date,
   causative: True or False, 
   pinned: True or False,
   dismissed: [reasons],
   AD/DP,
   AF, 
   GT/GQ,
   ..

 },
..
]
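
A minimal Python sketch of that idea, assuming variants are plain dictionaries. The `carry_over` helper and all field names are hypothetical and only mirror the structure outlined above, not Scout's real data model.

```python
from datetime import date


def carry_over(old_variant: dict, new_variant: dict, run_date: date) -> dict:
    """Return a copy of new_variant with a snapshot of old_variant appended
    to its 'previous_runs' list (field names are illustrative only)."""
    snapshot = {
        "run_date": run_date.isoformat(),
        "causative": old_variant.get("causative", False),
        "pinned": old_variant.get("pinned", False),
        "dismissed": list(old_variant.get("dismissed", [])),
        "AD": old_variant.get("AD"),
        "DP": old_variant.get("DP"),
        "AF": old_variant.get("AF"),
        "GT": old_variant.get("GT"),
        "GQ": old_variant.get("GQ"),
    }
    # Keep any history the old variant already carried, oldest first.
    history = list(old_variant.get("previous_runs", [])) + [snapshot]
    return {**new_variant, "previous_runs": history}


# Made-up example values, not real data.
old = {"pinned": True, "AF": 0.25, "DP": 400, "GT": "0/1"}
new = {"AF": 0.5, "DP": 500, "GT": "0/1"}
merged = carry_over(old, new, date(2020, 9, 4))
```

The new variant keeps its own values, while the old flags and metrics stay available under `previous_runs` instead of being wiped on re-upload.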

northwestwitch on 4 Sep 2020

👍1

I'm creating another issue linked to this, but that's not because of re-runs but to track remissions and relapses. #2075

hassanfa on 4 Sep 2020

This is also something that pops up once in a while - see e.g. https://github.com/Clinical-Genomics/scout/issues/381.

Not saying we shouldn't do it, but it is worth considering that if you are "only" thinking about the actual variants, and not about user comments etc., this may be quite a bit easier to do outside Scout - i.e. the operations involved are essentially subtractions, very similar to tumor-normal pairs. Time series is slightly different, but some of the same applies - except perhaps for the visualisation of what has indeed changed.

dnil on 7 Sep 2020

I see, makes sense. But if it is possible to do a Scout diff between re-runs/re-uploads, that'd be great. Feel free to close this if it is totally irrelevant, but consider me interested in this.

hassanfa on 17 Sep 2020

Noted! Help us by describing use cases that would not be more easily solved by a diff on the original VCFs! My personal interests in the differences between runs are usually more along the developer lines - i.e. how much better is the new version doing. That I would ascribe to functionalities like varg / mutacc. Occasionally, as a clinical user, I'm in the situation that we have rerun a batch of cases, say due to a bug. In this case, showing diffs can be highly relevant. We have traditionally solved such issues by introducing a specific filter option and/or check-box for the affected cases, rather than building a full-blown diff engine.

dnil on 17 Sep 2020

You bring up a good point on mutacc 👍🏽 , I think comparing between runs is a job for the bioinformatics workflow, and possibly outside the scope of Scout.

I'm more inclined towards clinical use cases. Having a diff engine will benefit relapse/remission cases for cancer. If we had an option to export as VCF, we could've just moved this into the bioinformatic workflow so that the workflow takes care of it. So only compare pinned vars from an old case to the new case.

I can also see a potential ease of use for ranked variants in rare disease: a Spearman correlation of their ranks could trigger a warning that variants in a case are significantly different from those in the old run. Is this useful? So instead of a big diff engine, just a Spearman correlation of ranks. Of course as long as the Spearman correlation assumptions hold for the rank score, bla bla....
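
To make the rank idea concrete, here is a rough pure-Python sketch. It assumes each run can be reduced to a mapping from a variant identifier to its rank score; the function names and the example data are hypothetical. Only variants present in both runs are compared, so variants gained or lost between runs are not reflected in the coefficient.

```python
from math import sqrt
from typing import Dict, List


def average_ranks(values: List[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined when all scores are equal")
    return cov / (sd_x * sd_y)


def rank_score_agreement(old: Dict[str, float], new: Dict[str, float]) -> float:
    """Spearman correlation of rank scores over variants shared by both runs."""
    shared = sorted(set(old) & set(new))
    if len(shared) < 2:
        raise ValueError("need at least two shared variants")
    return spearman([old[v] for v in shared], [new[v] for v in shared])


# Made-up rank scores, not real data.
old_run = {"var_a": 10, "var_b": 20, "var_c": 30, "var_d": 5}
new_run = {"var_a": 12, "var_b": 19, "var_c": 35}
rho = rank_score_agreement(old_run, new_run)
```

A warning could then be raised when `rho` falls below some cutoff, although which cutoff is sensible would have to be worked out on real cases.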

hassanfa on 17 Sep 2020

Yeah, so closing as discussed in other thread, but just one more random thought that does not involve building a new nifty comparison tool.

I still believe most of the heavy lifting in comparisons should be done on the bulk samples, and by a different, more alignment and other annotation aware tool than the visualisation one. Exporting small subsets for comparisons is bound to end up in a place where we do not want to be.

Consider the user who insists on comparing only their old variants for a remission sample, and thereby ignores the possibility of a new clone popping up with a new, clearly pathogenic and actionable variant.

That said, if one were to upload, say, a remission case for an individual, showing e.g. diffs between the known tumor and the new remission sample and/or known normal tissue, one could still, at an ~intermediate to somewhat high effort level, allow shadowing events from a specified old case, much as we do for research instances of a clinical case. In this scenario, the desired diff operation is still safely handled by a more alignment-aware program, but the user input in Scout can be reused for the "new" case.

dnil on 17 Sep 2020

I agree with all of your random thoughts.
agreed

We're still far away from properly formulating these types of features, due to a lack of data or real-world cases.

hassanfa on 17 Sep 2020

😄1

I'll tentatively wrap this up as:

[ ] allow shadowing of events to other specified cases, not only research

That would potentially also be rather handy for another case with, say, cousins or a sib with a different disorder - or perhaps a small cohort. Ok?

dnil on 17 Sep 2020

👍1

Sounds good :-). Agreed.gif again!

Small cohort can be multiple tracks as well! Nice. 👍

hassanfa on 17 Sep 2020

🎉1
