A case gets re-run for one of a few reasons: a customer request, resequencing, a bioinformatic workflow update, or just a good old re-upload of data.
To begin with, I think it'd be useful to preserve some information from the previous upload instead of completely wiping it. Here is the wishlist I can think of:
Are pinned/causative variants still there? Do they still have the same:
What is the aggregated difference between past case and this one? i.e. compare previous run to this run:
(I'll add more to this if I can think of more)
Overall I think this would be of great value. I'm planning this in line with the Vogue package, where we are validating each of our TGA or WES runs for cancer samples.
This is probably not so fun for WGS runs and RD in general: there are too many variants, scores, etc. to take care of. But maybe it is more fun :-)
Yay or nay?
Good idea! Thinking about how to create it..
It doesn't have to be complicated. For instance, instead of wiping out the old variants, if a variant was found previously, insert a key/value like this into the new one:
previous runs: [
    {
        run_date: date,
        causative: True or False,
        pinned: True or False,
        dismissed: [reasons],
        AD/DP,
        AF,
        GT/GQ,
        ..
    },
    ..
]
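A minimal Python sketch of the key/value idea above. It assumes variants are plain dicts stored under a hypothetical variant id; the field names follow the outline, the sample values are made up, and none of this is Scout's actual data model or API.

```python
from datetime import date


def carry_over_history(old_variants, new_variants, run_date):
    """Attach a `previous_runs` entry to each new variant that was seen before.

    Both arguments are dicts mapping a (hypothetical) variant id to a variant dict.
    `run_date` is the date of the run that produced `old_variants`.
    """
    tracked = ("causative", "pinned", "dismissed", "AD", "DP", "AF", "GT", "GQ")
    for variant_id, new in new_variants.items():
        old = old_variants.get(variant_id)
        if old is None:
            continue  # variant is new in this run, nothing to carry over
        snapshot = {"run_date": run_date}
        snapshot.update({key: old[key] for key in tracked if key in old})
        # keep any history the old variant already carried, oldest first
        new["previous_runs"] = list(old.get("previous_runs", [])) + [snapshot]
    return new_variants


# Made-up example data, for illustration only.
old = {"1_12345_A_T": {"pinned": True, "causative": False, "AF": 0.31, "GT": "0/1"}}
new = {
    "1_12345_A_T": {"AF": 0.12, "GT": "0/1"},
    "2_500_G_C": {"AF": 0.45, "GT": "0/1"},
}
result = carry_over_history(old, new, date(2020, 1, 15))
```

Variants that only exist in the new run get no `previous_runs` key, so the absence of the key doubles as a "first seen in this run" marker.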
I'm creating another issue linked to this, not because of re-runs but to track remissions and relapses: #2075
This is also something that pops up once in a while - see e.g. https://github.com/Clinical-Genomics/scout/issues/381.
Not saying we shouldn't do it, but it is worth considering that if you are "only" thinking about the actual variants, and not about user comments etc., this may be quite a bit easier to do outside Scout, i.e. the operations involved are essentially subtractions, very similar to tumor-normal pairs. Time series are slightly different, but some of the same applies, except perhaps for the visualisation of what has actually changed.
I see. Makes sense. But if it is possible to do a Scout diff between re-runs/re-uploads, that'd be great. Feel free to close this if it is totally irrelevant, but consider me interested in this.
Noted! Help us by describing use cases that would not be more easily solved by a diff on the original VCFs! My personal interest in the differences between runs is usually more along developer lines, i.e. how much better the new version is doing. That I would ascribe to functionality like varg / mutacc. Occasionally, as a clinical user, I'm in the situation where we have rerun a batch of cases, say due to a bug. In this case, showing diffs can be highly relevant. We have traditionally solved such issues by introducing a specific filter option and/or check-box for the affected cases, rather than building a full-blown diff engine.
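To make the "essentially subtractions" point concrete, here is a rough sketch using plain Python sets. The variant dicts and the (chrom, pos, ref, alt) key are simplifying assumptions, and the sample values are made up; this is not how Scout or any particular VCF tool does it.

```python
def variant_key(variant):
    """Identify a variant by position and alleles (an assumed, simplified key)."""
    return (variant["chrom"], variant["pos"], variant["ref"], variant["alt"])


def diff_runs(previous, current):
    """Return the keys of variants lost, gained and shared between two runs."""
    prev_keys = {variant_key(v) for v in previous}
    curr_keys = {variant_key(v) for v in current}
    return {
        "lost": prev_keys - curr_keys,    # only in the previous run
        "gained": curr_keys - prev_keys,  # only in the current run
        "shared": prev_keys & curr_keys,  # present in both
    }


# Made-up example data, for illustration only.
previous_run = [
    {"chrom": "1", "pos": 12345, "ref": "A", "alt": "T"},
    {"chrom": "7", "pos": 999, "ref": "G", "alt": "C"},
]
current_run = [
    {"chrom": "1", "pos": 12345, "ref": "A", "alt": "T"},
    {"chrom": "17", "pos": 4242, "ref": "C", "alt": "CA"},
]
diff = diff_runs(previous_run, current_run)
```

Exact key matching like this treats the same indel written in two different ways as two different variants, so it only works on consistently normalised calls.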
You bring up a good point on mutacc 👍🏽, I think comparing between runs is a job for the bioinformatics workflow, and possibly outside the scope of Scout.
I'm more inclined towards clinical use cases. Having a diff engine would benefit relapse/remission cases for cancer. If we had an option to export as VCF, we could just have moved this into the bioinformatic workflow and let the workflow take care of it. So: only compare pinned vars from the old case to the new case.
I can also see a potential use for ranked variants in rare disease: a Spearman correlation of their ranks could trigger a warning that the variants in a case are significantly different from the old run. Is this useful? So instead of a big diff engine, just a Spearman correlation of ranks. Of course, only as long as the Spearman correlation assumptions hold for the rank score, bla bla....
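Something like the following could do the rank check. It is a sketch in plain Python with no dependencies, assuming rank scores are available as dicts keyed by a hypothetical variant id; the 0.8 threshold and the function names are arbitrary placeholders, not anything Scout defines.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the block of values tied with the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # all scores identical, rho is undefined
    return cov / (var_x * var_y) ** 0.5


def rank_scores_diverged(old_scores, new_scores, threshold=0.8):
    """True if rho over the variants shared by both runs falls below threshold."""
    shared = sorted(set(old_scores) & set(new_scores))
    if len(shared) < 3:
        return False  # too few shared variants to say anything
    rho = spearman([old_scores[v] for v in shared], [new_scores[v] for v in shared])
    return rho < threshold


# Made-up rank scores, for illustration only.
old_scores = {"a": 10, "b": 8, "c": 5, "d": 1}
flipped_scores = {"a": 1, "b": 5, "c": 8, "d": 10}
```

The correlation is only computed over variants present in both runs, so variants that appear or disappear between runs do not affect it and would need a separate check.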
Yeah, so closing as discussed in other thread, but just one more random thought that does not involve building a new nifty comparison tool.
I still believe most of the heavy lifting in comparisons should be done on the bulk samples, and by a different, more alignment- and annotation-aware tool than the visualisation one. Exporting small subsets for comparisons is bound to end up in a place where we do not want to be.
Consider the user who just insists on comparing their old variants for a remission sample: that ignores the possibility of a new clone with a new, clearly pathogenic and actionable variant popping up.
That said, suppose one were to upload, say, a remission case for an individual, showing e.g. diffs between the known tumor and the new remission sample and/or known normal tissue. One could still, at an intermediate to somewhat high effort level, allow shadowing events from a specified old case, much as we do for research instances of a clinical case. In this scenario, the desired diff operation is still safely handled by a more alignment-aware program, but the user input in Scout can be reused for the "new" case.
I agree with all of your random thoughts.

We're still far away from properly formulating these types of features due to a lack of data or real-world cases.
I'll tentatively wrap this to:
That would potentially also be rather handy for another case with say cousins or a sib with a different disorder - or perhaps a small cohort. Ok?
Sounds good :-). Agreed.gif again!
Small cohort can be multiple tracks as well! Nice. 👍