Say I am maintaining some data using DVC, and at some point decide I want to have a metric showing some data statistics (e.g. track how many positive samples I have). So I create a pipeline that computes this metric. How do I back-fill it to previous commits? The goal is to plot a graph showing this metric at different stages of the project.
Specifically: If the commit history is A->B->master, I know I can checkout data-commit 'A' and run the pipeline, but I won't be able to save the metric output for commit 'A' in the context of commit 'A', right? At most I will be able to commit it in a new commit (A') whose parent is 'A'. It would have been better if I could have committed it to 'A' directly. Why? Because A' is not an ancestor of 'master', so it's not naturally included in the development of my data.
More context: https://discuss.dvc.org/t/fill-back-metrics/441/3
metrics diff should accept multiple revisions (like plots diff). Having that, we can solve this with Git:
> How do I back-fill it to previous commits?
Since the metrics file didn't exist in previous commits, one way is to git cherry-pick the commit that introduces the metrics-generating stage (let's call it commit C) onto each of the previous commits of interest, giving new commits A' and B'. Then run dvc metrics diff A' B' C
More detailed explanation in https://discuss.dvc.org/t/fill-back-metrics/441/4
Did you try that, @jonilaserson? (But it's only supported by plots diff at the moment.)
This is limited to a relatively small number of commits, though.
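For reference, here is a rough sketch of the cherry-pick workflow described above, written as a small Python helper that only builds the list of commands (it does not run anything). The function name backfill_plan, the backfill-<rev> branch naming, and the commit message are made up for illustration. It also assumes that the data for each old revision is available locally (otherwise a dvc pull would be needed), that commit C cherry-picks cleanly, and that the metrics file and dvc.lock are tracked by Git so that git commit -am picks them up.

```python
import shlex
from typing import List


def backfill_plan(old_revs: List[str], stage_commit: str) -> List[List[str]]:
    """Build (but do not run) the git/dvc commands for back-filling metrics.

    For each old revision, the plan creates a side branch, cherry-picks the
    commit that introduces the metrics stage, reproduces the pipeline and
    commits the result.
    """
    plan: List[List[str]] = []
    for rev in old_revs:
        branch = f"backfill-{rev}"  # hypothetical naming scheme, not a DVC convention
        plan += [
            ["git", "checkout", "-b", branch, rev],    # start a side branch at the old commit
            ["git", "cherry-pick", stage_commit],      # bring in the metrics-generating stage
            ["dvc", "checkout"],                       # sync workspace data with that revision
            ["dvc", "repro"],                          # run the pipeline to produce the metric
            # Assumes the metrics file and dvc.lock are tracked by Git.
            ["git", "commit", "-am", f"Back-fill metrics for {rev}"],
        ]
    return plan


if __name__ == "__main__":
    # Print the plan for history A -> B -> master, where C adds the stage.
    for cmd in backfill_plan(["A", "B"], "C"):
        print(shlex.join(cmd))
```

The resulting backfill-A and backfill-B branches play the role of A' and B' above, so they can be passed to plots diff together with C. Each command list could also be handed to subprocess.run(cmd, check=True) to execute the plan, stopping at the first failure (e.g. a cherry-pick conflict).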
A more advanced solution (not exclusive with the previous one) would be for DVC to actually back-fill the metrics to previous commits by itself, by trying to run the metrics-generating stage (in commit C) on top of all the previous commits indicated to the command. By the way, what would the syntax look like? Should it accept revision ranges (as mentioned in https://github.com/iterative/dvc/issues/1691#issuecomment-662053357)?
But it would only support workspace versions where all the dependencies for this metrics-generating stage exist, possibly just skipping the commits where that's not the case.
Thanks, I'll give it a shot.