DVC: Back-fill metrics

Created on 15 Jul 2020 · 4 Comments · Source: iterative/dvc

Say I am maintaining some data using DVC, and at some point I decide I want a metric showing some data statistics (e.g. tracking how many positive samples I have). So I create a pipeline that computes this metric. How do I back-fill it to previous commits? The goal is to plot a graph showing this metric at different stages of the project.

Specifically: if the commit history is A->B->master, I know I can check out data commit 'A' and run the pipeline, but I won't be able to save the metric output for commit 'A' in the context of commit 'A', right? At most I will be able to commit it in a new commit (A') whose parent is 'A'. It would be better if I could commit it to 'A' directly. Why? Because A' is not an ancestor of 'master', so it's not naturally included in the development history of my data.

feature request p2-medium research

All 4 comments

I think for starters metrics diff should accept multiple revisions (like plots diff). Having that, we can solve this with Git:

How do I back-fill it to previous commits?

Since the metrics file didn't exist in previous commits, one way is to git cherry-pick the commit that introduces the metrics-generating stage (let's call it commit C) onto each of the previous commits of interest, producing A', B', and so on, and to re-run the stage on each so that the metrics file gets generated there. Then run dvc metrics diff A' B' C
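The cherry-pick approach above can be sketched as a small Python helper that only builds the command list, so you can review it before running anything. This is a rough sketch under assumptions, not a tested recipe: the `backfill-<rev>` branch naming is made up for illustration, the revisions are assumed to be plain names or hashes that are valid inside a branch name, the cherry-pick is assumed to apply without conflicts, and the final comparison uses plots diff (which already accepts several revisions), so it assumes the stage writes its metric to a file that plots diff can read.

```python
"""Sketch: plan the Git/DVC commands for back-filling a metric via cherry-pick."""
import shlex


def backfill_plan(old_revs, stage_commit, branch_prefix="backfill"):
    """Return (commands, branches) for re-running the metrics stage on old commits.

    old_revs      -- revisions to back-fill, e.g. ["A", "B"]
    stage_commit  -- commit that introduced the metrics-generating stage ("C")
    branch_prefix -- hypothetical naming scheme for the side branches (A', B')
    """
    commands, branches = [], []
    for rev in old_revs:
        branch = f"{branch_prefix}-{rev}"  # plays the role of A', B', ...
        branches.append(branch)
        commands += [
            ["git", "checkout", "-b", branch, rev],   # side branch off the old commit
            ["dvc", "checkout"],                      # sync data files to that commit
            ["git", "cherry-pick", stage_commit],     # bring in the metrics stage
            ["dvc", "repro"],                         # run the stage on the old data
            ["git", "commit", "-am", f"Back-fill metrics for {rev}"],
        ]
    # Compare all back-filled branches against the commit that added the stage.
    commands.append(["dvc", "plots", "diff", *branches, stage_commit])
    return [shlex.join(c) for c in commands], branches


if __name__ == "__main__":
    plan, _ = backfill_plan(["A", "B"], "C")
    print("\n".join(plan))
```

Printing the plan first keeps this safe to try; the commands could be fed to `subprocess.run` one by one once they look right for your repository.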

More detailed explanation in https://discuss.dvc.org/t/fill-back-metrics/441/4
Did you try that @jonilaserson? (But passing multiple revisions is only supported by plots diff at the moment.)

This is limited to a relatively small number of commits, though.

A more advanced solution (not exclusive with the previous one) would be for DVC to actually back-fill the metrics to previous commits by itself, by trying to run the metrics-generating stage (from commit C) on top of each of the previous commits passed to the command. By the way, what would the syntax look like? Should it accept revision ranges (as mentioned in https://github.com/iterative/dvc/issues/1691#issuecomment-662053357)?

But it would only support workspace versions where all the dependencies for this metrics-generating stage exist, possibly just skipping the commits where that's not the case.
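To illustrate the "skip commits where the dependencies are missing" idea, here is a minimal sketch of how such a filter could look outside of DVC. It is not an existing DVC feature or API. The function names are hypothetical, and it assumes each dependency can be identified by a Git-tracked path (for DVC-tracked data that would be the corresponding .dvc file, since the data itself is not in Git).

```python
"""Sketch: pick the revisions on which a metrics stage could be re-run."""
import subprocess


def path_exists_at(rev, path):
    """Return True if `path` exists in Git revision `rev`.

    Uses `git cat-file -e <rev>:<path>`, which exits with status 0
    when the object exists.
    """
    result = subprocess.run(
        ["git", "cat-file", "-e", f"{rev}:{path}"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def runnable_revisions(revs, dep_paths, exists=path_exists_at):
    """Split `revs` into (runnable, skipped).

    A revision is runnable only if every dependency path exists in it.
    `exists` is injectable so the logic can be tried without a repository.
    """
    runnable, skipped = [], []
    for rev in revs:
        if all(exists(rev, path) for path in dep_paths):
            runnable.append(rev)
        else:
            skipped.append(rev)
    return runnable, skipped
```

If revision ranges were accepted, the list of revisions could come from something like `git rev-list --reverse A..master`, with the filter above deciding which of them to skip.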

Thanks, I'll give it a shot.
