Hi!
Currently, DVC doesn't cache metrics, which means the user needs to store them in git history. But DVC doesn't suggest adding them to git.
Maybe it's possible to do something like:
1) After the pipeline finishes, add git add <metrics/file/path> to the output. For example:
dvc run -d train.py -M metrics.json -o last_checkpoint.zip python train.py
and the output should be:
To track the changes with git run:
git add .gitignore last_checkpoint.zip.dvc metrics.json
2) Add a -m option to dvc run that caches metrics (as an alternative to the -o and -O options).
I think option 1 is the best, because it doesn't break the paradigm "source code in git, outputs in DVC".
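Option 1 could look roughly like the following minimal Python sketch. It assumes each stage output can be reduced to a path plus an "is cached" flag; git_add_hint is a hypothetical helper, not DVC's actual implementation. Cached outputs (-o) contribute only the .dvc file and a .gitignore entry to the suggestion, while uncached ones (such as -M metrics) are added by path:

```python
from typing import Dict, List


def git_add_hint(stage_file: str, outs: Dict[str, bool]) -> str:
    """Build the 'git add' suggestion printed after dvc run (hypothetical sketch).

    outs maps each output path to True if it is stored in the DVC cache
    (e.g. passed with -o) or False if git must track the file itself
    (e.g. a metric passed with -M).
    """
    paths: List[str] = []
    # Cached outputs are listed in .gitignore, so that file changes too.
    if any(outs.values()):
        paths.append(".gitignore")
    # The stage file itself always belongs in git.
    paths.append(stage_file)
    # Uncached outputs (such as -M metrics) must be committed directly.
    paths.extend(path for path, cached in outs.items() if not cached)
    return "To track the changes with git run:\ngit add " + " ".join(paths)
```

With the example above, git_add_hint("last_checkpoint.zip.dvc", {"metrics.json": False, "last_checkpoint.zip": True}) reproduces the suggested git add line.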
Thanks!
@mroutis I'm not sure about "good first issue" though. It requires significantly reworking the logic for dvc metrics show --all-branches. @efiop should have more context on this.
Oh, @shcheklein, I was only thinking about expanding the message to include git add metrics.json; indeed, maybe I rushed a bit to label it as a good first issue. What do you think the approach should be?
I would say that if changing the message (appending the file name to git add) solves enough of the problem for @toodef, then let's just do that, because it's 10x simpler than changing DVC to actually cache the file.
Implementing -m is really easy: we used to have it in the past and all the logic is still there. For example, you could work around it by specifying -o metrics.json in dvc run and then explicitly marking the file as a metric with dvc metrics add metrics.json. :)
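To illustrate why the -o + dvc metrics add workaround ends up equivalent to the proposed -m, here is a hedged Python sketch. The mapping from flags to the cache and metric fields of an outs entry is an assumption based on this discussion and on the DVC-file example later in this thread, and out_entry and metrics_add are hypothetical helpers, not DVC code:

```python
from typing import Dict

# Assumed mapping from dvc run output flags to fields of the "outs" entry.
FLAG_FIELDS: Dict[str, Dict[str, bool]] = {
    "-o": {"cache": True},                   # output stored in the DVC cache
    "-O": {"cache": False},                  # output not cached, tracked by git
    "-M": {"cache": False, "metric": True},  # metric stored in git
    "-m": {"cache": True, "metric": True},   # proposed: metric stored in the cache
}


def out_entry(path: str, flag: str) -> Dict[str, object]:
    """Build an outs entry for path as if it had been passed with flag."""
    entry: Dict[str, object] = {"path": path, "metric": False}
    entry.update(FLAG_FIELDS[flag])
    return entry


def metrics_add(entry: Dict[str, object]) -> Dict[str, object]:
    """Mimic marking an existing output as a metric (returns a new dict)."""
    return {**entry, "metric": True}
```

Under these assumptions, metrics_add(out_entry("metrics.json", "-o")) produces the same entry as out_entry("metrics.json", "-m"): a cached output flagged as a metric.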
What was the reason for removing -m, @efiop?
@efiop Does it handle metrics show --all-branches in this scenario?
@efiop OK, I will try. But when I do dvc run -m metrics.json -o metrics.json, dvc metrics show -a shows an error that metrics.json is marked as an output of the stage. Is that different from -o metrics.json + metrics add metrics.json?
@mroutis The reason was our discussion with @dmpetrov, where we decided that it is a safer choice to just support -M, so that metrics are stored in git and not in the DVC cache. But, as it turned out, sometimes users want that, so we need to bring it back just to save hassle :slightly_smiling_face:
@shcheklein It runs dvc checkout when checking out another branch.
@toodef Ah, right. OK, then try adding metric: true manually to the entry for metrics.json in that DVC file. That should work :) I.e.:
cmd: echo OLOLO > metrics
md5: cf910dde1bf5bdeb94723ddbfcc74ecd
outs:
- cache: true
  md5: cbccc79ac9213325a623c38851b01c88
  metric: true
  path: metrics
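The same manual edit, sketched in Python on an already-parsed DVC file. Loading and writing back the YAML is left out, and mark_as_metric is a hypothetical helper for illustration only:

```python
from typing import Any, Dict


def mark_as_metric(stage: Dict[str, Any], path: str) -> Dict[str, Any]:
    """Set metric: true on the outs entry whose path matches (sketch)."""
    for out in stage.get("outs", []):
        if out.get("path") == path:
            out["metric"] = True
            return stage
    raise KeyError(f"{path!r} is not an output of this stage")


# The DVC file above, already parsed (e.g. by a YAML loader), before the edit.
stage = {
    "cmd": "echo OLOLO > metrics",
    "md5": "cf910dde1bf5bdeb94723ddbfcc74ecd",
    "outs": [
        {"cache": True, "md5": "cbccc79ac9213325a623c38851b01c88", "path": "metrics"},
    ],
}

mark_as_metric(stage, "metrics")
print(stage["outs"][0]["metric"])  # True
```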
@efiop Good to know (re dvc checkout), thanks! If we decide to switch to using the git API instead of (git checkout + dvc checkout), will we have a problem then? Doing dvc checkout can take some time, especially if the cache is in copy mode, right?
@efiop When using ls-files, we will be able to access cache files directly, instead of relying on dvc checkout to put them into the workspace.
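A rough Python sketch of reading a cached metric without dvc checkout. It assumes DVC's cache layout, in which a file with checksum md5 is stored as <cache dir>/<first 2 hex chars>/<remaining 30 chars>, and that the outs entry has already been read from the DVC file at the desired git revision (for example via git show). cache_path and read_cached_metric are hypothetical names, not DVC's API:

```python
import json
import os
from typing import Any, Dict


def cache_path(cache_dir: str, md5: str) -> str:
    # Assumed layout: <cache_dir>/<md5[:2]>/<md5[2:]>, e.g. .dvc/cache/cb/ccc79ac...
    return os.path.join(cache_dir, md5[:2], md5[2:])


def read_cached_metric(cache_dir: str, out: Dict[str, Any]) -> Any:
    """Load a cached JSON metric straight from the cache, bypassing the workspace."""
    if not (out.get("metric") and out.get("cache")):
        raise ValueError(f"{out.get('path')!r} is not a cached metric")
    with open(cache_path(cache_dir, out["md5"])) as f:
        return json.load(f)
```

Because the checksum comes from the DVC file at each revision, every branch would resolve to its own cache entry, rather than to whatever copy happens to be in the workspace.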
@efiop Good point, makes total sense. So, what are the action points here? Should we introduce -m then?
@shcheklein We should do both: modify the message and add -m. Totally agree with the original post from @toodef :slightly_smiling_face:
Now metrics are really cached! Thank you for the very quick implementation!!!
But dvc metrics show -a shows the same values for both branches:
hnm:
metrics.json: {"train": {"jaccard": 0.6354309320449829, "dice": 0.7804347276687622, "loss": 0.9194902181625366}, "validation": {"jaccard": 0.6705400347709656, "dice": 0.8039528727531433, "loss": 0.9115020632743835}}
master:
metrics.json: {"train": {"jaccard": 0.6354309320449829, "dice": 0.7804347276687622, "loss": 0.9194902181625366}, "validation": {"jaccard": 0.6705400347709656, "dice": 0.8039528727531433, "loss": 0.9115020632743835}}
After git checkout hnm && dvc checkout, the output is:
hnm:
metrics.json: {"train": {"jaccard": 0.6649578809738159, "dice": 0.8037408590316772, "loss": 0.8932506442070007}, "validation": {"jaccard": 0.6277576684951782, "dice": 0.772462785243988, "loss": 0.9959300756454468}}
master:
metrics.json: {"train": {"jaccard": 0.6649578809738159, "dice": 0.8037408590316772, "loss": 0.8932506442070007}, "validation": {"jaccard": 0.6277576684951782, "dice": 0.772462785243988, "loss": 0.9959300756454468}}
OS: Ubuntu 18.04.2 LTS
DVC: 0.26.0
Git: 2.17.1
Reopened for now, to investigate. 👀 into it now.
Can confirm that dvc metrics show -a returns the same result for all branches. Workaround for now: dvc metrics show -a metrics.json. We will fix the command ASAP.
Now it works very well! Thanks a lot!