Hello,
my current ML project runs a DVC pipeline that trains a model for several different segments of the data and produces a metrics file for each one.
I want to be able to track how these metrics evolve over time. However, the segments evaluated in each run depend on parameters set in the first step of the pipeline, so the number of output metrics files changes every time I run the pipeline.
I have tried several approaches:
- Adding `-m metrics` to my `dvc run` command, where `metrics/` is a folder where I store all my metrics files. This worked, but when calling `dvc metrics show -a -R` it simply showed the DVC-file for that folder, instead of parsing the individual metrics.
- Using a wildcard, `-m metrics/*`: this simply doesn't work.
- Setting the metrics directory as an output with `-o`, and then adding each file as a metric using a script loop. This fails with: `ERROR: failed to add metric file 'metrics/metric-1.json' - unable to find DVC-file with output 'metrics/metric-1.json'`

I am out of ideas! Is this functionality supported, or do I need to have a fixed number of metrics files for each pipeline?
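For reference, the script loop from the third approach can be sketched as a dry run (file names are examples; the `dvc metrics add` calls are echoed rather than executed, since that is the step that errors out):

```shell
# Dry-run sketch of the script loop from the third approach.
mkdir -p metrics
touch metrics/metric-1.json metrics/metric-2.json
for f in metrics/*.json; do
  echo dvc metrics add "$f"
done
```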
Related: https://github.com/iterative/dvc/issues/1682
Also, Discord context: https://discordapp.com/channels/485586884165107732/485596304961962003/656877089345110017
For the record:
> Adding `-m metrics` to my `dvc run` command, where `metrics/` is a folder where I store all my metrics files. This worked, but when calling `dvc metrics show -a -R` it simply showed the DVC-file for that folder, instead of parsing the individual metrics.

Looks like a bug; we should handle this more gracefully.
> Using a wildcard `-m metrics/*` - this simply doesn't work.

This one doesn't work because the shell evaluates the wildcard and `dvc run` doesn't accept multiple values for `-m`. It might work if you evaluate the wildcard first, then add a `-m` prefix for each file and append that to the command.
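The workaround above can be sketched like this (file names are hypothetical, and the final command is echoed as a dry run rather than executed):

```shell
# Expand the wildcard in the shell first, build one "-m <file>" pair per
# metrics file, then splice them into the dvc run command line.
mkdir -p metrics
touch metrics/metric-1.json metrics/metric-2.json
metric_flags=""
for f in metrics/*.json; do
  metric_flags="$metric_flags -m $f"
done
# Dry run: print the resulting command instead of executing it.
cmd="dvc run$metric_flags python train.py"
echo "$cmd"
```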
> Setting the metrics directory as an output with `-o`, and then adding each file as a metric using a script loop: I get the following error: `ERROR: failed to add metric file 'metrics/metric-1.json' - unable to find DVC-file with output 'metrics/metric-1.json'`

Again a bug, probably caused by `Repo.find_outs_by_path` not being able to understand nested paths (I'm working on a fix as part of my directory-granularity ticket).
In terms of the implementation for directories, we will need to teach `dvc metrics show` to walk through git and cached directories: https://github.com/iterative/dvc/blob/0.77.3/dvc/repo/metrics/show.py#L234
Thanks for all the info!
I'd like to implement it, but I'm not sure I correctly understand how it should be implemented. As far as I can see, it should work like this:
1. The `dvc run` command should accept directories for the `-m`/`-M` flags (BTW, it seems it already accepts them now).
2. `dvc metrics` should stay intact.
3. In `dvc metrics show`, whenever an encountered output is a metric and also a directory, its contents should be recursively added to the list of metrics.

Is this the desired behavior?
As for the implementation, I think that `_read_metrics` should check whether a metric is a directory somewhere in the loop (https://github.com/iterative/dvc/blob/bb50fb86648da92f6fde502a7c6aefc4514514c0/dvc/repo/metrics/show.py#L59) and recursively walk through its contents via `tree.walk`.
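A rough sketch of that idea. The `tree` interface assumed here (`isdir()` plus an `os.walk`-like `walk()`) is only an approximation of DVC's tree abstraction, and `collect_metric_paths`/`LocalTree` are illustrative names, not real DVC API:

```python
import os


def collect_metric_paths(tree, metric_path):
    """Return every metrics file under metric_path.

    If the metric output is a directory, recursively gather the files
    inside it via the tree's os.walk-like walk(); otherwise return the
    path itself.
    """
    if not tree.isdir(metric_path):
        return [metric_path]
    paths = []
    for root, _dirs, files in tree.walk(metric_path):
        paths.extend(os.path.join(root, name) for name in files)
    return paths


class LocalTree:
    """Hypothetical stand-in for a DVC tree, backed by the local filesystem."""

    isdir = staticmethod(os.path.isdir)
    walk = staticmethod(os.walk)
```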
Unfortunately, there is one issue with `tree.walk`: if the user runs `dvc metrics show -a` and a directory with metric files is not present in the current branch, then `tree.walk` returns an empty list. Should `RepoTree.walk` be modified to walk through all branches and tags?
@nik123 sounds good! Yes, `show -a` should show the workspace too. We need to double-check that; it might be an issue with the brancher.
For the record: the directory support for metrics described in https://github.com/iterative/dvc/issues/2973#issuecomment-636037450 will not be implemented (at least for now). For details, see #3930.