Hello,
my current ML project runs a DVC pipeline that trains a model for several different segments of the data and produces a metrics file for each one.
I want to be able to track how these metrics evolve over time. However, the segments evaluated in each run depend on parameters set in the first step of the pipeline, so the number of output metrics files changes every time I run the pipeline.
I have tried several approaches:
- Adding `-m metrics` to my `dvc run` command, where `metrics/` is a folder where I store all my metrics files. This worked, but when calling `dvc metrics show -a -R` it simply showed the DVC-file for that folder, instead of parsing the individual metrics.
- Using a wildcard, `-m metrics/*`: this simply doesn't work.
- Setting the metrics directory as an output with `-o`, and then adding each file as a metric using a script loop. This fails with: `ERROR: failed to add metric file 'metrics/metric-1.json' - unable to find DVC-file with output 'metrics/metric-1.json'`

I am out of ideas! Is this functionality supported, or do I need to have a fixed number of metrics files for each pipeline?
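For reference, the script loop from the third approach can be sketched as a dry run (file names are examples; the `dvc metrics add` calls are echoed rather than executed, since that is the step that errors out):

```shell
# Dry-run sketch of the script loop from the third approach.
mkdir -p metrics
touch metrics/metric-1.json metrics/metric-2.json
for f in metrics/*.json; do
  echo dvc metrics add "$f"
done
```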
Related: https://github.com/iterative/dvc/issues/1682
Also, Discord context: https://discordapp.com/channels/485586884165107732/485596304961962003/656877089345110017
For the record:
> Adding `-m metrics` to my `dvc run` command, where `metrics/` is a folder where I store all my metrics files. This worked, but when calling `dvc metrics show -a -R` it simply showed the DVC-file for that folder, instead of parsing the individual metrics.

Looks like a bug; we should handle this more gracefully.
> Using a wildcard `-m metrics/*` - this simply doesn't work.

This one doesn't work because the shell evaluates the wildcard and `dvc run` doesn't accept multiple values for `-m`. It might work if you evaluate the wildcard first, then add a `-m` prefix for each file and append that to the command.
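The workaround above can be sketched like this (file names are hypothetical, and the final command is echoed as a dry run rather than executed):

```shell
# Expand the wildcard in the shell first, build one "-m <file>" pair per
# metrics file, then splice them into the dvc run command line.
mkdir -p metrics
touch metrics/metric-1.json metrics/metric-2.json
metric_flags=""
for f in metrics/*.json; do
  metric_flags="$metric_flags -m $f"
done
# Dry run: print the resulting command instead of executing it.
cmd="dvc run$metric_flags python train.py"
echo "$cmd"
```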
> Setting the metrics directory as an output with `-o`, and then adding each file as a metric using a script loop: I get the following error: `ERROR: failed to add metric file 'metrics/metric-1.json' - unable to find DVC-file with output 'metrics/metric-1.json'`

Again a bug, probably caused by `Repo.find_outs_by_path` not being able to understand nested paths (I'm working on a fix as part of my directory-granularity ticket).
In terms of the implementation for directories, we will need to teach `dvc metrics show` to walk through git and cached directories: https://github.com/iterative/dvc/blob/0.77.3/dvc/repo/metrics/show.py#L234
Thanks for all the info!
I'd like to implement it, but I'm not sure I correctly understand how it should be implemented. As far as I can see, it should work like this:
1. The `dvc run` command should accept directories for the `-m`/`-M` flags (BTW, it seems it already accepts them now).
2. `dvc metrics` should stay intact.
3. In `dvc metrics show`, whenever an encountered output is a metric and also a directory, its contents should be recursively added to the list of metrics.

Is this the desired behavior?
As for the implementation, I think that `_read_metrics` should check whether a metric is a directory somewhere in the loop (https://github.com/iterative/dvc/blob/bb50fb86648da92f6fde502a7c6aefc4514514c0/dvc/repo/metrics/show.py#L59) and recursively walk through its contents via `tree.walk`.
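A rough sketch of that idea. The `tree` interface assumed here (`isdir()` plus an `os.walk`-like `walk()`) is only an approximation of DVC's tree abstraction, and `collect_metric_paths`/`LocalTree` are illustrative names, not real DVC API:

```python
import os


def collect_metric_paths(tree, metric_path):
    """Return every metrics file under metric_path.

    If the metric output is a directory, recursively gather the files
    inside it via the tree's os.walk-like walk(); otherwise return the
    path itself.
    """
    if not tree.isdir(metric_path):
        return [metric_path]
    paths = []
    for root, _dirs, files in tree.walk(metric_path):
        paths.extend(os.path.join(root, name) for name in files)
    return paths


class LocalTree:
    """Hypothetical stand-in for a DVC tree, backed by the local filesystem."""

    isdir = staticmethod(os.path.isdir)
    walk = staticmethod(os.walk)
```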
Unfortunately, there is one issue with `tree.walk`: if the user runs `dvc metrics show -a` and a directory with metric files is not present in the current branch, then `tree.walk` returns an empty list. Should `RepoTree.walk` be modified to walk through all branches and tags?
@nik123 sounds good! Yes, `show -a` should show the workspace too. We need to double-check that; it might be an issue with the brancher.
For the record: the directory support for metrics described in https://github.com/iterative/dvc/issues/2973#issuecomment-636037450 will not be implemented (at least for now). For details, see #3930.