Dvc: plots/metrics: accept any viable target

Created on 21 Aug 2020  Â·  17Comments  Â·  Source: iterative/dvc

Running:

#!/bin/bash

rm -rf repo
mkdir repo

pushd repo

git init --quiet
dvc init --quiet


echo "{'metric':0.95}" >> metric.json
git add -A
git commit -am "initial"
git tag v1


echo "{'metric':0.96}" >> metric.json
git add -A
git commit -am "second"
git tag v2

dvc metrics diff v1 v2 -v --targets metric.json

yields no result. metrics diff/show should be able to handle non-metrics targets that can be interpreted by our parsers.

feature request p1-important product ui

Most helpful comment

Metrics/plots/params difference for Git-files is a useful scenario even outside of the data-files or pipeline context. It feels like it is a separate functionality in DVC that needs to be fully supported.

For example, it might be used in CI/CD to compare metrics/plots/params with master or previous commits in a repo when DVC is not used to generate reports like this - https://github.com/iterative/cml_dvc_case/pull/4

Additional scenarios to support:

  1. metrics/plots/params diff for not-DVC repos (with --targets specified).
  2. metrics/plots/params diff for DVC repos without any pipelines (without --targets specified). So, a dummy\phony stage might be needed.

All 17 comments

For the record: same for plots

dvc params diff should follow the same logic. But first, --targets needs to be introduced for params.

Metrics/plots/params difference for Git-files is a useful scenario even outside of the data-files or pipeline context. It feels like it is a separate functionality in DVC that needs to be fully supported.

For example, it might be used in CI/CD to compare metrics/plots/params with master or previous commits in a repo when DVC is not used to generate reports like this - https://github.com/iterative/cml_dvc_case/pull/4

Additional scenarios to support:

  1. metrics/plots/params diff for not-DVC repos (with --targets specified).
  2. metrics/plots/params diff for DVC repos without any pipelines (without --targets specified). So, a dummy\phony stage might be needed.

@dmpetrov I see here following scenarios (lets take plots as example command):

  1. DvcRepo, file that is not marked as plot, but we are able to dvc plots show and dvc plots diff - in this case providing targets is required - because we need to know what we are trying to parse.

  2. DvcRepo, some file is marked as plot, but not using dvc run --plots (but, for example dvc plots add {file}). Later we are able to dvc plots show/diff without targets, and dvc is able to find it by itself.

  3. GitRepo, we are able to dvc plots show/diff - in this case there is no way to mark file as plot beforehand, and that's why we need to provide targets.

  4. Not a repo, we are able to dvc plots show - targets required

Is there any prioritization between those scenarios?

@pared All looks good!

Re (4). It is not a high-priority scenario. But it would be amazing to support it if not a big implementation overhead. dvc plots show makes sense for just a file outside of any repo. The diffs - probably not (we can imaging a scenario with two files diff, but cmd signature needs to be changed. it is probably do not worth it).

Re (1)&(2) - it should support data files even not plotmetrics ones.

An important thing about this issue - we need to pay attention to the UI part. All the commands should be unified including params one.

So, starting this issue I made a mistake and wanted to start with moving and unifying collection for plots/metrics/diff into the SCM. It will be probably required at some point, thought its definitely not a good thing to start with as it is a big change.

So my plan is as follows:

  • [x] make metrics accept non metric files - https://github.com/iterative/dvc/pull/4590
  • [x] make plots accept non plots files
  • [x] unify collection of targets for metrics/plots/diff
  • [x] support purely GitRepos
  • [x] support plot show for no repo at all

Ok, now only left to double check the UI to make sure that this looks good. (e.g. erros on bad formatting, missing targets and so on).

We still need to support no repo show scenario. Added to checklist (before it was joined with GitRepo).

@pared dvc plots show? It should be already there.

@efiop I get "." is not a git repository upon running following script:

#!/bin/bash

set -ex

rm -rf wspace
mkdir wspace
pushd wspace

mkdir repo
pushd repo

echo '[{"val":1},{"val":3}]' >> plot.json
dvc plots show plot.json

@pared Ah, I misunderstood. Without git repo too, yeah, I don't think that one was planned for. But good to have for consistency :+1: Thanks!

UI checks:

  1. bad formatting:
#!/bin/bash

set -ex

rm -rf wspace
mkdir wspace
pushd wspace

mkdir repo
pushd repo

git init --quiet
dvc init --quiet

git add -A
git commit -am "init"
echo '[{"m":1}, {"m":2]' >> data.json # datapoint not closed on m=2

dvc plots show data.json
#dvc metrics show data.json

Results:

dvc plots show data.json:
ERROR: unexpected error - Expecting ',' delimiter: line 1 column 17 (char 16)

dvc metrics show data.json:
ERROR: failed to show metrics - Could not parse metrics files. Use-voption to see more details.

  1. Missing target
#!/bin/bash

rm -rf wspace
mkdir wspace
pushd wspace

mkdir repo
pushd repo

set -ex

git init --quiet
dvc init --quiet
git add -A 
git commit -am "init" 

dvc plots show data.json
#dvc metrics show data.json

Results:

dvc plots show data.json:

WARNING: 'data.json' was not found at: 'workspace'.
ERROR: Could not find 'data.json'.

dvc metrics show data.json

WARNING: 'data.json' was not found at: 'None'.
ERROR: failed to show metrics - no metrics files in this repository. Use `-m/-M` options for `dvc run` to mark stage outputs as  metrics.

EDIT
WARNING: 'data.json' was not found at: 'None'.
seems like a bug, that should be fixed in #4789

  1. Missing target on other revision:
#!/bin/bash

rm -rf wspace
mkdir wspace
pushd wspace

mkdir repo
pushd repo

set -ex

git init --quiet
dvc init --quiet
git add -A 
git commit -am "init" 

git checkout -b new_branch
echo '{"m":2}' > metric.json
echo '[{"p":1},{"p":2}]' > plot.json

dvc metrics diff master --targets metric.json
#dvc plots diff master --targets plot.json

Results:
dvc metrics diff master --targets metric.json:

WARNING: 'metric.json' was not found at: 'None'.
Path         Metric    Old    New    Change
metric.json  m         —      2      —

Seems like we have a bug here (None instead of master)

dvc plots diff master --targets plot.json

WARNING: 'plot.json' was not found at: 'master'.
file:///home/prd/development/playground/4446/checking_behavior/wspace/repo/plots.html
  1. No metrics file in repo at all:
#!/bin/bash

set -ex

rm -rf wspace
mkdir wspace
pushd wspace

mkdir repo
pushd repo

git init --quiet
dvc init --quiet

dvc plots show
#dvc metrics show

Results:

dvc plots show:
ERROR: no plots in this repository. Use--plots/--plots-no-cacheoptions fordvc runto mark stage outputs as plots.

dvc metris show:
ERROR: failed to show metrics - no metrics files in this repository. Use-m/-Moptions fordvc runto mark stage outputs as metrics.

Summary on current state of the issue, and what still needs to be done here:

  • [x] metrics accept non-metrics file - doesn't matter if we are in DVC repo, Git repo, or just a folder - targets required
  • [x] plots accept non-plots file - doesn't matter if we are in DVC repo, Git repo, or just a folder - targets required
  • [x] collection of metrics plots and params between branches is unified - we will get the same log message if given file is not found at particular revision, no matter command used

TODO:
4 scenarios of unified error handling:

  • [ ] [bad file format](https://github.com/iterative/dvc/issues/4446#issuecomment-716360074)
  • [ ] [missing target](https://github.com/iterative/dvc/issues/4446#issuecomment-716450832)
  • [ ] [missing target on other revision](https://github.com/iterative/dvc/issues/4446#issuecomment-716450832)
  • [ ] [no metric/plots in repo at all](https://github.com/iterative/dvc/issues/4446#issuecomment-716450832)

(They should be fixed, leaving this list to double check)

  • [ ] introduce --targets for params command
  • [ ] params accept non-params file - not matter DVC, GIT, or no repo case
Was this page helpful?
0 / 5 - 0 ratings