Running:
#!/bin/bash
rm -rf repo
mkdir repo
pushd repo
git init --quiet
dvc init --quiet
echo "{'metric':0.95}" >> metric.json
git add -A
git commit -am "initial"
git tag v1
echo "{'metric':0.96}" >> metric.json
git add -A
git commit -am "second"
git tag v2
dvc metrics diff v1 v2 -v --targets metric.json
yields no result. metrics diff/show
should be able to handle non-metrics
targets that can be interpreted by our parsers.
For the record: same for plots
dvc params diff
should follow the same logic. But first, --targets
needs to be introduced for params.
Metrics/plots/params difference for Git-files is a useful scenario even outside of the data-files or pipeline context. It feels like it is a separate functionality in DVC that needs to be fully supported.
For example, it might be used in CI/CD to compare metrics/plots/params with master or previous commits in a repo when DVC is not used to generate reports like this - https://github.com/iterative/cml_dvc_case/pull/4
Additional scenarios to support:
--targets
specified).--targets
specified). So, a dummy\phony stage might be needed.@dmpetrov I see here following scenarios (lets take plots
as example command):
DvcRepo, file that is not marked as plot, but we are able to dvc plots show
and dvc plots diff
- in this case providing targets
is required - because we need to know what we are trying to parse.
DvcRepo, some file is marked as plot, but not using dvc run --plots
(but, for example dvc plots add {file}
). Later we are able to dvc plots show/diff
without targets
, and dvc is able to find it by itself.
GitRepo, we are able to dvc plots show/diff
- in this case there is no way to mark file as plot beforehand, and that's why we need to provide targets
.
Not a repo, we are able to dvc plots show
- targets
required
Is there any prioritization between those scenarios?
@pared All looks good!
Re (4). It is not a high-priority scenario. But it would be amazing to support it if not a big implementation overhead. dvc plots show
makes sense for just a file outside of any repo. The diff
s - probably not (we can imaging a scenario with two files diff, but cmd signature needs to be changed. it is probably do not worth it).
Re (1)&(2) - it should support data files even not plotmetrics ones.
An important thing about this issue - we need to pay attention to the UI part. All the commands should be unified including params
one.
So, starting this issue I made a mistake and wanted to start with moving and unifying collection for plots/metrics/diff
into the SCM. It will be probably required at some point, thought its definitely not a good thing to start with as it is a big change.
So my plan is as follows:
metrics
accept non metric files - https://github.com/iterative/dvc/pull/4590plots
accept non plots filesmetrics/plots/diff
plot show
for no repo at allOk, now only left to double check the UI to make sure that this looks good. (e.g. erros on bad formatting, missing targets and so on).
We still need to support no repo show
scenario. Added to checklist (before it was joined with GitRepo
).
@pared dvc plots show
? It should be already there.
@efiop I get "." is not a git repository
upon running following script:
#!/bin/bash
set -ex
rm -rf wspace
mkdir wspace
pushd wspace
mkdir repo
pushd repo
echo '[{"val":1},{"val":3}]' >> plot.json
dvc plots show plot.json
@pared Ah, I misunderstood. Without git repo too, yeah, I don't think that one was planned for. But good to have for consistency :+1: Thanks!
UI checks:
#!/bin/bash
set -ex
rm -rf wspace
mkdir wspace
pushd wspace
mkdir repo
pushd repo
git init --quiet
dvc init --quiet
git add -A
git commit -am "init"
echo '[{"m":1}, {"m":2]' >> data.json # datapoint not closed on m=2
dvc plots show data.json
#dvc metrics show data.json
Results:
dvc plots show data.json
:
ERROR: unexpected error - Expecting ',' delimiter: line 1 column 17 (char 16)
dvc metrics show data.json
:
ERROR: failed to show metrics - Could not parse metrics files. Use
-voption to see more details.
#!/bin/bash
rm -rf wspace
mkdir wspace
pushd wspace
mkdir repo
pushd repo
set -ex
git init --quiet
dvc init --quiet
git add -A
git commit -am "init"
dvc plots show data.json
#dvc metrics show data.json
Results:
dvc plots show data.json
:
WARNING: 'data.json' was not found at: 'workspace'.
ERROR: Could not find 'data.json'.
dvc metrics show data.json
WARNING: 'data.json' was not found at: 'None'.
ERROR: failed to show metrics - no metrics files in this repository. Use `-m/-M` options for `dvc run` to mark stage outputs as metrics.
EDIT
WARNING: 'data.json' was not found at: 'None'.
seems like a bug, that should be fixed in #4789
#!/bin/bash
rm -rf wspace
mkdir wspace
pushd wspace
mkdir repo
pushd repo
set -ex
git init --quiet
dvc init --quiet
git add -A
git commit -am "init"
git checkout -b new_branch
echo '{"m":2}' > metric.json
echo '[{"p":1},{"p":2}]' > plot.json
dvc metrics diff master --targets metric.json
#dvc plots diff master --targets plot.json
Results:
dvc metrics diff master --targets metric.json
:
WARNING: 'metric.json' was not found at: 'None'.
Path Metric Old New Change
metric.json m — 2 —
Seems like we have a bug here (None
instead of master
)
dvc plots diff master --targets plot.json
WARNING: 'plot.json' was not found at: 'master'.
file:///home/prd/development/playground/4446/checking_behavior/wspace/repo/plots.html
#!/bin/bash
set -ex
rm -rf wspace
mkdir wspace
pushd wspace
mkdir repo
pushd repo
git init --quiet
dvc init --quiet
dvc plots show
#dvc metrics show
Results:
dvc plots show
:
ERROR: no plots in this repository. Use
--plots/--plots-no-cacheoptions for
dvc runto mark stage outputs as plots.
dvc metris show
:
ERROR: failed to show metrics - no metrics files in this repository. Use
-m/-Moptions for
dvc runto mark stage outputs as metrics.
Additional scenario: https://github.com/iterative/dvc/issues/4788#issuecomment-717047501
Summary on current state of the issue, and what still needs to be done here:
metrics
accept non-metrics file - doesn't matter if we are in DVC repo, Git repo, or just a folder - targets
requiredplots
accept non-plots file - doesn't matter if we are in DVC repo, Git repo, or just a folder - targets
requiredmetrics
plots
and params
between branches is unified - we will get the same log message if given file is not found at particular revision, no matter command usedTODO:
4 scenarios of unified error handling:
(They should be fixed, leaving this list to double check)
--targets
for params
commandparams
accept non-params file - not matter DVC, GIT, or no repo case
Most helpful comment
Metrics/plots/params difference for Git-files is a useful scenario even outside of the data-files or pipeline context. It feels like it is a separate functionality in DVC that needs to be fully supported.
For example, it might be used in CI/CD to compare metrics/plots/params with master or previous commits in a repo when DVC is not used to generate reports like this - https://github.com/iterative/cml_dvc_case/pull/4
Additional scenarios to support:
--targets
specified).--targets
specified). So, a dummy\phony stage might be needed.