The dvc status command should probably have a --filter option to select only a specific subset of all possible states (new, deleted, missing, modified, not in cache):
# Show only missing files
# Error out if something is not up to date
$ dvc status -q -c --filter=m
The default value for the filter option should be "list all states".
It's a continuation of #4383 and #4436.
I want to create a workflow file in my GitHub repo which checks whether the latest dvc data has been pushed to the remote without actually loading the data via fetch or pull. The first step was described in #4383: if data is not pushed to the remote, I get the missing state. However, if data is pushed to the remote, I get the deleted state and status still exits with an error. Therefore there should be a way to filter only the states I am interested in. Something similar to --diff-filter in git diff.
P.S.: Since the --diff-filter option is part of the diff command in git, the filter option in dvc should also be implemented via dvc diff.
Not sure what the UI should look like, because the output format differs for the local workspace and remote storage. For the use case described in the original post, --filter might be applied for the -r/-c flags only:
dvc status -c -r --filter NMD
where "NMD" values are:
N - include "new".
M - include "missing".
D - include "deleted".
The "NMD" characters might be combined in any order. The lowercase characters ("nmd") might be used to exclude values (i.e. status -c --filter n means "show everything except new").
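To make the proposed semantics concrete, here is a minimal Python sketch of how such a filter string could be interpreted. It is not DVC code: the --filter option does not exist, and the names REMOTE_STATES and parse_filter are hypothetical. It also assumes that a filter containing only lowercase characters starts from "all states", and that an empty filter means "list all states".

```python
# Hypothetical mapping for the proposed remote-mode filter characters.
REMOTE_STATES = {"N": "new", "M": "missing", "D": "deleted"}


def parse_filter(spec=""):
    """Return the set of states selected by a filter string such as "MD" or "n".

    Uppercase characters include a state, lowercase characters exclude it.
    Assumption: with no uppercase characters, selection starts from all states.
    """
    all_states = set(REMOTE_STATES.values())
    include, exclude = set(), set()
    for char in spec:
        key = char.upper()
        if key not in REMOTE_STATES:
            raise ValueError(f"unknown filter character: {char!r}")
        # Uppercase goes to the include set, lowercase to the exclude set.
        (include if char.isupper() else exclude).add(REMOTE_STATES[key])
    selected = include if include else all_states
    return selected - exclude


print(sorted(parse_filter("M")))  # only "missing"
print(sorted(parse_filter("n")))  # everything except "new"
```

Order does not matter in this sketch, so "MD" and "DM" select the same states.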
If --filter should also support workspace statuses, then possible values are:
C - changed checksum
A - always changed.
N - not in cache
M - modified
D - deleted
X - not in cache (not sure about this one).
U - update available.
Once again, characters might be combined in any order and lowercase characters might be used to exclude values.
There is one issue with the proposed values: the same characters have different meanings for local and remote, e.g. N means "new" for remote and "not in cache" for workspace. The possible solutions I see:
filter for remote/workspace only.
BTW, the status message format is different in workspace and remote modes. In local mode the message is more hierarchical and shows information about stages, which is not the case for remote mode. Here is a little example to show the difference between remote and local modes:
$ dvc status -c
deleted: foo
deleted: foobar
deleted: bar
$ dvc status
foo.dvc:
    changed outs:
        not in cache: foo
join_foobar:
    changed deps:
        deleted: bar
        deleted: foo
    changed outs:
        not in cache: foobar
bar.dvc:
    changed outs:
        not in cache: bar
Not sure it's a big deal, because it only started to bother me when I began thinking about the --filter option.
@nik123 To be honest, this feature seems pretty synthetic and only useful for automation. Feels like we are inventing some special filter syntax for no good reason. Have you considered --show-json instead?
@efiop, thanks, I'll try show-json out. It'll probably solve my use case.
Hi @nik123 :) Have you had a chance to try it out?
@efiop, yes, I did. Running dvc status -c --show-json and parsing the output afterwards works just fine for me. I suppose the issue can be closed.
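For anyone landing here with the same use case, a rough sketch of the parsing step in Python. It assumes that dvc status -c --show-json prints a flat JSON object mapping each path to its state (for example {"foo": "deleted", "bar": "missing"}) and an empty object when everything is in sync; check the actual output of your DVC version before relying on it. The function names paths_in_state and check_pushed are made up for this example.

```python
import json
import subprocess


def paths_in_state(status_json, state="missing"):
    """Return the sorted paths whose state equals `state`.

    Assumes `status_json` is a flat {"path": "state"} JSON object.
    """
    status = json.loads(status_json.strip() or "{}")
    return sorted(path for path, value in status.items() if value == state)


def check_pushed():
    """Run dvc and return 1 if any data is missing from the remote, else 0."""
    result = subprocess.run(
        ["dvc", "status", "-c", "--show-json"],
        capture_output=True,
        text=True,
        check=True,
    )
    missing = paths_in_state(result.stdout, "missing")
    for path in missing:
        print(f"not pushed to remote: {path}")
    return 1 if missing else 0
```

In a CI step, calling sys.exit(check_pushed()) would fail the job only on the missing state and ignore deleted, which is what the filter was meant to achieve.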