Dvc: status: filter option

Created on 10 Sep 2020 · 6 comments · Source: iterative/dvc

The dvc status command should probably have a --filter option to select only a specific subset of all possible states (new, deleted, missing, modified, not in cache):

# Show only missing files
# Error out if something is not up to date
$ dvc status -q -c --filter=m

The default value for the filter option should be "list all states".

Use case

This is a continuation of #4383 and #4436.

I want to create a workflow file in my GitHub repo that checks whether the latest dvc data has been pushed to the remote without actually downloading the data via fetch or pull. The first step was described in #4383: if the data is not pushed to the remote, I get the missing state. However, if the data is pushed to the remote, I get the deleted state, and status still exits with an error. Therefore there should be a way to filter for only the states I am interested in, something similar to --diff-filter in git diff.

P.S.: Since the --diff-filter option is part of the diff command in git, the filter option in dvc should also be implemented via dvc diff.

Labels: awaiting response, feature request, help wanted, p2-medium

All 6 comments

Not sure what the UI should look like, because the output format differs between the local workspace and remote storage. For the use case described in the original post, --filter might be applied to the -r/-c flags only:

dvc status -c -r --filter NMD

where "NMD" values are:

  1. N - include "new".
  2. M - include "missing".
  3. D - include "deleted".

The "NMD" characters might be combined in any order. Lowercase characters ("nmd") might be used to exclude values (e.g. status -c --filter n means "show everything except new").
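To make the proposed semantics concrete, here is a minimal sketch of how such a filter string could be interpreted. Everything in it is hypothetical: neither the --filter option nor parse_filter exists in DVC, and the rule for mixing uppercase and lowercase characters (start from the included states, or from all states if none are included, then drop the excluded ones) is my own assumption.

```python
# Hypothetical sketch of the proposed filter syntax; not part of DVC.
REMOTE_STATES = {"N": "new", "M": "missing", "D": "deleted"}


def parse_filter(spec, states=REMOTE_STATES):
    """Return the set of state names selected by a filter string.

    Uppercase characters include a state, lowercase characters exclude it.
    An empty string selects every state ("list all states").
    """
    include = set()
    exclude = set()
    for ch in spec:
        name = states.get(ch.upper())
        if name is None:
            raise ValueError(f"unknown filter character: {ch!r}")
        if ch.isupper():
            include.add(name)
        else:
            exclude.add(name)
    # Assumption: with no explicit includes, start from all states.
    selected = include or set(states.values())
    return selected - exclude
```

With this reading, parse_filter("MD") and parse_filter("n") select the same two states, and an empty string keeps the default of listing everything.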

If --filter should also support workspace statuses, then possible values are:

  1. C - changed checksum.
  2. A - always changed.
  3. N - not in cache.
  4. M - modified.
  5. D - deleted.
  6. X - not in cache (not sure about this one).
  7. U - update available.

Once again, characters might be combined in any order and lowercase characters might be used to exclude values.

There is one issue with the proposed values: the same characters have different meanings for local and remote, e.g. N means "new" for remote and "not in cache" for workspace. The possible solutions I see:

  1. Don't bother with it at all and implement values proposed above.
  2. Propose better values.
  3. Support filter for remote/workspace only.

BTW the status message format is different in workspace and remote modes. In local mode the message is more hierarchical and shows information about stages, which is not the case for remote mode. Here is a little example to show the difference between remote and local modes:

$ dvc status -c
    deleted:            foo
    deleted:            foobar
    deleted:            bar
$ dvc status
foo.dvc:
    changed outs:
        not in cache:       foo
join_foobar:
    changed deps:
        deleted:            bar
        deleted:            foo
    changed outs:
        not in cache:       foobar
bar.dvc:
    changed outs:
        not in cache:       bar

Not sure it's a big deal; it only started to bother me when I began thinking about the --filter option.
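One way to sidestep the format difference would be to flatten the hierarchical workspace status into rows before filtering. The sketch below assumes the workspace status is available as a mapping of stage name to a list of entries, where each entry is either a plain string or a mapping of section ("changed deps", "changed outs") to path/state pairs. That shape and the flatten_workspace_status name are assumptions made for illustration; the SAMPLE data just mirrors the output shown above.

```python
# Assumed shape of the workspace status; mirrors the example output above.
SAMPLE = {
    "foo.dvc": [{"changed outs": {"foo": "not in cache"}}],
    "join_foobar": [
        {"changed deps": {"bar": "deleted", "foo": "deleted"}},
        {"changed outs": {"foobar": "not in cache"}},
    ],
    "bar.dvc": [{"changed outs": {"bar": "not in cache"}}],
}


def flatten_workspace_status(status):
    """Turn the nested status into (stage, section, path, state) rows."""
    rows = []
    for stage, entries in status.items():
        for entry in entries:
            if isinstance(entry, str):
                # Stage-level message with no section or path attached.
                rows.append((stage, None, None, entry))
                continue
            for section, paths in entry.items():
                if not isinstance(paths, dict):
                    # Section-level message without per-path detail.
                    rows.append((stage, section, None, str(paths)))
                    continue
                for path, state in paths.items():
                    rows.append((stage, section, path, state))
    return rows
```

Once flattened, the rows can be filtered by state in the same way as the flat remote listing, e.g. keeping only rows whose last element is "deleted".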

@nik123 To be honest, this feature seems pretty synthetic and only useful for automation. Feels like we are inventing some special filter syntax for no good reason. Have you considered --show-json instead?

@efiop, thanks, I'll try --show-json out. It'll probably solve my use case.

Hi @nik123 :) Have you had a chance to try it out?

@efiop, yes, I did. Running dvc status -c --show-json and parsing the output afterwards works just fine for me. I suppose the issue can be closed.
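For anyone landing here with the same use case, a rough sketch of that approach might look like the following. It assumes that dvc status -c --show-json prints a flat JSON object mapping each path to its state (for example "missing" or "deleted") on stdout; check the actual output of your DVC version before relying on it. The function names select_states and cloud_status_json are made up for this example.

```python
import json
import subprocess


def select_states(status_json_text, wanted):
    """Keep only the entries whose state is in `wanted`.

    Assumes the JSON is a flat object of path -> state; empty output
    is treated as "nothing to report".
    """
    text = status_json_text.strip()
    status = json.loads(text) if text else {}
    return {path: state for path, state in status.items() if state in wanted}


def cloud_status_json():
    """Run dvc and return its raw stdout (requires dvc on PATH)."""
    result = subprocess.run(
        ["dvc", "status", "-c", "--show-json"],
        capture_output=True,
        text=True,
        check=False,  # the exit code is ignored; only stdout is parsed
    )
    return result.stdout


# Typical CI use (not executed here):
#   missing = select_states(cloud_status_json(), {"missing"})
#   raise SystemExit(1 if missing else 0)
```

In a workflow step this would fail the job only when something is in the missing state, while deleted entries (data that is on the remote but not fetched locally) are ignored.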
