Dvc: status: support outputs as targets [qa]

Created on 10 Jul 2020 · 18 Comments · Source: iterative/dvc

Bug Report

UPDATE: Jump to https://github.com/iterative/dvc/issues/4191#issuecomment-657106691


From some testing I just did, it looks like dvc status is recursive by default now (maybe it always was):

λ dvc status
foo.dvc:
        changed outs:
                modified:           foo
data\raw.dvc:
        changed outs:
                modified:           data\raw

λ dvc status -R data/
data\raw.dvc:
        changed outs:
                modified:           data\raw

  1. So -R is really just useful for limiting the status to specific directories (it's not really about recursion).
  2. Why not just support dirs AND FILES as targets of the command, like all other commands that take targets (I think)?
    > Related to https://github.com/iterative/dvc.org/pull/1384#issuecomment-656878839
  3. In fact, status already supports this, but only in remote mode (-r or -c options), which is kind of confusing (and complicates the docs).

Please provide information about your setup

DVC version: 1.1.2
Python version: 3.7.5
Platform: Windows-10-10.0.18362-SP0
Binary: True
Package: exe
Supported remotes: azure, gdrive, gs, hdfs, http, https, s3, ssh, oss
Cache: reflink - not supported, hardlink - supported, symlink - not supported
Filesystem type (cache directory): ('NTFS', 'C:\\')
Repo: dvc, git
Filesystem type (workspace): ('NTFS', 'C:\\')
feature request p2-medium

All 18 comments

@jorgeorpinel Non -c status just doesn't support granularity right now.

To clarify:

> Why not just support dirs AND FILES as targets of the command, like all other commands that take targets (I think)?

It does support files as targets, but not files (or subdirs) inside tracked directories (this is what -c/push/pull/etc. support).

> It does support files as targets

@efiop so is this a bug? 👇

λ ls foo*
foo  foo.dvc
λ dvc status foo
ERROR: failed to obtain data status - 'dvc.yaml' does not exist.

@jorgeorpinel Yes, looks like a bug. Reopening. Thanks!

OK. Note on docs:

  • Move granularity note from Remote comparisons example to Simple usage when this is fixed.

Never mind. I'll make a separate Specific targets example now.

Another related question: dvc repro also has targets but only accepts stage and .dvc file names, right? (Otherwise, -R would also seem obsolete there.) If so, should it accept file/dir names (and support granularity)? Prob not.

@jorgeorpinel -R is not obsolete: it searches recursively for dvc files. But when you specify a target explicitly, it finds the one dvc file that the target belongs to, so it is not the same.
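
For illustration, here is a minimal Python sketch of the recursive search described above. It assumes, as a simplification, that -R only looks for *.dvc files; real DVC also handles dvc.yaml and ignore rules. `find_dvc_files` is a hypothetical helper, not a DVC API, and the directory layout simply mirrors the one from the report.

```python
import tempfile
from pathlib import Path


def find_dvc_files(root):
    """Recursively collect every *.dvc file under root (toy model of -R)."""
    root = Path(root)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.dvc"))


# Rebuild the layout from the report: foo.dvc at the top, data/raw.dvc below.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / "data").mkdir()
    (repo / "foo.dvc").write_text("outs: [foo]\n")
    (repo / "data" / "raw.dvc").write_text("outs: [raw]\n")

    whole_repo = find_dvc_files(repo)          # roughly `dvc status`
    data_only = find_dvc_files(repo / "data")  # roughly `dvc status -R data/`
```

Searching from a directory narrows the set of dvc files that get checked, whereas an explicit target is meant to resolve to exactly one of them.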

OK thanks. So dvc repro also supports tracked files and thus granularity already? Will double check and include in #1384 then.

What about run, (un)freeze, remove, unprotect, update, and metrics/plots show? Will ask in the appropriate issue...

@jorgeorpinel tracked files and granularity are not the same thing. When we were talking about push/pull, we were talking about being able to pull a specific file (or subdir) in a tracked dataset (dvc add data_dir). In the case of repro, we support addressing stages by their outputs, which is a different thing.
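
A rough Python sketch of the two lookups may help tell them apart. Everything here is hypothetical: the `STAGES` index and both functions are a toy model written for this thread, not DVC's data structures or code.

```python
from pathlib import PurePosixPath

# Hypothetical index: dvc file -> the output paths it tracks.
STAGES = {
    "data_dir.dvc": ["data_dir"],
    "train.dvc": ["model.pkl"],
}


def stages_by_output(target):
    """Address a stage by an exact output path (the repro-style lookup)."""
    return [name for name, outs in STAGES.items() if target in outs]


def stages_granular(target):
    """Also accept targets that live inside a tracked directory output."""
    path = PurePosixPath(target)
    matches = []
    for name, outs in STAGES.items():
        for out in outs:
            out_path = PurePosixPath(out)
            if path == out_path or out_path in path.parents:
                matches.append((name, out))
    return matches
```

In this model, `stages_by_output("data_dir/img/1.png")` finds nothing because no stage declares that exact path as an output, while `stages_granular` maps it back to `data_dir.dvc`. That second behaviour is the "granularity" that push/pull and status -c are said to support.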

Got it. So I think we need to clarify when file/dir targets support granularity and when they don't, after all. I'll add a note in all the sync-related ones (in #1384) both in the description and in Specific target examples.

> It does support files as targets

@efiop so is this a bug? 👇

λ ls foo*
foo  foo.dvc
λ dvc status foo
ERROR: failed to obtain data status - 'dvc.yaml' does not exist.

@jorgeorpinel, @efiop
Supporting outputs as targets is a feature, not a bug, I think?


Targets in dvc status are stages, the same as in dvc repro. But the error message ERROR: failed to obtain data status - 'dvc.yaml' does not exist. is a bug. In a previous version (0.94.1), the message was different (screenshot not preserved).

So the solution is either to support outputs as targets or to improve the message.

@karajan1001 Adjusted the labels, thanks! :slightly_smiling_face:

Since 1.0 we've changed the defaults, which is why it looks for dvc.yaml first. Targets are shared by status and status -c, hence the confusion.

@efiop
In local status mode, we use Repo.collect to collect stages:
https://github.com/iterative/dvc/blob/32b5b33329ef71586ae47b1b879bf90f79edab2f/dvc/repo/status.py#L28-L33

While in cloud status mode we use Repo.collect_granular:

https://github.com/iterative/dvc/blob/32b5b33329ef71586ae47b1b879bf90f79edab2f/dvc/repo/__init__.py#L378-L382
https://github.com/iterative/dvc/blob/32b5b33329ef71586ae47b1b879bf90f79edab2f/dvc/repo/__init__.py#L340-L352
https://github.com/iterative/dvc/blob/32b5b33329ef71586ae47b1b879bf90f79edab2f/dvc/repo/status.py#L79-L89
https://github.com/iterative/dvc/blob/32b5b33329ef71586ae47b1b879bf90f79edab2f/dvc/repo/status.py#L40-L50

And if a stage and an output have the same name, the stage wins. This prevents users from selecting outputs that share a name with a stage. (Before version 1.0, stage names always had a .dvc suffix, which prevented this problem.)


@karajan1001, there's a workaround: dvc status ./<filename> for now. :slightly_smiling_face:

But, clearly, it's not implemented for status, only for -c.

And we don't recommend giving a stage the same name as one of its outputs.
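
As a sketch of the name collision and the ./ workaround discussed above, here is a toy resolver in Python. It is not DVC's implementation: `resolve_target` and the sample names are hypothetical, and the reason the ./ prefix helps (it can never equal a stage name, so the lookup falls through to outputs) is an assumption made for this model.

```python
def resolve_target(target, stage_names, outputs):
    """Toy resolver: stage names are checked before output paths."""
    if target in stage_names:
        return ("stage", target)
    # Assumption: a leading "./" marks the target as a path, so it can never
    # match a stage name and falls through to the output lookup.
    path = target[2:] if target.startswith("./") else target
    if path in outputs:
        return ("output", path)
    raise ValueError(f"no stage or output named {target!r}")


# Hypothetical repo where a stage and one of the outputs are both "model".
stage_names = {"model", "prepare"}
outputs = {"model", "data/clean.csv"}

by_name = resolve_target("model", stage_names, outputs)
by_path = resolve_target("./model", stage_names, outputs)
```

Here the bare name resolves to the stage and only the ./ form reaches the output, which is one more reason to keep stage names and output names distinct.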
