UPDATE: Jump to https://github.com/iterative/dvc/issues/4191#issuecomment-657106691
Looks like dvc status is recursive by default now (maybe always was) from some testing I just did:
λ dvc status
foo.dvc:
changed outs:
modified: foo
data\raw.dvc:
changed outs:
modified: data\raw
λ dvc status -R data/
data\raw.dvc:
changed outs:
modified: data\raw
Why not just support dirs AND FILES as targets of the command, like all other commands that take targets (I think)?
[...] status, but only in remote mode (-r or -c options), which is kind of confusing (and complicates the docs).
DVC version: 1.1.2
Python version: 3.7.5
Platform: Windows-10-10.0.18362-SP0
Binary: True
Package: exe
Supported remotes: azure, gdrive, gs, hdfs, http, https, s3, ssh, oss
Cache: reflink - not supported, hardlink - supported, symlink - not supported
Filesystem type (cache directory): ('NTFS', 'C:\\')
Repo: dvc, git
Filesystem type (workspace): ('NTFS', 'C:\\')
@jorgeorpinel Non -c status just doesn't support granularity right now.
Closing in favor of https://github.com/iterative/dvc/issues/2180
To clarify:
Why not just support dirs AND FILES as targets of the command, like all other commands that take targets (I think)?
It does support files as targets, but not files(or subdirs) in tracked directories (this is what -c/push/pull/etc support).
It does support files as targets
@efiop so is this a bug? 👇
λ ls foo*
foo foo.dvc
λ dvc status foo
ERROR: failed to obtain data status - 'dvc.yaml' does not exist.
@jorgeorpinel Yes, looks like a bug. Reopening. Thanks!
OK. Note on docs:
Never mind. I'll make a separate Specific targets example now.
Another related question: dvc repro also has targets but only accepts stage and .dvc file names, right? (Otherwise, -R would also seem obsolete there.) If so, should it accept file/dir names (and support granularity)? Prob not.
@jorgeorpinel -R is not obsolete; it searches recursively for dvc files. But when you specify a target explicitly, it finds the one dvc file that the target belongs to, so it is not the same.
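The difference between the two lookups can be sketched roughly in Python. This is a toy model, not DVC's actual implementation: both function names are hypothetical, and the explicit-target lookup assumes the .dvc file sits next to the tracked path and is named "<name>.dvc", as with foo and foo.dvc in this thread.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def find_dvc_files_recursively(directory: Path) -> list[Path]:
    """Toy model of -R: collect every .dvc file under `directory`."""
    return sorted(directory.rglob("*.dvc"))


def find_dvc_file_for_target(target: Path) -> Path | None:
    """Toy model of an explicit target: the single .dvc file it belongs to.

    Assumption (not taken from DVC): the .dvc file is a sibling of the
    tracked path and is named "<name>.dvc".
    """
    candidate = target.with_name(target.name + ".dvc")
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    # Build a throwaway layout similar to the one shown in this thread.
    root = Path(tempfile.mkdtemp())
    (root / "data").mkdir()
    for name in ("foo", "foo.dvc", "data/raw", "data/raw.dvc"):
        (root / name).write_text("")

    print(find_dvc_files_recursively(root))      # every .dvc file under root
    print(find_dvc_file_for_target(root / "foo"))  # only the file tracking foo
```

The recursive search can return many files, while the explicit lookup returns at most one, which is why the two are not interchangeable.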
OK thanks. So dvc repro also supports tracked files and thus granularity already? Will double check and include in #1384 then.
What about run, (un)freeze, remove, unprotect, update, and metrics/plots show? Will ask in the appropriate issue...
@jorgeorpinel tracked files and granularity are not the same thing. When we were talking about push/pull, we were talking about being able to pull a specific file (or subdir) in a tracked dataset (dvc add data_dir). In the case of repro, we support addressing stages by their outputs, which is a different thing.
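A minimal sketch of that distinction, assuming tracked outputs are plain relative POSIX-style paths. The helper names owning_output and is_granular are hypothetical and do not come from DVC's code base.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def owning_output(target: str, tracked_outputs: list[str]) -> str | None:
    """Return the tracked output that equals or contains `target`, if any."""
    target_path = PurePosixPath(target)
    for out in tracked_outputs:
        out_path = PurePosixPath(out)
        # Compare whole path components, so "data_dir2/x" is not
        # mistaken for something inside "data_dir".
        if target_path == out_path or out_path in target_path.parents:
            return out
    return None


def is_granular(target: str, tracked_outputs: list[str]) -> bool:
    """True when `target` lies strictly inside a tracked output,
    i.e. it is a file or subdir of a tracked dataset."""
    out = owning_output(target, tracked_outputs)
    return out is not None and PurePosixPath(out) != PurePosixPath(target)


if __name__ == "__main__":
    outs = ["data_dir"]
    print(is_granular("data_dir", outs))                 # the output itself
    print(is_granular("data_dir/images/cat.png", outs))  # a file inside it
```

Addressing a stage by its output corresponds to the first case (the target is the output itself); granularity is the second case, where the target is only part of a tracked output.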
Got it. So I think we need to clarify when file/dir targets support granularity and when they don't, after all. I'll add a note in all the sync-related ones (in #1384) both in the description and in Specific target examples.
It does support files as targets
@efiop so is this a bug? 👇
λ ls foo*
foo foo.dvc
λ dvc status foo
ERROR: failed to obtain data status - 'dvc.yaml' does not exist.
@jorgeorpinel, @efiop
Supporting outputs as targets is a feature, not a bug, I think?
Targets in dvc status are stages, just the same as in dvc repro. But the error message ERROR: failed to obtain data status - 'dvc.yaml' does not exist. is a bug. In a previous version (0.94.1), it used to be [...]
So the solution is either to support outputs as targets or to improve the message.
@karajan1001 Adjusted the labels, thanks! :slightly_smiling_face:
Since 1.0 we've changed the defaults, hence why it looks for dvc.yaml first. targets are shared by status and status -c, hence the confusion.
@efiop
In local status mode, we use Repo.collect to collect stages:
https://github.com/iterative/dvc/blob/32b5b33329ef71586ae47b1b879bf90f79edab2f/dvc/repo/status.py#L28-L33
While in cloud status mode we use Repo.collect_granular:
https://github.com/iterative/dvc/blob/32b5b33329ef71586ae47b1b879bf90f79edab2f/dvc/repo/__init__.py#L378-L382
https://github.com/iterative/dvc/blob/32b5b33329ef71586ae47b1b879bf90f79edab2f/dvc/repo/__init__.py#L340-L352
https://github.com/iterative/dvc/blob/32b5b33329ef71586ae47b1b879bf90f79edab2f/dvc/repo/status.py#L79-L89
https://github.com/iterative/dvc/blob/32b5b33329ef71586ae47b1b879bf90f79edab2f/dvc/repo/status.py#L40-L50
And if a stage and an output have the same name, the stage wins. This prevents users from selecting outputs that share a name with a stage. (Before version 1.0, stage names always had a .dvc suffix, which prevented this problem.)
https://github.com/iterative/dvc/blob/32b5b33329ef71586ae47b1b879bf90f79edab2f/dvc/repo/__init__.py#L302-L304
Wow, this had been considered before: stage is preferred to output.
@karajan1001, there's a workaround for now: dvc status ./<filename> :slightly_smiling_face:
But, clearly, it's not implemented for status, only for -c.
And we don't recommend giving a stage the same name as an output.
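The precedence discussed above can be modeled with a small Python sketch. It is a toy resolver under stated assumptions, not DVC's logic: the function name resolve_target is hypothetical, and treating a leading "./" as "interpret this as a path only" is an assumption about why the workaround helps, which the thread does not spell out.

```python
from __future__ import annotations


def resolve_target(
    target: str, stage_names: set[str], output_paths: set[str]
) -> tuple[str, str] | None:
    """Toy resolver: a bare name matches a stage first, then an output.

    Assumption: a target written as ./<name> is treated as a path only,
    so it can never be shadowed by a stage with the same name.
    """
    if target.startswith("./"):
        path = target[2:]
        return ("output", path) if path in output_paths else None
    if target in stage_names:
        return ("stage", target)  # the stage wins on a name clash
    if target in output_paths:
        return ("output", target)
    return None


if __name__ == "__main__":
    stages = {"train"}
    outputs = {"train", "foo"}
    print(resolve_target("train", stages, outputs))    # stage shadows output
    print(resolve_target("./train", stages, outputs))  # path form picks output
```

With a stage and an output both named train, the bare name can only ever reach the stage, which is the selection problem described above; the path form sidesteps the clash in this model.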