See https://github.com/iterative/dvc/pull/2889#discussion_r357950906 for context
Context: status is already implemented for DependencyREPO, but it is not tested, and it never actually runs because the import stage is locked.
+1 for me as well; when my Data Scientists clone a project repo which uses a data-registry repo, running dvc status on the project repo returns the following:
$ dvc status
WARNING: DVC-file 'data.dvc' is locked. Its dependencies are not going to be shown in the status output.
data.dvc:
    changed outs:
        not in cache: data
My data-registry repo:
.
├── .dvc
│   ├── .gitignore
│   └── config
├── README.md
├── projectA
│   ├── .gitignore
│   └── data.dvc
└── projectB
    ├── .gitignore
    └── data.dvc
Update projectA to use data managed by data-registry repo:
$ git clone [email protected]:path/to/projectA.git
$ cd projectA
$ dvc init
$ dvc import [email protected]:<the-data-repo.git> projectA/data
$ git add data.dvc
$ git commit -m "Using DVC!!"
$ git push
Another user clones the (updated) projectA repo, and checks dvc:
$ git clone [email protected]:path/to/projectA.git
$ cd projectA
$ dvc status
WARNING: DVC-file 'data.dvc' is locked. Its dependencies are not going to be shown in the status output.
data.dvc:
    changed outs:
        not in cache: data
It's less than optimal UX, and I know I will get emails over this ; )
Repro script:
mkdir remote_repo repo
set -e
set -x
maindir=$(pwd)
# Source repo: track a single file with DVC
pushd remote_repo
git init -q && dvc init -q
echo data >> data
dvc add data
git add -A
git commit -am "add data"
popd
# Consumer repo: import the file from the source repo, then check status
pushd repo
git init -q && dvc init -q
dvc import "$maindir/remote_repo" data
dvc status
[EDIT]
At the last dvc status call we get the following output:
+ dvc status
WARNING: DVC-file 'data.dvc' is locked. Its dependencies are not going to be shown in the status output.
Data and pipelines are up to date.
Ideally, status should report whether the imported stage is up to date with the current state of the revision it was imported from. @dmpetrov correct me if I am wrong, but I guess that if no revision was provided, it should be compared against the current master, right?
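A rough sketch of that comparison, not how DVC implements it: it assumes the commit the import was locked to is already known (for example read from the import stage file; the exact field holding it is an assumption here) and asks the source repo for its current commit with git ls-remote. The function name import_is_up_to_date is hypothetical, and an omitted revision falls back to the remote HEAD, i.e. the default branch.

```shell
# Hypothetical helper, not part of DVC.
# Usage: import_is_up_to_date <repo-url-or-path> <locked-commit> [revision]
# Succeeds only if <revision> (default: HEAD) in the source repo
# still points at <locked-commit>.
import_is_up_to_date() {
    url=$1
    locked_rev=$2
    rev=${3:-HEAD}
    # git ls-remote prints "<sha><TAB><ref>"; keep the sha of the first match
    remote_rev=$(git ls-remote "$url" "$rev" | head -n 1 | cut -f1)
    [ -n "$remote_rev" ] && [ "$remote_rev" = "$locked_rev" ]
}

# Example call (the locked commit has to be supplied by the caller):
#   import_is_up_to_date "$maindir/remote_repo" "$locked_rev" || echo "import is outdated"
```

This costs one network round trip per imported stage, which is why a caching layer would matter for keeping dvc status fast.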
We need to research how git does it; maybe it has some caching mechanism inside. It would be nice to do something similar so that dvc status stays instant even for imported files.
@pared thank you for the repro script! It would be even more helpful to add the important outputs to it (and probably some comments with the problem and the expected outputs) :)
@dmpetrov sorry for that. I provided some context, please check it out. :)
I'm picking this one up!