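```bash
#!/bin/bash
rm -rf repo remote_repo storage git_repo
mkdir repo remote_repo storage git_repo

# Set up a DVC project whose data lives only on a local remote.
pushd remote_repo
git init >> /dev/null && dvc init -q
set -x
set -e
echo data >> data
# dvc remote add -d str ../storage   # relative-path variant that triggers the problem
dvc remote add -d str /full/path/to/storage
dvc add data
git add .dvc/config data.dvc .gitignore
git commit -m "add data"
dvc push
# Drop the workspace copy and the local cache so the data can only come from the remote.
rm data
rm -rf .dvc/cache
popd

# Fetch the data from the project above into a fresh repo.
pushd repo
git init >> /dev/null && dvc init -q
dvc get ../remote_repo data
```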
This works fine, but if we use a relative path to point to the storage (as in the commented-out line), we get warnings that cache files do not exist, and `data` is not created.
This happens because `external_repo` clones the url to /tmp, so the relative path can no longer be resolved properly.
The solution could be to make `ExternalRepo` recognize local remotes with a relative path and resolve those paths appropriately. We already do something similar in `ExternalRepo._set_upstream` when automatically setting up the remote if none is present.
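A minimal sketch of the first half of that idea, deciding which configured remote URLs count as "local remotes with a relative path", might look like this. `is_relative_local_remote` is a hypothetical helper, not part of DVC, and it assumes that any URL with a scheme (`s3://`, `ssh://`, ...) is not a local path.

```python
import os
from urllib.parse import urlparse


def is_relative_local_remote(url: str) -> bool:
    """Hypothetical helper: True for local remote paths that are relative."""
    # URLs with a scheme (s3://, ssh://, gs://, ...) are not local paths.
    # On Windows a drive letter such as "C:" also parses as a scheme, but
    # such paths are absolute anyway, so returning False is still correct.
    if urlparse(url).scheme:
        return False
    # Absolute local paths resolve the same way from any clone location.
    return not os.path.isabs(url)


for url in ["../storage", "/full/path/to/storage", "s3://bucket/path"]:
    print(url, is_relative_local_remote(url))
```

Only URLs for which this returns True would need rebasing when the repo is cloned elsewhere.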
Hello, I will take a look into this /cc @efiop
@tizoc Sounds great! Please let us know if you'll have any questions :)
Will do!
So far, what I have found is that (unlike git) dvc makes non-absolute paths in the config relative to the `.dvc` directory, not the root of the repository:
```
$ dvc remote add -d str ../storage
Setting 'str' as a default remote.
$ cat .dvc/config
[core]
remote = str
['remote "str"']
url = ../../storage
```
With git:
```
$ git remote add str ../storage
$ cat .git/config
[remote "str"]
url = ../storage
fetch = +refs/heads/*:refs/remotes/str/*
```
If I manually change the path to be relative to the root of the repository, the `dvc get ../remote_repo data` command works.
So, before going forward, my question is: is this (paths relative to the `.dvc` directory instead of the root) the expected behavior?
@tizoc We actually transform it relative to the config file location, which by default is `.dvc/config`. Git does indeed treat it literally, though I don't think that would work unless you are in the repo root.
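To illustrate the transformation, here is a minimal sketch, assuming DVC rewrites a path given relative to the current working directory into one relative to the directory of the config file. `to_config_relative` and the example paths are hypothetical, not DVC's actual code; the sketch reproduces the `../storage` to `../../storage` rewrite from the `.dvc/config` output above.

```python
import os


def to_config_relative(path: str, cwd: str, config_dir: str) -> str:
    """Hypothetical helper: rewrite a cwd-relative path relative to config_dir."""
    # Anchor the user-supplied path at the directory the command was run from.
    absolute = os.path.normpath(os.path.join(cwd, path))
    # Express the same location relative to the config file's directory.
    return os.path.relpath(absolute, config_dir)


repo_root = "/home/user/work/remote_repo"  # assumed example location
print(to_config_relative("../storage", repo_root, os.path.join(repo_root, ".dvc")))
# prints ../../storage
```

Running the equivalent command from a subdirectory with `../../storage` yields the same stored value, so the config does not depend on where the command was run. A literal (git-style) string, by contrast, only means the right thing when interpreted from the repo root.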
So yes, it is expected behaviour. In `get`, when we clone the external repo, we need to resolve that remote path relative to the _original_ location of that repo. Please see `_set_upstream` in `dvc/external_repo.py`: as noted above, it does a similar thing, but for the cache.
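A minimal sketch of that resolution step, assuming the config-relative convention described above: the stored URL is joined with the config directory of the *original* repository rather than the temporary clone's. `resolve_remote_url` and the paths are hypothetical; this is not the code from the eventual fix.

```python
import os
from urllib.parse import urlparse


def resolve_remote_url(url: str, config_dir: str) -> str:
    """Hypothetical helper: resolve a config-relative local remote URL."""
    # Leave remote URLs (s3://, ssh://, ...) and absolute paths untouched.
    if urlparse(url).scheme or os.path.isabs(url):
        return url
    # Relative local paths are relative to the directory holding the config.
    return os.path.normpath(os.path.join(config_dir, url))


original = "/home/user/work/remote_repo/.dvc"  # where the repo really lives
clone = "/tmp/tmpabc123dvc-erepo/.dvc"        # hypothetical temporary clone
print(resolve_remote_url("../../storage", original))  # /home/user/work/storage
print(resolve_remote_url("../../storage", clone))     # /tmp/storage (wrong place)
```

The same stored string points at the real storage only when resolved against the original config directory, which is why the clone has to be told where it came from.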
Got it! After stepping through the code with the debugger, I see the issue now. This is what the url for the remote entry looks like in the temporarily cloned repo:
```
'url': '/private/var/folders/sy/4w247kq176z5n2k37nn3lblw0000gn/T/tmp5_g92ir8dvc-erepo/.dvc/../../storage'
```
Thanks for the pointer.
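Normalizing that URL shows where it actually points. A quick check with Python's `posixpath.normpath` (the path is copied from the debugger output above):

```python
import posixpath

# Remote URL as seen in the temporary clone (from the debugger output).
broken = (
    "/private/var/folders/sy/4w247kq176z5n2k37nn3lblw0000gn/T/"
    "tmp5_g92ir8dvc-erepo/.dvc/../../storage"
)
# ".dvc/.." and "tmp5_g92ir8dvc-erepo/.." cancel out, leaving a sibling of the
# temporary clone rather than the real storage directory.
print(posixpath.normpath(broken))
# /private/var/folders/sy/4w247kq176z5n2k37nn3lblw0000gn/T/storage
```

That directory does not exist, which is consistent with the "cache files do not exist" warnings in the original report.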
Fixed in https://github.com/iterative/dvc/pull/3378. I still have to write a functional test, and probably split that function into two.
This should be solved now that #3378 was merged.