Dvc: remote with relpath breaks get

Created on 8 Nov 2019 · 8 Comments · Source: iterative/dvc

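#!/bin/bash

# Start from a clean slate.
rm -rf repo remote_repo storage git_repo
mkdir repo remote_repo storage git_repo

pushd remote_repo
git init >> /dev/null && dvc init -q

set -x
set -e
echo data >> data
# Relative remote path: swapping this in for the absolute one below reproduces the bug.
# dvc remote add -d str ../storage
dvc remote add -d str /full/path/to/storage
dvc add data
git add .dvc/config data.dvc .gitignore
git commit -m "add data"
dvc push
# Drop the local copy and the cache so the data exists only in the remote.
rm data
rm -rf .dvc/cache

popd
pushd repo

git init >> /dev/null && dvc init -q
# Fetch the data through the other repo's default remote.
dvc get ../remote_repo data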

The script above works just fine, but when we use a relative path to point to the storage (as in the commented-out line),
we get warnings that cache files do not exist, and data is not created.

This happens because in external_repo we clone the URL to /tmp, so the relative path to the cache can no longer be resolved properly.

enhancement p2-medium ui

All 8 comments

The solution could be to make ExternalRepo acknowledge local remotes with a relative path and make it resolve the paths appropriately. We do something similar already in ExternalRepo._set_upstream when automatically setting up the remote if none is present.

Hello, I will take a look into this /cc @efiop

@tizoc Sounds great! Please let us know if you have any questions :)

Will do!

So far, what I have found is that (unlike git) dvc makes non-absolute paths in the config relative to the .dvc directory, not the root of the repository:

$ dvc remote add -d str ../storage
Setting 'str' as a default remote.
$ cat .dvc/config
[core]
    remote = str
['remote "str"']
    url = ../../storage

git:

$ git remote add str ../storage
$ cat .git/config 
[remote "str"]
        url = ../storage
        fetch = +refs/heads/*:refs/remotes/str/*

If I manually change the path to be relative to the root of the repository the dvc get ../remote_repo data command works.
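The rewrite seen in the config output above can be reproduced with a few lines of path arithmetic. This is only a sketch of the observed behaviour, not dvc's actual code: the function name `to_config_relative` and the `/work/remote_repo` location are made up for illustration.

```python
import posixpath


def to_config_relative(user_path, cwd, config_dir):
    """Rewrite a path typed relative to cwd so it is relative to config_dir."""
    # First pin the path down as an absolute location.
    absolute = posixpath.normpath(posixpath.join(cwd, user_path))
    # Then express that location relative to the directory holding the config.
    return posixpath.relpath(absolute, start=config_dir)


# Hypothetical repository location, assumed only for this example.
repo_root = "/work/remote_repo"
config_dir = posixpath.join(repo_root, ".dvc")

print(to_config_relative("../storage", repo_root, config_dir))
```

With these assumed paths the result is `../../storage`, matching the `url` entry dvc wrote to `.dvc/config`.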

So, before going forward, my question is: Is this (paths relative to the .dvc directory instead of the root) the expected behavior?

@tizoc We actually transform it to be relative to the config file location, which by default is .dvc/config. Git does indeed treat it literally; I don't think it will work after that, though, unless you are in the repo root.

So yes, it is expected behaviour. In get, when we clone the external repo, we need to resolve that remote path relative to the _original_ location of that repo. Please see _set_upstream in dvc/external_repo.py as noted above; it does a similar thing, but for cache.
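The idea of resolving against the original location can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in dvc/external_repo.py or in any later fix: `resolve_remote_url` and the `/home/user/remote_repo` path are hypothetical, and the sketch only considers remotes that are plain local filesystem paths.

```python
import posixpath


def resolve_remote_url(url, original_config_dir):
    """Return an absolute path for a local remote URL.

    Absolute URLs pass through unchanged. Relative ones are resolved against
    the directory holding the ORIGINAL repo's config, not the temporary clone.
    """
    if posixpath.isabs(url):
        return url
    return posixpath.normpath(posixpath.join(original_config_dir, url))


# Hypothetical location of the source repository's .dvc directory.
original_config_dir = "/home/user/remote_repo/.dvc"

print(resolve_remote_url("../../storage", original_config_dir))
print(resolve_remote_url("/full/path/to/storage", original_config_dir))
```

Because the result is absolute, it stays valid no matter where the temporary clone ends up.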

Got it! After stepping through the code with the debugger, I see the issue now. This is what the url for the remote entry looks like in the temporary cloned repo:

'url': '/private/var/folders/sy/4w247kq176z5n2k37nn3lblw0000gn/T/tmp5_g92ir8dvc-erepo/.dvc/../../storage'

Thanks for the pointer.
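To see where that url actually points, it can be normalized with the standard library. This quick check uses the url quoted above verbatim; nothing else is assumed.

```python
import posixpath

# The remote url as reported for the temporary clone.
cloned_url = (
    "/private/var/folders/sy/4w247kq176z5n2k37nn3lblw0000gn/T/"
    "tmp5_g92ir8dvc-erepo/.dvc/../../storage"
)

# Collapse the ".." segments to get the directory dvc ends up looking in.
normalized = posixpath.normpath(cloned_url)
print(normalized)
```

The two `..` segments cancel `.dvc` and the clone directory itself, so the lookup lands on a `storage` directory next to the temporary clone rather than next to the original repository.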

Fixed in https://github.com/iterative/dvc/pull/3378

Still have to write a func test, and probably split that function into two.

This should be solved now that #3378 was merged.
